Overlap Analysis

The Overlap Analysis template shows how partner customer data overlaps with Instacart's user base. It helps partners see their potential reach on Instacart by matching hashed identifiers.

The template matches customer identifiers provided by partners like Hashed Email (HEM) and Hashed IP Address with Instacart's user database to generate match statistics.

You can perform the following actions with this template:

Match hashed customer identifiers to Instacart's user database.
Work with multiple identifier types (HEM, Hashed IP Address).
Get data on both total and unique matches.
View summary statistics without revealing individual match details.

note

Because of pre-aggregated data, no minimum user count is required.

Data linking

Before linking your data to the Overlap Analysis template, ensure it meets the following requirements:

An IDENTIFIER_TYPE column with values HEM or HASHED_IP_ADDRESS.
An IDENTIFIER_VALUE column with properly hashed identifiers.
Data must be loaded into your Snowflake account.
Proper column naming and data types.
No personally identifiable information (PII) can be in raw form.

note

The Overlap Analysis template is the only one that requires data linking.

Format your data according to the Overlap Analysis template requirements.
Load your data into designated tables in your Snowflake account.
Use the clean room API to register your data tables by running the following query:
```
call samooha_by_snowflake_local_db.consumer.register_db('YOUR_DATABASE_NAME');
```

Link your dataset to the clean room by running the following query:

call samooha_by_snowflake_local_db.consumer.link_datasets(
    'CLEANROOM_NAME',
    ['YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE']
);

Verify your data is properly linked and accessible in the clean room.

Run the query

Run the following query to calculate the overlap analysis:

set cleanroom_name = 'CLEANROOM_NAME';
call samooha_by_snowflake_local_db.consumer.run_analysis(
    $cleanroom_name, -- cleanroom name
    'overlap_analysis',
    ['YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE'], -- Consumer table with identifiers
    [], -- Provider tables (none needed)
    object_construct() -- No additional parameters
);

Consumer table schema

Parameter	Description	Data type	Allowed values	Required or optional
Consumer table	A table with customer identifiers	table	Table with `IDENTIFIER_TYPE` (`HEM`, `HASHED_IP_ADDRESS`) and `IDENTIFIER_VALUE` columns	required

note

No additional parameters are required.

Your consumer table must have the following structure:

CREATE TABLE YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE (
    IDENTIFIER_TYPE VARCHAR,      -- 'HEM' or 'HASHED_IP_ADDRESS'
    IDENTIFIER_VALUE VARCHAR      -- The hashed identifier
);

For example:

IDENTIFIER_TYPE	IDENTIFIER_VALUE
`HEM`	a1b2c3d4e5f6…
`HEM`	f6e5d4c3b2a1…
`HASHED_IP_ADDRESS`	1a2b3c4d5e6f…

Result

Result	Description
`TOTAL_RECORDS`	Total records in the partner dataset
`TOTAL_IP_MATCH_RECORDS`	Total IP address matches
`TOTAL_HEM_MATCH_RECORDS`	Total email (HEM) matches
`TOTAL_DISTINCT_HEM_MATCH_RECORDS`	Unique email (HEM) matches
`TOTAL_DISTINCT_IP_MATCH_RECORDS`	Unique IP address matches

Example result

TOTAL_RECORDS	TOTAL_HEM_MATCH_RECORDS	TOTAL_IP_MATCH_RECORDS	TOTAL_DISTINCT_HEM_MATCH_RECORDS	TOTAL_DISTINCT_IP_MATCH_RECORDS
50000	12500	8000	8750	5250

Use cases

Audience sizing. Understand how many of your customers are also Instacart users.
Reach planning. Estimate potential reach for advertising campaigns on Instacart.
Data quality assessment. Evaluate the quality and coverage of your identifier data.
Partnership evaluation. Assess the value of data collaboration partnerships.

Best practices

Consider the following best practices when running this query:

Ensure consistent hashing algorithms across your data sources.
Clean and normalize identifiers before hashing.
Remove invalid or test data from your identifier lists.
Use HEM for the highest match accuracy.
Include IP addresses for broader reach estimation.
Consider using multiple identifier types for comprehensive analysis.
Account for potential duplicate identifiers in your source data.
Consider the recency of your identifier data.
Understand that match rates may vary based on identifier quality.
Ensure all identifiers are properly hashed before upload.
Follow your organization's data governance policies.
Document identifier hashing methods for audit purposes.

Methodology

This template employs various methodologies to gather, analyze, and interpret data.

Matching process

The template runs a process to match data while improving privacy.

Separate identifiers. Splits Hashed Emails (HEMs) and IP addresses for individual processing.
Hash matching. Links partner data with Instacart data through exact hash matches.
Match counts. Calculates total and unique matches for each type of identifier.
Summarized results. Provides summary stats without revealing individual match details.

Analyze Overlap Analysis results

When interpreting results, consider the following factors:

Match quality. The accuracy of the results depends on how current and reliable the identifiers are.
Hash consistency. Both datasets must use the same hashing method for successful matching.
Identifier coverage. Different types of identifiers (like HEM and IP) cover different parts of the audience.
Temporal factors. Identifiers may lose relevance over time.

Data checkpoints

The methodology includes built-in quality checks to support accurate results.

Input validation. Confirms the table structure and identifier formats are correct.
Type validation. Ensures identifiers (such as, Hashed Email or IP) match allowed types.
Hash verification. Checks that the identifiers are properly hashed before processing.

Technical implementation

The algorithm is designed to balance efficiency, security, and accuracy.

Scalability. Handles large datasets efficiently with optimized joining processes.
Security. Runs entirely in Snowflake's secure clean room environment.
Accuracy. Uses exact matching to avoid false positives.
Performance. Includes database indexing and partitioning to speed up execution.

Analysis capabilities

The template provides detailed insights into audience overlap.

Total records. Total count of records in the partner's dataset.
HEM matching. Total and unique matches based on Hashed Email.
IP matching. Total and unique matches based on Hashed IP Address.
Comprehensive coverage. Combines multiple identifier types for audience insights.

Limitations and considerations

Identifier quality. The accuracy of results depends on how recent and reliable the hashed identifiers are.
Hashing consistency. Hashing methods must be the same across both datasets for accurate matches.
Coverage patterns. Different types of identifiers may have unique match rates and coverage areas.
Temporal decay. The relevance of older identifiers may decrease over time.

Data linking​

Run the query​

Consumer table schema​

Result​

Example result​

Use cases​

Best practices​

Methodology​

Matching process​

Analyze Overlap Analysis results​

Data checkpoints​

Technical implementation​

Analysis capabilities​

Limitations and considerations​

Data linking

Run the query

Consumer table schema

Result

Example result

Use cases

Best practices

Methodology

Matching process

Analyze Overlap Analysis results

Data checkpoints

Technical implementation

Analysis capabilities

Limitations and considerations