Overlap Analysis
The Overlap Analysis template shows how partner customer data overlaps with Instacart's user base. It helps partners see their potential reach on Instacart by matching hashed identifiers.
The template matches customer identifiers provided by partners like Hashed Email (HEM) and Hashed IP Address with Instacart's user database to generate match statistics.
You can perform the following actions with this template:
- Match hashed customer identifiers to Instacart's user database.
- Work with multiple identifier types (HEM, Hashed IP Address).
- Get data on both total and unique matches.
- View summary statistics without revealing individual match details.
Because of pre-aggregated data, no minimum user count is required.
Data linking
Before linking your data to the Overlap Analysis template, ensure it meets the following requirements:
- An
IDENTIFIER_TYPEcolumn with valuesHEMorHASHED_IP_ADDRESS. - An
IDENTIFIER_VALUEcolumn with properly hashed identifiers. - Data must be loaded into your Snowflake account.
- Proper column naming and data types.
- No personally identifiable information (PII) can be in raw form.
The Overlap Analysis template is the only one that requires data linking.
-
Format your data according to the Overlap Analysis template requirements.
-
Load your data into designated tables in your Snowflake account.
-
Use the clean room API to register your data tables by running the following query:
call samooha_by_snowflake_local_db.consumer.register_db('YOUR_DATABASE_NAME'); -
Link your dataset to the clean room by running the following query:
call samooha_by_snowflake_local_db.consumer.link_datasets(
'CLEANROOM_NAME',
['YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE']
); -
Verify your data is properly linked and accessible in the clean room.
Run the query
Run the following query to calculate the overlap analysis:
set cleanroom_name = 'CLEANROOM_NAME';
call samooha_by_snowflake_local_db.consumer.run_analysis(
$cleanroom_name, -- cleanroom name
'overlap_analysis',
['YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE'], -- Consumer table with identifiers
[], -- Provider tables (none needed)
object_construct() -- No additional parameters
);
Consumer table schema
| Parameter | Description | Data type | Allowed values | Required or optional |
|---|---|---|---|---|
| Consumer table | A table with customer identifiers | table | Table with IDENTIFIER_TYPE (HEM, HASHED_IP_ADDRESS) and IDENTIFIER_VALUE columns | required |
No additional parameters are required.
Your consumer table must have the following structure:
CREATE TABLE YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE (
IDENTIFIER_TYPE VARCHAR, -- 'HEM' or 'HASHED_IP_ADDRESS'
IDENTIFIER_VALUE VARCHAR -- The hashed identifier
);
For example:
| IDENTIFIER_TYPE | IDENTIFIER_VALUE |
|---|---|
HEM | a1b2c3d4e5f6… |
HEM | f6e5d4c3b2a1… |
HASHED_IP_ADDRESS | 1a2b3c4d5e6f… |
Result
| Result | Description |
|---|---|
TOTAL_RECORDS | Total records in the partner dataset |
TOTAL_IP_MATCH_RECORDS | Total IP address matches |
TOTAL_HEM_MATCH_RECORDS | Total email (HEM) matches |
TOTAL_DISTINCT_HEM_MATCH_RECORDS | Unique email (HEM) matches |
TOTAL_DISTINCT_IP_MATCH_RECORDS | Unique IP address matches |
Example result
| TOTAL_RECORDS | TOTAL_HEM_MATCH_RECORDS | TOTAL_IP_MATCH_RECORDS | TOTAL_DISTINCT_HEM_MATCH_RECORDS | TOTAL_DISTINCT_IP_MATCH_RECORDS |
|---|---|---|---|---|
| 50000 | 12500 | 8000 | 8750 | 5250 |
Use cases
- Audience sizing. Understand how many of your customers are also Instacart users.
- Reach planning. Estimate potential reach for advertising campaigns on Instacart.
- Data quality assessment. Evaluate the quality and coverage of your identifier data.
- Partnership evaluation. Assess the value of data collaboration partnerships.
Best practices
Consider the following best practices when running this query:
- Ensure consistent hashing algorithms across your data sources.
- Clean and normalize identifiers before hashing.
- Remove invalid or test data from your identifier lists.
- Use
HEMfor the highest match accuracy. - Include IP addresses for broader reach estimation.
- Consider using multiple identifier types for comprehensive analysis.
- Account for potential duplicate identifiers in your source data.
- Consider the recency of your identifier data.
- Understand that match rates may vary based on identifier quality.
- Ensure all identifiers are properly hashed before upload.
- Follow your organization's data governance policies.
- Document identifier hashing methods for audit purposes.
Methodology
This template employs various methodologies to gather, analyze, and interpret data.
Matching process
The template runs a process to match data while improving privacy.
- Separate identifiers. Splits Hashed Emails (HEMs) and IP addresses for individual processing.
- Hash matching. Links partner data with Instacart data through exact hash matches.
- Match counts. Calculates total and unique matches for each type of identifier.
- Summarized results. Provides summary stats without revealing individual match details.
Analyze Overlap Analysis results
When interpreting results, consider the following factors:
- Match quality. The accuracy of the results depends on how current and reliable the identifiers are.
- Hash consistency. Both datasets must use the same hashing method for successful matching.
- Identifier coverage. Different types of identifiers (like HEM and IP) cover different parts of the audience.
- Temporal factors. Identifiers may lose relevance over time.
Data checkpoints
The methodology includes built-in quality checks to support accurate results.
- Input validation. Confirms the table structure and identifier formats are correct.
- Type validation. Ensures identifiers (such as, Hashed Email or IP) match allowed types.
- Hash verification. Checks that the identifiers are properly hashed before processing.
Technical implementation
The algorithm is designed to balance efficiency, security, and accuracy.
- Scalability. Handles large datasets efficiently with optimized joining processes.
- Security. Runs entirely in Snowflake's secure clean room environment.
- Accuracy. Uses exact matching to avoid false positives.
- Performance. Includes database indexing and partitioning to speed up execution.
Analysis capabilities
The template provides detailed insights into audience overlap.
- Total records. Total count of records in the partner's dataset.
- HEM matching. Total and unique matches based on Hashed Email.
- IP matching. Total and unique matches based on Hashed IP Address.
- Comprehensive coverage. Combines multiple identifier types for audience insights.
Limitations and considerations
- Identifier quality. The accuracy of results depends on how recent and reliable the hashed identifiers are.
- Hashing consistency. Hashing methods must be the same across both datasets for accurate matches.
- Coverage patterns. Different types of identifiers may have unique match rates and coverage areas.
- Temporal decay. The relevance of older identifiers may decrease over time.