Product information pipeline
As a retailer, your inventory file provides the foundation of your product catalog. However, Instacart integrates data from other sources to ensure that the product information displayed on your storefront is accurate, consistent, and aligned with regulatory and industry standards. This means that some of the information presented to customers might not match what you provided in your inventory files.
To ensure the highest quality product information, Instacart performs the following actions:
- Collect data from retailers, content service providers (CSPs), and consumer packaged goods partners (CPGs).
- Determine which data source provides the most relevant information for each product attribute.
- Revise raw data to comply with standards for grammar and style.
- Refine information using the most current data gathered from customer and shopper activity.
Reasons why inventory file data is modified or replaced
Instacart's main goal in using multiple data sources is to provide the highest quality product information to customers.
The following are examples of why retailer inventory file data is modified or replaced by data from other sources:
- Incomplete. Product information, such as dimensions, claims, or ingredients, is missing.
- Incorrect. Information contains inaccuracies, misspellings, or other errors, or doesn't align with branding guidelines.
- Inconsistent. Acronyms, abbreviations, or formatting of values, such as "lb", aren't consistent across products.
- Inaccurate. Information was accurate when originally provided but can be improved using historical data. For example, Instacart may replace a product's par_weight or availability information based on real-world shopper activity.
- Non-compliant. Information doesn't align with legal or regulatory guidelines. For example, Instacart might revise a retailer-provided claim of "low fat" if it doesn't meet established nutritional thresholds. Similarly, Instacart enforces legally defined claims like "fat-free," "organic," or "gluten-free."
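As an illustration of the "Inconsistent" case, the following is a minimal sketch of how unit spellings such as "LBS", "lb.", or "pounds" could be mapped to a single form. The alias table, the function name normalize_size, and the output format are all hypothetical; this document does not describe the rules Instacart actually applies.

```python
import re
from typing import Optional

# Hypothetical alias table (an assumption for illustration only).
UNIT_ALIASES = {
    "lb": "lb",
    "lbs": "lb",
    "pound": "lb",
    "pounds": "lb",
    "oz": "oz",
    "ounce": "oz",
    "ounces": "oz",
}

# A number, optional whitespace, a unit word, and an optional trailing period.
_SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\.?\s*$")


def normalize_size(raw: str) -> Optional[str]:
    """Return a size such as '2 lb', or None if the value is not recognized."""
    match = _SIZE_PATTERN.match(raw)
    if match is None:
        return None
    quantity, unit = match.groups()
    canonical = UNIT_ALIASES.get(unit.lower())
    if canonical is None:
        return None
    return f"{quantity} {canonical}"
```

Returning None for unrecognized values, rather than guessing, leaves room for a separate review step to handle anything the table does not cover.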
Data sources
The following is a summary of possible sources for the product information displayed on your storefront:
- Retailer inventory file. Information sourced directly from the catalog inventory file typically includes store-specific details such as pricing. This information is dynamic, meaning it gets updated each time we receive a new file.
- Content service providers (CSPs). CSPs collect and curate product information, often in coordination with brand partners, to help distribute high-quality product information to external partners (such as Instacart).
- Consumer packaged goods partners (CPGs). Brand partners help to ensure that their products are displayed in accordance with their policies and marketing standards. Large CPGs typically work with CSPs to provide or update information. Smaller brands often work directly with Instacart.
- USDA commodities database. Information for many produce items is sourced from USDA Commodities and Products databases. This ensures standardized, consistent naming, imagery, and size variants for items like bananas or blueberries, regardless of retailer.
- Internal Instacart data. Information derived from the following internal sources helps to improve accuracy and alignment across the platform:
  - Universal Catalog. A central database enforcing standardized information for commonly available products.
  - Product information style guide. A process for revising or correcting terminology, spelling, punctuation, abbreviations, and related issues.
  - Shopper feedback. Data captured through shopper input is compared to retailer-provided data to help correct inaccuracies.
  - Transaction and order history. Historical data can help identify issues or improve the accuracy of product information.
Data scope
Changes to product information can affect one, some, or all storefronts, depending on the attribute being changed.
The following are examples of how data attributes at different levels can impact retailer storefronts:
- Item-level data. Some attributes are defined per individual store. For example, two stores under the same banner can have different pricing for the same product. Retailers can update each store's pricing independently using their inventory file.
- Retailer-level data. Some attributes are defined for all stores under the same retail banner. For example, all stores that sell a particular product refer to the same URL to retrieve the product image. Changes to the URL affect all storefronts under the banner.
- Product-level data. Some attributes are defined globally. For example, a list of ingredients for a national brand product is displayed the same across all storefronts where the product is sold, regardless of retailer. Changes to product-level data can have broad effects across many storefronts.
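The three levels above can be pictured as different storage keys for the same attribute lookup. The following is a minimal sketch under that assumption. The attribute names price, image_url, and ingredients, the identifiers, and the key layout are hypothetical and are not taken from Instacart's data model.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Scope(Enum):
    ITEM = "item"          # defined per individual store
    RETAILER = "retailer"  # shared by all stores under one banner
    PRODUCT = "product"    # shared by every storefront selling the product


# Hypothetical attribute-to-scope mapping, based on the examples above.
ATTRIBUTE_SCOPE = {
    "price": Scope.ITEM,
    "image_url": Scope.RETAILER,
    "ingredients": Scope.PRODUCT,
}

Key = Tuple[str, str, Optional[str], Optional[str]]


def storage_key(attribute: str, product_id: str, banner: str, store_id: str) -> Key:
    """Build the key under which an attribute value is stored for its scope."""
    scope = ATTRIBUTE_SCOPE[attribute]
    if scope is Scope.ITEM:
        return (attribute, product_id, banner, store_id)
    if scope is Scope.RETAILER:
        return (attribute, product_id, banner, None)
    return (attribute, product_id, None, None)


values: Dict[Key, str] = {}

# Item-level: two stores under the same banner keep separate prices.
values[storage_key("price", "p1", "banner_a", "store_1")] = "3.99"
values[storage_key("price", "p1", "banner_a", "store_2")] = "4.29"

# Product-level: a single write is visible from any banner and store.
values[storage_key("ingredients", "p1", "banner_a", "store_1")] = "carbonated water, sugar"
```

Because the banner and store are dropped from the key for broader scopes, one change to a product-level value is seen by every storefront that looks it up, which mirrors the broad effect described above.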
Retailer control by product type
Retailers have varying levels of influence over the display of product information based on the product type.
The following examples describe the retailer's control over different products:
- In-house products. Retailers have a high level of control over information for their own products. For example, a retailer can provide nutritional information for bread baked in their store.
- Commodities. Retailers have some control over information for items such as produce. For example, Instacart might source information for whole watermelons from the USDA database, but retailers can provide information about their pre-cut watermelon packaged in-house. In these cases, retailers might have to work with Instacart to create a unique entry for their product in the catalog.
- National brand products. Retailers have little control over product information for established brands like Coca-Cola. Because the product name, description, ingredients, and other key information are sourced from CSPs in coordination with CPGs, retailers might only control item-level data, such as price and availability.
Data prioritization by attribute
The data for each product attribute is derived from one or more sources based on Instacart's internal algorithms. Although the methodology continues to evolve, the data source for an attribute generally falls into one of the following categories:
Retailer file data
For some attributes, we use the data sent in the retailer's inventory file. This data is considered dynamic, which means that it gets updated when we receive an inventory file with new data.
Initial retailer data
For some attributes, we use the data sent in the retailer's inventory file only if the data can't be sourced from a CSP. This might include product information for local brands or in-house offerings. If a better source is found, retailer data may be overridden and the retailer-uploaded values are stored for reference.
CSP-prioritized
For most well-known products from national brands, we use data from our Universal Catalog, which contains curated product information inherited from trusted sources, such as CSPs and CPGs.
Machine learning
Instacart derives some attribute values using historical shopper and transaction data. By incorporating current, real-world information, we can calculate values or identify availability more accurately than is possible with retailer-provided data alone.
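The four categories can be read as four rules for choosing among candidate values. The following is a simplified sketch of that reading, not Instacart's actual algorithm, which this document describes only in general terms. The names Strategy and resolve are hypothetical, and two behaviors are assumptions: CSP-prioritized attributes return nothing when the catalog has no value, and derived attributes fall back to the retailer value when no derived value exists.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    RETAILER_FILE = "retailer_file"
    INITIAL_RETAILER = "initial_retailer"
    CSP_PRIORITIZED = "csp_prioritized"
    MACHINE_LEARNING = "machine_learning"


def resolve(
    strategy: Strategy,
    retailer_value: Optional[str],
    catalog_value: Optional[str] = None,
    derived_value: Optional[str] = None,
) -> Optional[str]:
    """Pick the value to display for one attribute of one product."""
    if strategy is Strategy.RETAILER_FILE:
        # Dynamic: always reflects the latest inventory file.
        return retailer_value
    if strategy is Strategy.INITIAL_RETAILER:
        # Retailer data is used only while no curated value is available.
        return catalog_value if catalog_value is not None else retailer_value
    if strategy is Strategy.CSP_PRIORITIZED:
        # Assumption: no fallback to retailer data in this sketch.
        return catalog_value
    # MACHINE_LEARNING. Assumption: fall back to the retailer value
    # until a derived value exists.
    return derived_value if derived_value is not None else retailer_value
```

For example, resolve(Strategy.INITIAL_RETAILER, "House Bread") returns the retailer's value, while passing catalog_value="Artisan Bread" to the same call returns the curated value instead.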