Product information pipeline

As a retailer, your inventory file provides the foundation of your product catalog. However, Instacart integrates data from other sources to ensure that the product information displayed on your storefront is accurate, consistent, and aligned with regulatory and industry standards. This means that some of the information presented to customers might not match what you provided in your inventory files.

To ensure the highest quality product information, Instacart performs the following actions:

  • Collect data from retailers, content service providers (CSPs), and consumer packaged goods partners (CPGs).

  • Determine which data source provides the most relevant information for each product attribute.

  • Revise raw data to comply with standards for grammar and style.

  • Refine information using the most current data gathered from customer and shopper activity.

Reasons why inventory file data is modified or replaced

Instacart's main goal in using multiple data sources is to provide the highest quality product information to customers.

The following are examples of why retailer inventory file data is modified or replaced by data from other sources:

  • Incomplete. Product information, such as dimensions, claims, or ingredients, is missing.

  • Incorrect. Information contains inaccuracies, misspellings or other errors, or doesn't align with branding guidelines.

  • Inconsistent. Acronyms, abbreviations, or formatting of values, such as lb, aren't consistent across products.

  • Inaccurate. Information was accurate when originally provided but can be improved using historical data. For example, Instacart may replace a product's par_weight or availability information based on real-world shopper activity.

  • Non-compliant. Information doesn't align with legal or regulatory guidelines. For example, Instacart might revise a retailer-provided claim of "low fat" if it doesn't meet established nutritional thresholds. Similarly, Instacart enforces legally defined claims like "fat-free," "organic," or "gluten-free."
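As an illustration of the "Inconsistent" case above, the following minimal Python sketch maps several spellings of a unit to one canonical form. The alias table and the normalize_unit function are hypothetical and exist only for this example; this page doesn't describe Instacart's actual normalization rules.

```python
# Hypothetical alias table: the real style-guide rules are not described here.
UNIT_ALIASES = {
    "lb": "lb", "lbs": "lb", "lb.": "lb", "pound": "lb", "pounds": "lb",
    "oz": "oz", "oz.": "oz", "ounce": "oz", "ounces": "oz",
}


def normalize_unit(raw: str) -> str:
    """Return a canonical unit abbreviation for a raw inventory-file value.

    Unknown units are returned trimmed and lowercased rather than rejected.
    """
    key = raw.strip().lower()
    return UNIT_ALIASES.get(key, key)


if __name__ == "__main__":
    for value in ["LBS", "lb.", "Pounds", "ct"]:
        print(repr(value), "->", repr(normalize_unit(value)))
```

With a table like this, values such as "LBS", "lb.", and "Pounds" all resolve to the same abbreviation, so the unit reads the same way across products.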

Data sources

The following is a summary of possible sources for the product information displayed on your storefront:

  • Retailer inventory file. Information sourced directly from the catalog inventory file typically includes store-specific details such as pricing. This information is dynamic, meaning it gets updated each time we receive a new file.

  • Content service providers (CSPs). CSPs collect and curate product information, often in coordination with brand partners, to help distribute high-quality product information to external partners (such as Instacart).

  • Consumer packaged goods partners (CPGs). Brand partners help to ensure that their products are displayed in accordance with their policies and marketing standards. Large CPGs typically work with CSPs to provide or update information. Smaller brands often work directly with Instacart.

  • USDA commodities database. Information for many produce items is sourced from USDA Commodities and Products databases. This ensures standardized, consistent naming, imagery, and size variants for items like bananas or blueberries, regardless of retailer.

  • Internal Instacart data. Information derived from the following internal sources helps to improve accuracy and alignment across the platform:

    • Universal Catalog. A central database enforcing standardized information for commonly available products.

    • Product information style guide. A process for revising or correcting terminology, spelling, punctuation, abbreviations, and related issues.

    • Shopper feedback. Data captured through shopper input is compared to retailer-provided data to help correct inaccuracies.

    • Transaction and order history. Historical data can help identify issues or improve the accuracy of product information.

Data scope

Changes to product information can affect one, some, or all storefronts, depending on the attribute being changed.

The following are examples of how data attributes at different levels can impact retailer storefronts:

  • Item-level data. Some attributes are defined per individual store. For example, two stores under the same banner can have different pricing for the same product. Retailers can update each store's pricing independently using their inventory file.

  • Retailer-level data. Some attributes are defined for all stores under the same retail banner. For example, all stores that sell a particular product refer to the same URL to retrieve the product image. Changes to the URL affect all storefronts under the banner.

  • Product-level data. Some attributes are defined globally. For example, a list of ingredients for a national brand product is displayed the same across all storefronts where the product is sold, regardless of retailer. Changes to product-level data can have broad effects across many storefronts.
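The three scope levels above can be sketched as a small Python model that answers the question "if this attribute changes at one store, which storefronts show the change?" The Scope, Store, and affected_stores names are hypothetical and don't reflect how Instacart actually stores catalog data.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    ITEM = "item"          # defined per individual store (for example, price)
    RETAILER = "retailer"  # shared by all stores under one banner (for example, image URL)
    PRODUCT = "product"    # shared globally (for example, national brand ingredients)


@dataclass(frozen=True)
class Store:
    store_id: str
    banner: str


def affected_stores(scope: Scope, origin: Store, sellers: list[Store]) -> list[Store]:
    """Return the stores in `sellers` whose storefront reflects a change made for `origin`."""
    if scope is Scope.ITEM:
        return [s for s in sellers if s.store_id == origin.store_id]
    if scope is Scope.RETAILER:
        return [s for s in sellers if s.banner == origin.banner]
    # Product-level data applies to every storefront that sells the product.
    return list(sellers)


if __name__ == "__main__":
    sellers = [Store("s1", "BannerA"), Store("s2", "BannerA"), Store("s3", "BannerB")]
    for scope in Scope:
        ids = [s.store_id for s in affected_stores(scope, sellers[0], sellers)]
        print(scope.value, ids)
```

In this sketch, a price change at store s1 stays local, an image URL change reaches both BannerA stores, and an ingredients change reaches all three sellers.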

Retailer control by product type

Retailers have varying levels of influence over the display of product information based on the product type.

The following examples describe the retailer's control over different products:

  • In-house products. Retailers have a high level of control over information for their own products. For example, a retailer can provide nutritional information for bread baked in their store.

  • Commodities. Retailers have some control over information for items such as produce. For example, Instacart might source information for whole watermelons from the USDA database, but retailers can provide information about their pre-cut watermelon packaged in-house. In these cases, retailers might have to work with Instacart to create a unique entry for their product in the catalog.

  • National brand products. Retailers have little control over product information for established brands like Coca-Cola. Since the product name, description, ingredients, and other key information are sourced from CSPs in coordination with CPGs, retailers might only control item-level data, such as price and availability.

Attribute data prioritization

Although retailers can specify some attribute values in their inventory file, whether those values appear on the storefront or can be updated later varies by attribute. For example, some retailer-specified values become static after a new product is created—once the value is set, it can't be updated using the inventory file. Other retailer-specified values are overridden by higher-quality data from another source or by Instacart machine-learning algorithms. In some cases, the retailer-specified value is used initially and later replaced when a higher-quality data source is found or when Instacart has enough historical data to calculate a more accurate value.

The exact methodologies continue to evolve, but most attribute values are derived using one of the following approaches:

Retailer file data

For some attributes, we use the values sent in the retailer's inventory file. This data usually can't be sourced from a CSP. The data is considered dynamic, which means that it is updated when we receive an inventory file with new data.

Initial retailer data

For some attributes, we use the values sent in the retailer's inventory file only upon the initial upload. This data, such as the name of an in-house product, can't be sourced from a CSP. The value is considered static, which means that retailers can't update it using their inventory file. It can be updated in only two ways: when Instacart identifies a higher-priority data source, at which point the retailer-specified value is stored for reference, or manually through coordination between the retailer and Instacart.

CSP-prioritized

For some attributes, we use values from our Universal Catalog, which contains curated product information for most well-known national brands. This data is inherited from trusted sources, such as CSPs and CPGs, and overrides the values provided in the retailer's inventory file. These values are considered static and are updated only when Instacart receives new information from CSP or CPG sources.

Machine learning

For some attributes, we derive values using proprietary algorithms along with data gathered from shopper activity, transactions, related products, and other sources. By incorporating real-world information, we can calculate values that are more accurate than retailer-provided data. For these attributes, we might use the retailer-specified value until we have enough data to calculate a new value.
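To make the four approaches above concrete, here is a minimal Python sketch of how a displayed value might be chosen for one attribute. It is an illustration under stated assumptions, not Instacart's implementation: the Strategy, Candidates, and resolve names are hypothetical, and the fallback to the retailer value when no curated value exists is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):
    RETAILER_FILE = "retailer_file"        # dynamic: the latest inventory file wins
    INITIAL_RETAILER = "initial_retailer"  # static: the first upload wins
    CSP_PRIORITIZED = "csp_prioritized"    # curated CSP/CPG data wins
    MACHINE_LEARNING = "machine_learning"  # derived value wins once available


@dataclass(frozen=True)
class Candidates:
    """Candidate values for one attribute of one product (hypothetical shape)."""
    initial_file: Optional[str]  # value from the first inventory upload
    latest_file: Optional[str]   # value from the most recent inventory file
    curated: Optional[str]       # value from a CSP/CPG or other higher-priority source
    derived: Optional[str]       # value calculated from shopper and order data


def resolve(strategy: Strategy, c: Candidates) -> Optional[str]:
    """Pick the value to display for an attribute that uses `strategy`."""
    if strategy is Strategy.RETAILER_FILE:
        return c.latest_file
    if strategy is Strategy.INITIAL_RETAILER:
        # Later file updates are ignored; only a higher-priority source replaces the value.
        return c.curated if c.curated is not None else c.initial_file
    if strategy is Strategy.CSP_PRIORITIZED:
        # Assumption: fall back to the retailer value until curated data exists.
        return c.curated if c.curated is not None else c.latest_file
    # MACHINE_LEARNING: use the retailer value until enough data has been gathered.
    return c.derived if c.derived is not None else c.latest_file


if __name__ == "__main__":
    values = Candidates(initial_file="Bakery Loaf", latest_file="Bakery Loaf XL",
                        curated=None, derived=None)
    for strategy in Strategy:
        print(strategy.value, "->", resolve(strategy, values))
```

Manual corrections coordinated between the retailer and Instacart are outside this sketch; in a model like this they would amount to editing the stored candidate values directly.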