Finding product alternatives at Picnic

Tettje Halbertsma
Picnic Engineering
Published in
6 min readAug 27, 2019

--

Not many people like making grocery lists. But what people like even less, is finally coming up with a recipe plan for their dinner party, filling up their basket with all the products, only to find out that the 17th and most crucial ingredient just went out of stock.

Availability is a critical factor for a happy shopping experience. Picnic’s just-in-time supply chain is designed to ensure product availability at all times, but because these are fast-moving consumer goods, there will always be exceptions. Harvests can go bad, production sites can grind to a halt, and containers can get lost.

What customers see when we run out of a specific type of bananas

To make sure that these exceptions hurt our customers as little as possible, a window pops up with two or more alternatives to the product that is currently out of stock. This way, we make sure that the customer can find a suitable substitute for the product they are looking to buy.

But how do we track, store and update these product alternatives? With a constantly changing assortment of almost 7000 products and an aversion to repetitive work, we went looking for an automated solution. Combining the power of Python and Salesforce turned out to be the answer.

Store and maintain: Product relationships in Salesforce

Picnic uses Salesforce as its product information management (PIM) system. All assortment information and settings are stored in a wide variety of objects in Salesforce. This made Salesforce the obvious choice to maintain product alternatives.

In the case of alternatives, we needed to create a relationship between two products in our assortment. Picnic products are stored on the Product2object under its own Picnic record type. In the app, we can show two alternatives, so we need to store multiple alternatives per Picnic product.

Furthermore, the relationship is directional, as there are cases where one product is an alternative for the other, but not vice versa. For example, while regular bread is a no-go for our gluten-intolerant customers, somebody looking for normal bread might be happy with a gluten-free alternative.

Hence, the product alternative is a directional, many-to-many relationship. In Salesforce, many-to-many relationships can only be constructed by using a junction object. We use a custom object in this case, called the “Product Relationship”. This object contains two lookups to the Product2 object, which are filtered to only lookup Product2 records of the type Picnic Product. One lookup for the product, one lookup for the alternative. Through these lookups, we pull some basic information about the products, so users can easily judge if the second product is indeed an alternative for the first.

All alternatives stored in Salesforce are loaded into the Picnic Retail Platform (PRP) through an ETL process. From this point on, they will appear in the app whenever needed.

Finding alternatives: from manual work to an automated process

As a young supermarket, our assortment is changing and expanding every day. This means that we constantly need to add and update product alternatives. To reduce the manual work, we run an external Python script that proposes new product alternatives in our assortment by comparing different types of information.

Comparing products is not straightforward. We can compare the product description of each product, but we need to take into account several factors when comparing these text strings.

The first challenge lies in a very specific characteristic of the Dutch language: the spelling of adjectives changes depending on the position of the adjective relative to the noun it describes. Let’s move away from the bananas for a second, and take the following products as an example:

G’woon chocolade extra puur 72% cacao

vs.

Bio+ pure chocolade 72% cacao

Our two types of chocolate as seen in the app

Both are dark chocolate, but the word order is different and with that, the word “puur” is written differently. Our algorithm should therefore not only ignore differences in word order but also accept slight changes in spelling.

Secondly, there are qualitative considerations. If we would let humans do the comparison job manually, they would unconsciously decide whether a product is a proper substitute for what a customer was originally looking for. For example, the branding or characteristics of a specific product: a person looking for a luxury brand might be fine with a similarly branded product, but would most likely not be interested in a budget brand. Humans understand this concept implicitly, but computers don’t.

In order to tackle the spelling problem and the qualitative issues, we decided to cut the problem in pieces: we split the product names in brand and core product description and applied a fuzzy text comparison to the core description and some deterministic rules to the brand.

Finding alternatives: Fuzzy text matching in Python

To collect the necessary data, we start in Python by performing a Salesforce Object Query Language (SOQL) query on the Product2 object and our custom Product Relationships object using the simple_salesforce library. Looping through our assortment, the algorithm then applies the following steps for each set of products:

  1. Split text strings and check characteristics
  2. Compare and score text strings core product description
  3. Compare characteristics and brand tiers

Let us go through one example to clarify. Let’s compare the two types of chocolate above:

Split text strings and check for characteristics

+---------------------+-------------------------+-----------------+
| | Picnic Product | Potential |
| | | Alternative |
+=====================+=========================+=================+
| Full product name | G’woon chocolade extra | Bio+ extra pure |
| | puur 72% cacao | chocolade 72% |
| Product description | “chocolade extra puur | “extra pure |
| | 72% cacao” | chocolade 72%” |
| Brand | G’woon | Bio+ |
| Organic | No | Yes |
| Fairtrade | Yes | Yes |
| Brand tier | Private label | Private label |
+---------------------+-------------------------+-----------------+

Compare text strings core product description
For this purpose, we use Python’s FuzzyWuzzy library. Notice that as word order is not of importance, we need to use the fuzz.token_sort_ratio function as the more basic ratio or partial_ratio would fail here:

Compare characteristics and brand tiers
The comparison between brand names and characteristics is straightforward: they either match or they don’t — we rate them with a 0 or 1. When the brands differ, however, we also compare the type of brands, using three levels: A-brands (1), private labels (2) and budget labels (3). When comparing, brands of the same level score higher than brands that differ by one level, and a difference of two levels scores lowest.

Finally, we calculate the final score by combining the results above. For each Picnic product, we then select all alternatives that reach a certain threshold and select the two highest-scoring alternatives. As the simple_salesforcepackage also supports a bulk upsert, we can load the new alternatives from Python directly to Salesforce, where our category can view and manage them.

What’s next?

With this solution, we’ve cracked the hardest parts of maintaining product alternatives. Storing the directional many-to-many relationships is done in Salesforce while the FuzzyWuzzy library helps us find matches, even though that requires natural language processing. Although we’ve already avoided a lot of manual work, an interesting next step would be to move from deterministic rules for comparison to a self-learning algorithm, where app events and customer feedback could be used to improve the product alternatives.

--

--