7 Antifragile Principles for a Successful Data Warehouse

Iliana Iankoulova
Published in Picnic Engineering
Mar 9, 2022 · 12 min read


The best of both worlds: the structure and quality of a centralized data warehouse combined with the agility of antifragile practices

Data Warehousing has often been associated with rigid processes, slow adaptability to business needs, and high maintenance costs. This is the biggest misconception in the Information Management field. All of those effects come about when a particular implementation fails to adapt to organizational needs or technical advancements. That is not a fundamental flaw of the strategy; it is a failure of execution.

A centralized analytical data warehouse product can be successful. It is as relevant as ever. However, it needs to be balanced by a number of antifragile organizational practices to make sure that it is in sync with the business evolution. Picnic’s Lakeless Data Warehouse is proof that it works.

This is the fifth and final blog post in the series that I started to celebrate ‘5 years Picnic Data Engineering’. I would like to round it off by sharing some of the fundamentals behind the engineering choices we make at Picnic on a daily basis. These are grounded in my practical experience as a Data Engineer, Tech Lead, and Product Owner in data management over the past 12 years.

If you missed the other posts or would like a refresher, you can find them here:

A glossary to guide us

Before we dive in, let’s start with some definitions that will help us be more precise about how we use certain terms and reduce ambiguity.

Data Warehousing

Data Warehousing (DWH) is a process for collecting and managing historical data from varied sources to provide meaningful business insights. In the context of this blog post, we specifically mean Enterprise Data Warehouse (EDW), which is a centralized warehouse. It provides decision support services across the company with a unified approach for organizing and representing data.

The core benefits of DWH are:

  • A coherent operational data model that avoids knowledge silos, repetitive downstream work by analysts, and unreliable sources
  • A Single Source of Truth, or as some professionals say, “One version of the facts serving many versions of the truth”
  • A system of clear and extensive guidelines on how data is organized for analytical use
  • The ability to perform root cause analysis across data from many source systems in a single environment
  • Business familiarity with the DWH, which enables context switching with limited risk of domain knowledge loss

A successful DWH product

Whenever we refer to a successful DWH product, we mean one that scores highly on three criteria, covering both business and technical perspectives:

  1. High adoption rate by analysts and business users for self-serve analytics
  2. High data quality and trust in the DWH
  3. Excellence in Google’s DevOps Research and Assessment (DORA) metrics
  • Deployment Frequency: How often an organization successfully releases to production
  • Lead Time for Changes: The amount of time it takes a PR commit to get into production
  • Change Failure Rate: The percentage of deployments causing a failure in production
  • Time to Restore Service: How long it takes an organization to recover from a failure in production
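
To make these metrics concrete, here is a minimal sketch of how they could be computed from a simple deployment log. The `Deployment` record and its fields are hypothetical and purely illustrative, not a description of Picnic’s tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    caused_failure: bool = False             # did this release break production?
    restored_at: Optional[datetime] = None   # when service was restored, if it broke


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a log of deployments."""
    failed = [d for d in deployments if d.caused_failure]
    return {
        "deployment_frequency": len(deployments) / period_days,  # releases per day
        "lead_time_for_changes": sum(
            (d.deployed_at - d.committed_at for d in deployments), timedelta()
        ) / len(deployments),
        "change_failure_rate": len(failed) / len(deployments),
        "time_to_restore_service": sum(
            (d.restored_at - d.deployed_at for d in failed), timedelta()
        ) / len(failed) if failed else timedelta(0),
    }
```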

In the previous blog posts in this series, we have written in detail about the adoption of the DWH as a single source of truth and our high data quality. To further expand on the DORA metrics, Picnic’s Data Engineering:

  • Releases dozens of features every week, with work items staying in progress for less than a week.
  • Monitors detailed Quality of Service (QoS) dashboards tracking job failures: observability is key.
  • Follows up on every issue to keep risk low.
  • Prefers data completeness over timeliness; in the case of production issues, the impact on operations is usually minimal.

By meeting the above criteria, we feel confident that we have a successful DWH product — a point of pride for Data Engineers, Data Scientists, and Analysts at Picnic.

Data quality dashboards monitor six criteria for DWH quality of service.

Antifragility

Antifragility is “a property of systems in which they increase in their capability to thrive as a result of stressors, shocks, volatility, mistakes, faults, or failures.” The concept was developed by Nassim Taleb in his book Antifragile: Things That Gain from Disorder. It is fundamentally different from resiliency (the ability to recover from failure) and robustness (the ability to resist failure). In some cases, shock can be beneficial. Muscles are an example of this: the more we stress them, the stronger they get. DNA and learning to ride a bike are other instances exhibiting such properties. Software engineering teams have been inspired by antifragile ideas and have adopted them in their practice.

How to make a DWH antifragile?

Many of the antifragility principles are relevant to Data Warehousing and mitigate the downsides of centralization. When things don’t go as planned, we take the opportunity to get stronger, smarter, and better going forward.

In this section, we will give a glimpse of how we apply the following antifragile principles to the DWH management:

  1. Sticking to simple rules
  2. Avoiding naive interventions that do more harm than good in the long term
  3. Built-in redundancy and layers (no single point of failure)
  4. Ensuring that everyone has a stake
  5. Experimenting and tinkering — taking lots of small risks
  6. Keeping our options open
  7. Not reinventing the wheel — looking for habits and rules that have been around for a long time

7 Antifragile principles and their application in Picnic’s DWH.

1. Sticking to simple rules

We believe in pragmatic documentation that lives together with the code. The business logic is accessible to everyone in the organization via GitHub. We keep comments on tables, views, and fields, which makes it very easy to maintain the data catalog in the same place where the logic lives. It is also a source of inspiration for analysts to see SQL definitions and examples of SQL queries.
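
As a flavour of how those comments double as a data catalog, here is a minimal sketch that lists them straight from Snowflake’s information schema using the standard Python connector. The connection parameters and the schema name are placeholders, not our actual setup.

```python
import snowflake.connector

# Placeholder credentials; in practice these would come from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***", database="DWH"
)

# Table and column comments live right next to the data, so a "data catalog"
# is one query away. PRESENTATION is a hypothetical schema name.
catalog_query = """
    SELECT table_name, column_name, comment
    FROM information_schema.columns
    WHERE table_schema = 'PRESENTATION'
    ORDER BY table_name, ordinal_position
"""

cur = conn.cursor()
try:
    cur.execute(catalog_query)
    for table, column, comment in cur:
        print(f"{table}.{column}: {comment or '(no description yet)'}")
finally:
    cur.close()
    conn.close()
```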

Picnic’s Data Engineering tech stack is very lean. The more complex a system is, the more prone it is to breakages that are difficult to fix, as specialized resources are scarce. We have always strived for low-entropy DWH tech, and when we decide to let go of a tool we are merciless in migrating all the code away. For example: Redshift -> Snowflake; Pentaho -> Python; Airflow -> Argo; Kinesis -> Kafka. We are careful about adopting new tools and assess the tradeoffs of every technology choice.

The lean tech stack is a value shared across Picnic. Our Analytics Platform for events streaming is a great example. To learn more, check out the articles from Dima Kalashnikov — ‘Tech Radar on Event Streaming Platforms’ and ‘Picnic Analytics Platform: Migration from AWS Kinesis to Confluent Cloud’.

Keeping it simple is actually very difficult, but essential to staying fit and agile. This way, any team member can pick up most tasks and deliver a quick turnaround. To ensure high code quality and share knowledge at the same time, every PR is reviewed by two Data Engineers.

2. Avoiding naive interventions that do more harm than good in the long term

Source systems are accountable and responsible for resolving data issues. We have hundreds of sources, both events and endpoints, and sooner or later things go wrong: data is incomplete, missing, or corrupted. Instead of stepping in to fix the issue only in the DWH, we make the issue visible and decide jointly with the source system team how to fix it.

We do very limited patching in the DWH code, preferring proper data migrations over keeping dead logic in our codebase forever. At the same time, we challenge statements such as “those classification fields are of no concern to the operational system, so we won’t store them”. The decision rule is fairly simple: if there is value in having a piece of master data that will improve the business, it should be considered in the scope of the operational system. As Data Engineering professionals, we cannot take it upon ourselves to solve data quality issues or invent data. That only leads to failures that are later used as cautionary tales of why DWHs are bad and data lakes are a better alternative.

As Picnic operates in multiple markets with separate deployments, we keep the DWH code consistent for all markets. There is a minimal number of country-specific tables and fields, and the data definition language (DDL) is the same everywhere, even if it is a separate database.
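
To illustrate what keeping the DDL identical across markets can look like, here is a hypothetical consistency check that compares the generated DDL of a table across per-market databases. GET_DDL is a built-in Snowflake function, while the database and schema names are invented for the example.

```python
# Hypothetical per-market databases; each market has its own deployment.
MARKET_DATABASES = ["DWH_NL", "DWH_DE", "DWH_FR"]


def fetch_ddl(cursor, database: str, table: str) -> str:
    """Return the CREATE statement for a table using Snowflake's GET_DDL."""
    cursor.execute(f"SELECT GET_DDL('TABLE', '{database}.PRESENTATION.{table}')")
    return cursor.fetchone()[0]


def ddl_is_consistent(cursor, table: str) -> bool:
    """True if the table's DDL is identical in every market database."""
    ddls = {db: fetch_ddl(cursor, db, table) for db in MARKET_DATABASES}
    reference = ddls[MARKET_DATABASES[0]]
    drifted = [db for db, ddl in ddls.items() if ddl != reference]
    if drifted:
        print(f"{table}: DDL drift detected in {drifted}")
    return not drifted
```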

3. Built-in redundancy and layers; limit single points of failure

We love using and building open-source tools. Open-sourcing our Data Vault (DV) framework enables us to work together with a community that is independent of one contributor or vendor. It makes it possible to work together with the brightest engineers in the world. Read more in the blog post “Releasing diepvries, a Data Vault framework for Python,” by Matthieu Caneill. Here are some of the reasons we believe #diepvries will be the best tool to automate DV loading:

  • Agnostic of vendor. The tool is written in Python rather than a domain-specific language or proprietary third-party software.
  • Agnostic of target data warehouse. We load the data into Snowflake, but it could be extended to support any database.
  • Battle-tested in production. More than 1,400 daily job executions.
  • Ultra-light. At Picnic, the DV jobs run for less than a minute on average, from extraction to loading the target DV tables.
  • Developers focus on the added value for the business. The rest is handled by metadata and automated SQL templates under the hood.
  • Idempotent and parallelizable. The atomic nature of our implementation enables a limitless number of parallel jobs, as long as the target database can handle it.
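
To give a feel for what “metadata and automated SQL templates under the hood” means, here is a simplified, generic sketch of rendering a hub-load statement from a small piece of metadata. It is not the actual diepvries API; the table and column naming conventions are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HubMetadata:
    name: str                  # entity name, e.g. "customer"
    business_keys: List[str]   # natural key columns coming from the source


def render_hub_load(hub: HubMetadata, staging_table: str) -> str:
    """Render an idempotent hub-load statement purely from metadata."""
    hashkey = f"h_{hub.name}_hashkey"
    insert_cols = ", ".join(hub.business_keys)
    select_cols = ", ".join(f"staging.{key}" for key in hub.business_keys)
    return f"""
    INSERT INTO dv.h_{hub.name} ({hashkey}, {insert_cols}, r_timestamp, r_source)
    SELECT DISTINCT staging.{hashkey}, {select_cols}, staging.r_timestamp, staging.r_source
    FROM {staging_table} AS staging
    LEFT JOIN dv.h_{hub.name} AS hub ON hub.{hashkey} = staging.{hashkey}
    WHERE hub.{hashkey} IS NULL  -- load only business keys we have not seen before
    """


print(render_hub_load(HubMetadata("customer", ["customer_id"]), "dv_staging.customer_extract"))
```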

Another example of redundancy is how our Data Engineering team operates in smaller units called squads, a common pattern across all Picnic Tech product teams. The Data Engineering squads are called Yeti and Phoenix, sharing the same DNA (the inspiration for the visual in this blog post). Within the squad, the engineers get both support and challenge. At the same time, during projects, every Data Engineer has the autonomy to deliver end to end. The Tech Lead, Product Owner, and Squad Lead roles focus on facilitating and providing the team with a support structure. All engineers are full-stack, end-to-end data warehousing consultants. From the moment I get a project as a Data Engineer, it is my own, and how I meet the expectations of the business is completely up to me. The autonomy and the ability to substitute for each other create redundancy, which in turn enables a healthy workload.

In the blog post “Data Engineer’s Role in the Future of Groceries” we cover extensively the role of DWH developers and how we cooperate with other roles. This layering of analytical capabilities in the organization helps us move fast and avoid bottlenecks.

Wherever we see an opportunity to decentralize a task, we gladly create new processes to make it happen. For example, Data Engineers gamify the deprecation process for tables and fields: we champion it, but all DWH users help with cleaning up unused data sources.

Although we touch every data structure, our team is not the one that crosses the analytics finish line. We are the engineers in a Formula 1 team: our work is essential for our drivers to win the race. Therefore, as a team, we don’t do Data Science, Reporting, or Master Data Management. It is extremely important to know the boundaries, so we can become experts at what we do. It is also essential during recruitment to manage expectations and make the best match. If a candidate wants to focus on algorithms and machine learning, another team will be a better fit. However, if the candidate is passionate about designing and building the data library used by the whole company to make decisions, then Data Engineering at Picnic is the right place!

4. Ensuring that everyone has a stake

We never get tired of convincing people of the power of data modeling by showcasing what is possible in specific domains. When our colleagues understand the value, they become champions for data quality. We have an SQL onboarding program that every analyst completes before getting DWH access. We also run an introductory bootcamp session for all newcomers so they can get familiar with the DWH. We strongly believe that people are the most important investment an organization can make. Having capable people and giving them powerful tools (with an accompanying manual) is one of the best ways to be antifragile.

Every Data Engineer rotates through being Release Master of the Week, which also means being the first line of support. That way, we all get familiar with the code and the release process, we develop skills in issue triaging, and we share the DataOps load. The Tech Lead and the Product Owner are also on this schedule and act as a second line of support. We all have knowledge of, and an interest in, deploying stable code. Another rotation program is our Tech Data Support on Slack, with a handle that is reassigned daily on a round-robin basis among the data roles.
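
The round-robin reassignment itself is simple enough to automate. A tiny hypothetical sketch, with a made-up roster and scheduling rule:

```python
from datetime import date

# Hypothetical roster of data roles sharing the Tech Data Support handle.
SUPPORT_ROSTER = ["data_engineer_a", "data_scientist_b", "analyst_c", "data_engineer_d"]


def support_owner(day: date, roster: list) -> str:
    """Pick the day's support owner by rotating through the roster."""
    return roster[day.toordinal() % len(roster)]


print(f"Today's Tech Data Support: {support_owner(date.today(), SUPPORT_ROSTER)}")
```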

5. Experimenting and tinkering — taking a lot of small risks

We provide a Sandboxing & Temporary table environment where anyone with DWH access can bring their own data and build custom tables. This gives everyone the freedom to tinker in a contained schema with clear rules.
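
What “clear rules” can look like in practice: a hypothetical housekeeping sketch that flags sandbox tables older than a retention window. The schema name, the retention period, and the cleanup approach are illustrative assumptions, not our actual policy.

```python
SANDBOX_SCHEMA = "SANDBOX"   # hypothetical schema where users tinker
RETENTION_DAYS = 30          # hypothetical retention window


def stale_sandbox_tables(cursor) -> list:
    """List sandbox tables that have outlived the retention window."""
    cursor.execute(
        f"""
        SELECT table_schema || '.' || table_name
        FROM information_schema.tables
        WHERE table_schema = '{SANDBOX_SCHEMA}'
          AND created < DATEADD(day, -{RETENTION_DAYS}, CURRENT_TIMESTAMP())
        """
    )
    return [row[0] for row in cursor.fetchall()]


def clean_sandbox(cursor, dry_run: bool = True) -> None:
    """Drop stale sandbox tables, or just report them when dry_run is True."""
    for table in stale_sandbox_tables(cursor):
        print(("Would drop " if dry_run else "Dropping ") + table)
        if not dry_run:
            cursor.execute(f"DROP TABLE IF EXISTS {table}")
```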

The first blog post in the series “Picnic’s Lakeless Data Warehouse” explains why sandboxing is such an effective and valuable DWH architectural pattern.

On the tech side, we often run proofs of concept (PoCs) with new technologies as a routine part of our roadmap. These PoCs can be initiated by anyone in the team. For example, we have experimented with Pact contract testing and dbt.

6. Keeping our options open

In our work, we assume that the business model will change over time, so we anticipate change by reserving optionality. Requirements can change day by day for features in the operational systems; new data models are introduced and breaking changes are made. In the context of microservices, this happens at the speed of light. To deal with it, the DWH needs to be set up with that assumption in mind. While the Kimball model is robust and very usable for the business, it is notoriously difficult to change, especially if this is also the place where all the changes are captured. Therefore, very early on, even when there was only a single developer on the team, we started using the Data Vault model as the back-end DWH and Kimball as the front end.

The Data Vault is a living, breathing example of antifragility, where all relationships (links) are many-to-many and new systems are easy to add to the existing model. It is highly adaptable to changes in the operational data model and is ideal for the distributed parallel pipelines feeding it. For more on how we keep our options open with distributed pipelines, check out our other blog post, Building a distributed ETL pipeline for a monolithic data warehouse by Matthieu Caneill.

Picnic maintains strict separation between the DWH presentation layer and the back-end in different schemas. At the same time, we have a strong preference for clear interfaces and tools that are vendor-agnostic.
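
To illustrate the separation, here is a hypothetical presentation-layer dimension defined as a view over back-end Data Vault tables. The schema, table, and column names are invented for the example.

```python
# A Kimball-style dimension in the presentation schema, assembled from the
# back-end Data Vault (hub + latest satellite record). All names are illustrative.
DIM_CUSTOMER_VIEW = """
CREATE OR REPLACE VIEW presentation.dim_customer AS
SELECT
    hub.h_customer_hashkey AS customer_key,
    hub.customer_id,
    sat.first_name,
    sat.city,
    sat.r_timestamp        AS valid_from
FROM dv.h_customer  AS hub
JOIN dv.hs_customer AS sat
  ON sat.h_customer_hashkey = hub.h_customer_hashkey
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY sat.h_customer_hashkey
    ORDER BY sat.r_timestamp DESC
) = 1  -- keep only the most recent satellite record per customer
"""
```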

7. Not reinventing the wheel — looking for habits and rules that have been around for a long time

Picnic’s Data Engineers use standards and frameworks that have been tested over the past 30 years of Data Warehousing. This includes processes and data modeling approaches learned in our practices as BI consultants and Data Engineers in different industries. We prefer to follow the standards established by Kimball and Linstedt, rather than invent our own.

We need more companies to share their Data Engineering playbooks, which can be generalized further into architectural, organizational, and technological blueprints with deeply practical elements. We did just that with this series on Picnic’s Lakeless Data Warehouse.

Given the rate at which new data is generated, easily accessible mechanisms like these are needed to make sure it is used correctly and securely. We live in an era of amazing technological possibilities and a long list of vendors, but they rarely come with an end-to-end manual on how to use them in the strategic service of our organizations. This kind of knowledge sharing will also give talented researchers material to come up with frameworks that are not biased by a particular vendor or commercial incentive. Everyone will benefit from that: Data Engineering as a tech profession in particular, and the organizations that will get more skilled people managing their sensitive data.

Picnic invites other companies to be as transparent about managing analytics as we are.

As parting words, I close with a quote that excites me as a data professional and at the same time frightens me as an individual.

There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.
Eric Schmidt, Executive Chairman, Google.

Businesses and institutions have to step up and deal responsibly with this much data. By sharing knowledge and educating experts now, we give ourselves the chance of a brighter future, free from data chaos. 💚

Interested in joining one of the amazing Data teams at Picnic? Check out our open positions and apply today!

  • Data Engineer
  • Data Scientist
  • Java Developer — Event Streaming Platform
  • Software Engineer (Python)

… and many more roles.

