Reinventing the QA process

Julia Zarechneva · Picnic Engineering · Feb 9, 2021 · 7 min read

Adapting QA processes for automated fulfillment

In the near future Picnic will launch its first automated fulfillment center (AFC). With its launch, Picnic expects to double productivity and quadruple capacity compared to its manual fulfillment centers. While the physical site is being constructed, the tech team is focused on developing the Picnic Warehouse Control System (WCS), the orchestrator of the automated fulfillment operation.

Several attributes of the project make it particularly challenging to achieve satisfactory test coverage:

  • First, building and machine construction happen in parallel with the software development, and the launch date is planned quite soon after the physical build is complete. This places a hard limit on the time available to develop the software and leaves only limited time to test the WCS code against the physical setup once it’s available.
  • Second, on-site testing is difficult due to the coordination required between the many parties involved in the construction project, as well as the cost of the large amount of staff needed to perform testing with the physical setup.
  • Third, the MVP project contains many waterfall-like dependencies.
  • Last, with a timeline of 2 years and a dev team of 20 FTE, the scope of the MVP project alone is quite large.

To be able to deliver on the project timeline and scope we have to be creative in the QA process. The key challenge for the QA team is to come up with a test strategy that gives the agile development process quick feedback while not being blocked by the complex, waterfall-like project planning.

Testing pyramid

The large scope and high complexity of an automated warehouse have pushed us to come up with a testing approach that determines and measures the testing activities in a very structured manner.

To start off, we defined the testing pyramid that relates best to our project:

The 3 top levels are very interesting for us and are the most valuable. However, most of those tests can only be performed once the entire system is implemented, which in many cases means the entire MVP. Hence, these top levels will only become available to us in the future, shortly before the go-live deadline.

Faster and more detailed feedback can be achieved with the 3 “lower levels” of the pyramid, so that is what we concentrated on at first. They are available to us for the majority of the development trajectory.

To make this work we had to ask ourselves two fundamental questions:

  1. In such a complex system, you can end up with a massive amount of tests on all levels, but how would you know that those tests are enough? (Quantity doesn’t mean quality.)
  2. What confidence do those 3 levels of tests actually bring to the delivery of a product that must work together with the physical site?

To answer those questions we turned to a standard and well-known approach for formalizing testing results: the Requirements Traceability Matrix. We use it to ensure that an adequate level of testing is achieved for each and every business requirement.

Building the Requirements Traceability Matrix

At this point the QA process started to look rather straightforward:

Our business requirements describe how specific parts of the system should behave on a functional level. We call these functional areas, and it made sense to have the test plan follow that same structure.

Based on business requirements and system design, the QA team designed test plans consisting of all test scenarios for UI and component testing (which can be automated or done manually). Each test scenario can be linked to one or more of the requirements it covers.

This allows us to perform a quick gap analysis to identify, per functional area, the requirements that are not yet covered. We are also able to analyze test coverage per requirement.
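
The gap analysis itself boils down to a simple set difference. The sketch below is purely illustrative (the class and method names are ours, not the actual WCS tooling): given the requirement IDs of a functional area and the requirements each test scenario is linked to, it returns the requirements that no scenario covers yet.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of the per-functional-area gap analysis (names are ours, not Picnic's). */
final class CoverageGapAnalysis {

    /**
     * Returns the requirement IDs of one functional area that are not covered by any test scenario.
     *
     * @param areaRequirementIds       all requirement IDs belonging to the functional area
     * @param scenarioToRequirementIds for each test scenario, the requirement IDs it is linked to
     */
    static Set<String> uncoveredRequirements(
            Set<String> areaRequirementIds,
            Map<String, Set<String>> scenarioToRequirementIds) {

        // Collect every requirement that at least one scenario claims to cover.
        Set<String> covered = new HashSet<>();
        scenarioToRequirementIds.values().forEach(covered::addAll);

        // The gap is the set difference: area requirements minus covered requirements.
        Set<String> uncovered = new HashSet<>(areaRequirementIds);
        uncovered.removeAll(covered);
        return uncovered;
    }
}
```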

Using standard tooling like Python Behave, TestRail, and Google Sheets / Google Apps Script, the following process was set up:

  1. Business requirements are written by the business stakeholders. Each requirement gets a unique ID.
  2. Test scenarios are created by the QA Engineer. Via the requirement ID each scenario is linked to one or more requirements. Scenarios are described in Behavior-driven development (BDD) style.
  3. Test scenarios are implemented by developers at the Component test level and executed by the QA at the UI test level.
  4. Links to test scenarios and their descriptions are exported to Google Sheets, where the Requirements Traceability Matrix is built and maintained, exposing the requirements coverage in a neatly structured way.
  5. All BDD scenarios are tagged with TestRail test case IDs. Using the TestRail API, this allows us to create test runs and report test results from our CI/CD pipeline (see the sketch below), and it makes administration easier.
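
To make step 5 a bit more concrete, here is a minimal sketch of what the TestRail side of such a pipeline step could look like. It assumes the standard TestRail API v2 endpoints (add_run and add_results_for_cases) and uses Java only to keep the examples in this post in a single language; the actual Behave integration runs in Python, and all names below are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

/** Illustrative sketch: create a TestRail run and report case results from a CI/CD job. */
final class TestRailReporter {

    // Hypothetical TestRail instance URL; the API lives under index.php?/api/v2/.
    private static final String BASE_URL = "https://example.testrail.io/index.php?/api/v2/";

    private final HttpClient client = HttpClient.newHttpClient();
    private final String authHeader;

    TestRailReporter(String user, String apiKey) {
        // TestRail uses HTTP basic auth with the user email and an API key.
        this.authHeader = "Basic " + Base64.getEncoder()
                .encodeToString((user + ":" + apiKey).getBytes());
    }

    /** Creates a test run for the given project; the case IDs come from the BDD scenario tags. */
    String createRun(long projectId, String runName, long... caseIds) throws Exception {
        StringBuilder ids = new StringBuilder();
        for (long id : caseIds) {
            if (ids.length() > 0) ids.append(',');
            ids.append(id);
        }
        String body = "{\"name\":\"" + runName + "\",\"include_all\":false,\"case_ids\":[" + ids + "]}";
        return post("add_run/" + projectId, body);
    }

    /** Reports a single case result to an existing run (status_id 1 = passed, 5 = failed). */
    String reportResult(long runId, long caseId, boolean passed) throws Exception {
        String body = "{\"results\":[{\"case_id\":" + caseId
                + ",\"status_id\":" + (passed ? 1 : 5) + "}]}";
        return post("add_results_for_cases/" + runId, body);
    }

    private String post(String endpoint, String jsonBody) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(BASE_URL + endpoint))
                .header("Content-Type", "application/json")
                .header("Authorization", authHeader)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```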

However, this process alone was not sufficient to achieve our desired test coverage early enough:

  • The feedback cycle between development and testing is still too long, as many of the component and UI tests cover the scope of a functional area that is not yet fully developed.
  • We cannot create too many automated tests, as extending and maintaining the test framework (which actually deserves a separate blog post) would lead to significant additional development costs.

These are the tests you’re looking for

So what about the biggest “base” of the pyramid? Unit tests.

Unit tests are the cheapest to create and the easiest to maintain. Furthermore, if something can be tested at the unit test level, it is preferable not to repeat the same test at the higher levels.

But how could QA ensure that the right coverage of the business requirements is achieved, without taking the responsibility for unit test design away from developers? Ideally, we would somehow include unit tests in the Requirements Traceability Matrix!

To fit the existing test-requirements structure, another level of requirements (system requirements) was defined. These are created by the QA engineer based on the user stories that developers implement.

Those system requirements concentrate on functionalities only relevant to the implementation of the system. Specifically, they describe what services, endpoints, or decision points must do (including validations, logic, actions).

Developers are now tasked with designing unit tests such that they cover all corresponding system requirements.
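
As a made-up illustration of what that looks like in practice: suppose a system requirement SR-042 states that a routing decision point must not send a tote to a pick station that is offline. The developer then covers exactly that behaviour with a plain unit test. The requirement ID, class, and method names below are hypothetical and only show the shape of such a test.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

/** Hypothetical unit test covering system requirement SR-042 (all names are illustrative). */
class ToteRoutingDecisionTest {

    /** Minimal stand-in for the decision point under test: only online stations may be targeted. */
    static boolean mayRouteToteTo(boolean stationIsOnline) {
        return stationIsOnline;
    }

    @Test
    void rejectsToteWhenDestinationStationIsOffline() {
        // SR-042: a tote must not be routed to an offline pick station.
        assertFalse(mayRouteToteTo(false));
    }

    @Test
    void acceptsToteWhenDestinationStationIsOnline() {
        assertTrue(mayRouteToteTo(true));
    }
}
```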

This setup brought many advantages with it:

  • Unit tests are now part of a holistic QA process. The QA Engineer is aware of unit test coverage and takes this knowledge into account when designing higher-level tests.
  • Developers check their tests on a functional level.
  • There is a short feedback loop from code to business via the QA process.

Extending the Requirements Traceability Matrix

In a hierarchical structure, each system requirement (lower level) can be linked to a business requirement (higher level).

Mapping unit tests to the system requirements allows us to obtain business requirement test coverage at all test levels:

On the system requirement — unit test level the reporting and implementation process is as follows:

  1. System requirements are written by the QA Engineer in Google Sheets. Each requirement gets a unique ID, and each system requirement is linked to a business requirement.
  2. In each Jira ticket describing a user story or feature, the QA engineer lists the system requirement IDs that must be covered by the unit tests associated with the implementation of the ticket.
  3. The developer implements unit tests that must provide coverage of those system requirements.

Using the TestRail API again, we’ve built another integration, this time between JUnit and TestRail:

  • Each unit test that covers (fully or partially) a certain requirement gets an annotation @Requirement("<Requirement ID>")
  • Based on that annotation, a test case is auto-generated in TestRail for each such unit test, with a link to the requirement (see the sketch below)
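
The exact definition of the annotation is an implementation detail, so the code below is a minimal sketch of what a @Requirement annotation could look like, applied to the SR-042 example from earlier; the annotation definition and names here are illustrative, not necessarily the WCS implementation.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.junit.jupiter.api.Test;

/** Sketch of a requirement-tracing annotation; the real WCS definition may differ. */
@Retention(RetentionPolicy.RUNTIME) // kept at runtime so a scanner can read it via reflection
@Target(ElementType.METHOD)
@interface Requirement {
    /** One or more system requirement IDs that this unit test covers, fully or partially. */
    String[] value();
}

/** The SR-042 test from earlier, now traced back to its system requirement. */
class ToteRoutingDecisionTracedTest {

    static boolean mayRouteToteTo(boolean stationIsOnline) {
        return stationIsOnline;
    }

    @Requirement("SR-042")
    @Test
    void rejectsToteWhenDestinationStationIsOffline() {
        assertFalse(mayRouteToteTo(false));
    }
}
```

A small exporter can then scan the test classes for this annotation (for example with reflection or a classpath scanner), create or update the matching test cases in TestRail via its API, and attach the requirement IDs, which is what ties every unit test back to a row in the Requirements Traceability Matrix.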

TestRail has now become the source of all test scenarios, which, as described above, are exported to the Requirements Traceability Matrix overview.

As a result, the QA team gains control of proper coverage at the unit test level at a very early stage, in an Agile way. Per ticket or per story, we get fast feedback already during development, lowering the risk of misalignment and allowing us to change, improve, or discuss any gaps, edge cases, or anomalies with the business stakeholders.

Because QA now has a deep understanding of the specific coverage at the unit test level, it can focus solely on use cases and complex component integrations at the component test level.

What do the numbers say about coverage?

At Picnic we use SonarCloud to measure code coverage. Below is the view on coverage for the WCS project:

As you can see, line coverage and condition coverage are pretty good for the entire project. Notably, our logic layer condition coverage is at 91.1%.

Every QA knows that these metrics are not sufficiently comprehensive to define the quality of the system, but they are a good and simple indication of the quality of the unit tests in particular.

By combining these results with our requirements coverage we can be confident that we are on the right track!
