Fighting `Flaky` tests, or the other side of using pytest-xdist

Denys Korytkin
5 min read · Nov 2, 2021
Test flakiness in the pipeline

Usually, as your project grows, the number of tests grows with it, which in turn increases the duration of builds. As a result, we start looking for some way to parallelize. In the Python world, the most popular test runner is pytest, with a huge bucket of awesome plugins, and pytest-xdist is the one usually used for parallelization.

Emulated long tests without parallelization

The same tests with parallelization look much better

The same tests in parallel with verbose mode turned on

The tests were run in parallel with pytest-xdist on 4 nodes; it seems we won and reduced the runtime by a factor of 4. Not bad 🎉

But parallelization is a double-edged sword: it can also increase the flaky-test rate in your project. 😞

Let’s say a test failed in CI without any significant changes in the codebase, but when it was run locally on its own, it was green. Magic? I don’t think so. 🤔 I bet most of us face this behavior from time to time.

Sometimes the user is not found in CI, but locally the test is always green

Usually, these are just coupled tests that have some hidden dependency between them: a mutable type, mutated data in the DB, global variables, etc… The root cause is hidden by pytest-xdist because each run distributes the tests across nodes differently. So it’s a “rare” case, but a very annoying one, when coupled tests land on the same node in a particular order 💥

How to find the root cause of a flaky test 🤔

Of course, we could brute-force it and check all possible combinations. But what if we have 40+ thousand tests? That could take a very long time, tons of time. We need to find another way. As I mentioned above, pytest is a very flexible test runner with many plugins. Let’s write our own plugin that allows us to reproduce the issue locally 🎉

Let’s start the implementation

The main idea: to reproduce an issue with a flaky test on a particular pytest-xdist node, we need to know how that node ran. So we need to produce an artifact containing all the tests that were run on the failed node, keeping the test order, and then add the ability to run only the tests from this artifact.

CLI of our plugin

So we should define two command-line parameters for our plugin:

  • a file path where the artifact will be saved
  • a file path to run tests from an existing artifact

For this, pytest has a hook, pytest_addoption, that works pretty much like argparse. It won’t be hard 😉
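Here is a minimal sketch of that hook; the option names (`--xdist-stats` and `--from-xdist-stats`) are my illustrative choice, not necessarily the exact ones the published plugin uses:

```python
# conftest.py / plugin module
def pytest_addoption(parser):
    group = parser.getgroup("xdist-tracker")
    group.addoption(
        "--xdist-stats",
        action="store",
        default=None,
        help="file path where the tests executed on this node will be stored",
    )
    group.addoption(
        "--from-xdist-stats",
        action="store",
        default=None,
        help="file path of a previously saved artifact to run tests from",
    )
```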

Configuration

At this step we use another pytest hook, pytest_configure. There is nothing difficult here: it just registers the appropriate plugin based on the parameters passed from the CLI.
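A sketch of what this could look like, assuming the option names from the previous snippet and the `TestTracker` / `TestRunner` classes described below:

```python
def pytest_configure(config):
    stats_path = config.getoption("--xdist-stats")
    from_path = config.getoption("--from-xdist-stats")

    if stats_path:
        # collect executed tests and dump them to an artifact
        config.pluginmanager.register(TestTracker(stats_path), "xdist_test_tracker")
    elif from_path:
        # replay tests from a previously saved artifact
        config.pluginmanager.register(TestRunner(from_path), "xdist_test_runner")
```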

From Collector to Runner

Now we should implement a class for each of the two parameters:

  • TestTracker — collects all the tests that were run on a particular node and stores them in an artifact
  • TestRunner — obviously, runs the tests from the artifact

From a pytest perspective, these will be two plugins registered based on the CLI parameters, and they are never both active at the same time.

TestTracker

The main idea here is to get the full name of each test as it is executed and store it in a file.

Let’s describe a simple flow:

  • get the test that is about to be executed on the current pytest-xdist node
  • get the full name of the test; here is a tricky case: pytest supports parametrization, which lets us reuse the same test with different parameters, and those parameters become part of the full name (e.g. “tests/path/test_module.py::TestCase::test_one[param1]”). So a parameter containing “\n” or a large chunk of text could break our artifact; we need to keep that in mind and add a handler for this case
  • save the name to the collection only once
  • store the collection to the file only once, to avoid wasting time on IO

Now we have the main functionality for tracking tests on pytest-xdist nodes. But how do we get these tests? 🤔

Thank god pytest also has hooks for that (and much more).
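For this sketch I will rely on these two (my pick; pytest documents plenty of others):

```python
def pytest_runtest_logstart(nodeid, location):
    """Called right before a test starts; `nodeid` is the full test name."""


def pytest_sessionfinish(session, exitstatus):
    """Called once the whole test session finishes."""
```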

Now let’s add these hooks to our TestTracker:
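Below is a minimal sketch, assuming a plain-text artifact with one test id per line; this is my illustration of the approach, not necessarily the exact code of the published plugin:

```python
from pathlib import Path


class TestTracker:
    """Collects the ids of all tests executed on this xdist node."""

    def __init__(self, file_path):
        self.file_path = Path(file_path)
        self.node_ids = []

    def pytest_runtest_logstart(self, nodeid, location):
        # nodeid looks like "tests/path/test_module.py::TestCase::test_one[param1]"
        if "\n" in nodeid:
            # A parametrized id containing a newline would corrupt a
            # line-based artifact; handle it somehow (here we simply skip it).
            return
        # save the name to the collection only once
        if nodeid not in self.node_ids:
            self.node_ids.append(nodeid)

    def pytest_sessionfinish(self, session, exitstatus):
        # Write the artifact once, at the end of the session, to avoid
        # paying the IO cost for every single test.
        self.file_path.write_text("\n".join(self.node_ids) + "\n")
```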

Sweet! 😍 Now we have artifacts that can help us reproduce our flaky tests. That could be enough, but what if we have more tests than in this example? 🤔 Replaying the artifact by hand seems useless when you have 40+ thousand tests. So TestRunner is important, let’s do that!

TestRunner

Let’s do the same thing we did for TestTracker. Right! Describe a simple flow 😉

  • get the file path from the CLI params
  • read the file
  • parse the file
  • get all the collected tests from pytest; the pytest_collection_modifyitems hook helps here (see the sketch below)
  • filter the collected tests by the tests from our artifact
  • sort the tests into the same order as the previous run

Doesn’t look hard 😝
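Here is a minimal sketch of TestRunner following that flow, again assuming the one-test-id-per-line artifact format from the TestTracker sketch above:

```python
from pathlib import Path


class TestRunner:
    """Runs only the tests stored in the artifact, in the recorded order."""

    def __init__(self, file_path):
        self.file_path = Path(file_path)

    def pytest_collection_modifyitems(self, session, config, items):
        # Read and parse the artifact: one test id per line, in execution order.
        node_ids = self.file_path.read_text().splitlines()
        order = {nodeid: index for index, nodeid in enumerate(node_ids)}

        # Filter the collected tests down to the ones from the artifact
        # and restore the order from the previous run.
        selected = [item for item in items if item.nodeid in order]
        deselected = [item for item in items if item.nodeid not in order]
        selected.sort(key=lambda item: order[item.nodeid])

        if deselected:
            config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```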

Now we have taught our plugin to run tests from an artifact, no matter how many tests the suite contains.

Conclusion

Now we know one more way to reproduce flaky tests in the “endless war on instability”.

And here it is, my current flow for reproducing flaky tests locally: I use curl to download the artifact from the failed build in Jenkins and then run pytest from the command line with the parameters we defined above 🎉

BTW, this plugin already exists on PyPI:

pip install pytest-xdist-tracker
