The Experimentation Flywheel: Growing A/B Testing Momentum at Scale
“Experimentation teams must focus on doing the basics right, performing simple, high-quality experiments often. This is the winning recipe for the experimentation flywheel to keep spinning faster and faster. Introducing too much complexity can make learnings difficult to detect, whether due to errors or overly complicated analyses”.
Introduction
Aleksander Fabijan is Senior Product Manager at Microsoft.
He is part of a 60+ team of Data Scientists, Software Engineers and Program Managers that form the Experimentation Platform (ExP) team.
The Microsoft ExP team is responsible for one of the largest and most cutting-edge online experimentation systems in the industry.
Bing, MS Office, Skype, Xbox, Windows, and MS Teams all use the ExP platform to perform online controlled experiments.
Aleksander holds a PhD from Malmö University, where he conducted his research at Software Centre, Europe. He has published and co-authored many influential research papers on large-scale online controlled experimentation.
His passions include data-driven decision-making using A/B testing, AI system design, API design, User Experience (UX), and large-scale experimentation.
In this article we discuss:
The impacts of A/B Testing
Overcoming experimentation challenges
A quick overview of the Experimentation Flywheel
The Value-Investment Flywheel
How to get quick runs on the board
When experiment design and execution go wrong
How to develop layers of experimentation champions
1. The impacts of A/B Testing
Almost all executives and boards are aware of the benefits and impacts of experimentation on product development and decision-making.
Experimentation is a powerful tool: a powerful mechanism for making the right decisions.
However, business decisions must be made against metrics, and the right metrics at that.
Teams need to be careful that they don’t fall into the trap of continually optimising for Local Maxima and short-term improvements.
Instead, we should evaluate the existing user experience, understand how customers could have a better experience with the product, and measure that.
“Experimenters need to ensure that they are measuring the right metrics, and the chosen metrics have a strong correlation, or causation, to long-term user success”.
2. Overcoming experimentation challenges
Struggle starts with motivation
The biggest struggle that teams can have with onboarding to experimentation is motivation.
Why would anyone seek to perform experiments if there is too much friction in the process? That friction produces resistance to experimentation and thwarts its spread across an organisation.
A good way to overcome initial resistance and hesitancy to experimentation is to communicate with new teams:
Share examples of previous experiments
Share key learnings and insights from other teams
Highlight the successes of other teams
Articulate benefits through real-world examples
Embed experimentation into existing product development processes
Designing, executing, and analysing experiments needs to be as easy and fast as possible.
Organisations need to ensure that the “human cost of experimentation” is low. No one wants to pay the Experimentation Tax.
If the Experimentation Tax is too high, experimentation uptake will be low across the organisation.
“To ensure that experimentation is as smooth as possible, experimentation must be an integrated part of the product development process or release cycle”.
Experimentation should not be a separate process or flow that is conducted in parallel to the product development process.
Experiment design, analysis and alerting should be embedded in existing workflow processes.
Making trade-offs
Friction doesn’t only exist in experimentation processes; it also exists in results analysis.
“When it comes time to review experimentation results, there’s friction due to the need to assess and decide on trade-offs”.
This is why it’s super important to ensure that you’re always measuring the right metrics.
Otherwise, it’s incredibly difficult to balance business trade-offs, because you don’t understand what’s important.
3. An overview of the Experimentation Flywheel
The A/B Testing Flywheel (Fabijan, Vermeer, Dmitriev, 2021)
There are five steps which comprise the Experimentation Flywheel:
Running more A/B tests to support more decisions
Measuring value added to decision-making
Increasing interest in A/B testing
Investing in A/B testing infrastructure and data quality
Lowering (human) manual cost of A/B testing
1. Running more A/B tests to support more decisions
At the top of the flywheel is the goal – using A/B tests to support decision making. With every turn of the flywheel, we aim to run more A/B tests to support more decisions.
2. Measuring value added to decision-making
The impact and value added by A/B tests, for both customers and the business, need to be measured and captured. The more A/B tests we run with each turn of the flywheel, the more aggregate value and impact the A/B testing program will provide. If the value of A/B testing is unclear, or negative value signals are sent (for example, executives ignoring insights from A/B tests, or employees complaining about excessive time spent configuring and executing an A/B test), it becomes hard to generate interest and justify more resources.
3. Increasing interest in A/B testing
When A/B testing delivers value for one team, this can lead to increased interest in A/B testing by new teams. To support this spread of interest, dedicated efforts are needed to communicate the value of A/B tests as broadly as possible. Additionally, educational and support efforts are needed to help the individuals and teams who show interest make their initial experience with A/B testing a success.
4. Investing in A/B testing infrastructure and data quality
The more interest and willingness there is to try A/B testing within an organisation, the more resources can be justifiably allocated to make an A/B testing program successful. These resources should be directed towards two key areas: improving A/B testing platform capabilities for managing and analyzing A/B tests, and increasing data quality. If we do not have a reliable and trustworthy platform and data, there are a host of issues that can arise, leading to erosion of trust in A/B testing.
5. Lowering (human) manual cost of A/B testing
Improvements in A/B testing infrastructure and data quality create conditions for streamlining the A/B testing process and lowering the cost of manual work, i.e. the amount of investment required to start doing A/B testing in a new area. Continuously lowering the cost of A/B testing is a critical step in the flywheel that is often missed. If running A/B tests remains costly, A/B testing will remain limited to highly interested early adopters with a lot of resources.
Paper - It Takes a Flywheel to Fly: Kickstarting and Growing A/B Testing Momentum at Scale
If you’re enjoying this article, listen to the full conversation on the Experimentation Masters Podcast
4. The Value-Investment flywheel
The Value – Investment Flywheel is a critical consideration for all experimentation programs.
This is particularly important in the early stages of establishing and growing an experimentation program.
All experimentation programs, large or small, must consistently communicate and quantify the value being derived from business investment in the program.
“When an organisation continues to extract value from its investment in the experimentation program, the program receives additional funding to grow and scale, delivering more benefits for the organisation in the future”.
And so the cycle continues over many years: the experimentation program delivers value > the business commits more investment > the program delivers more value > the business commits more investment, and so on.
If the experimentation program fails to demonstrate and deliver business value, investment can be wound back, or, in extreme cases, funding for the program stopped altogether.
The Value-Investment Flywheel
Four ways to quantify business value for your experimentation program:
Program
Platform
Product
Team
Program:
Conduct meta-analysis to evaluate the impacts of A/B Tests and experimentation efforts on company growth models.
Platform:
Implement platform SLAs so that you can measure how your experimentation platform is performing.
Product:
Teams can quantify the positive impact of their work on user experience. Conversely, they can quantify the losses avoided by not pursuing opportunities that would have negatively impacted user experience.
Team:
Measure improvements to decision velocity and product velocity. Teams can clearly understand the impacts to key metrics, and the path forward, enabling them to create and deliver value faster.
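The “Program” approach above, meta-analysis across many A/B tests, can be sketched with a simple fixed-effect (inverse-variance weighted) pooling of per-experiment treatment effects. This is a generic illustration, not the ExP team’s method, and the effect sizes below are made up:

```python
def fixed_effect_meta(effects, std_errors):
    """Pool per-experiment treatment effects with inverse-variance
    weights (fixed-effect meta-analysis). Returns the pooled effect
    and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Hypothetical revenue-per-user lifts from three shipped experiments:
effects = [0.020, 0.010, 0.015]      # estimated lift per experiment
std_errors = [0.010, 0.005, 0.010]   # standard error of each estimate
pooled, se = fixed_effect_meta(effects, std_errors)
print(f"pooled lift = {pooled:.4f} +/- {1.96 * se:.4f}")
# pooled lift = 0.0125 +/- 0.0080
```

Precisely estimated experiments get more weight, so the pooled number is a defensible aggregate to report to leadership rather than a naive average of headline lifts.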
5. How to get quick runs on the board
For any new experimentation program, getting quick runs on the board is critical. It’s a case of survival of the fittest.
The early days can make or break the success and longevity of an experimentation program.
Getting some early runs on the board will increase trust and confidence in the program, also providing the experimentation team with some much-needed breathing space.
Three simple criteria for choosing experimentation quick wins:
Visibility
Simplicity
Measurement
Visibility:
Choose opportunities where there are strongly held beliefs (particularly by senior leaders), there is conjecture or disagreement, and people care about the outcomes of the experiments.
Simplicity:
There is a simple, easy way to test the opportunity. Choose low-complexity, easy to execute A/B tests to begin. Avoid complex experimentation techniques, back-end changes or large-scale UX redesigns etc.
Measurement:
There is a suite of metrics that is readily available to evaluate the performance of early A/B tests. For example, you don’t want to be building data pipelines and undertaking complex system integrations. Results analysis needs to be straightforward. You want to be able to easily connect the impact to your metrics back to the new change you’ve made.
“Experimentation teams must focus on doing the basics right, performing simple, high-quality experiments often. This is the winning recipe for the experimentation flywheel to keep spinning faster and faster”.
6. When experimentation design and execution go wrong
In an experiment, there are typically two areas where things can go wrong:
Design
Execution
Design:
The biggest mistake that occurs during experiment design is that the test does not have enough statistical power. You’re not including enough instances, users, or devices. Or perhaps you don’t have a stable unit that you’re randomising on.
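A quick back-of-the-envelope sample-size check catches most power mistakes before a test launches. The sketch below is my own illustration (not part of any particular platform), using the standard two-proportion z-test approximation; the 5% baseline rate and the relative lifts are assumed example values:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, mde_rel, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a relative lift of
    `mde_rel` on a baseline conversion rate `p_base`, using a
    two-sided two-proportion z-test approximation."""
    p_new = p_base * (1 + mde_rel)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2
    return math.ceil(n)

# Detecting a 10% relative lift on a 5% baseline takes ~31k users per arm;
# halving the detectable effect roughly quadruples the requirement.
print(sample_size_per_arm(0.05, 0.10))
print(sample_size_per_arm(0.05, 0.05))
```

If the answer is more traffic than the feature will ever see, the experiment is underpowered by design and needs a bigger effect, a more sensitive metric, or a longer run.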
Execution:
On the execution side, it all comes down to the telemetry you’re looking at - how the data is stitched together and combined to form a meaningful metric.
While there’s a ton of things that can potentially go wrong in data pipelines during execution, Sample Ratio Mismatch is the most common problem.
Checking for Sample Ratio Mismatch can help to identify the causes of some of the more common errors and bugs that occur during execution.
“Sample Ratio Mismatch is our ultimate detector of the problems that we can have in A/B tests”.
Guardrail metrics:
When things do go wrong, you need to ensure that you have appropriate guardrails and monitoring in place to detect negative events. Sample Ratio Mismatch is one such guardrail.
“Every single day I see A/B tests with Sample Ratio Mismatch, due to data that is impacted in some way by the new variant”.
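As a rough illustration of how an SRM guardrail works (a generic sketch, not the ExP implementation), a chi-square goodness-of-fit test compares observed assignment counts against the configured split. Because assignment counts are large, SRM alerts conventionally fire only at a very strict p-value threshold:

```python
def srm_check(control_n, treatment_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit test on assignment counts.
    Returns the chi-square statistic and whether it exceeds 10.83,
    the critical value for p < 0.001 at 1 degree of freedom, a
    conventionally strict threshold for SRM alerts."""
    total = control_n + treatment_n
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (control_n - exp_c) ** 2 / exp_c + (treatment_n - exp_t) ** 2 / exp_t
    return chi2, chi2 > 10.83

# A 50,000 / 50,000 split is healthy; silently losing ~3% of treatment
# users (e.g. to a crash introduced by the variant) trips the guardrail.
print(srm_check(50_000, 50_000))   # (0.0, False)
print(srm_check(50_000, 48_500))   # chi2 ~ 22.8 -> SRM detected
```

When the check fires, the results of the A/B test should not be trusted until the cause of the missing or surplus users is found and fixed.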
7. Developing layers of experimentation champions
Experimentation is no different to any other project or initiative. It requires support within the organisation.
All levels of experimentation champions have different roles and responsibilities. However, they are all equally important.
There are three levels of support required:
Executive sponsorship
Strategic liaison
Ground support
Executive Sponsorship:
Executive sponsorship is required to provide advocacy, investment, and resources, and to help remove blockers. At a minimum, there needs to be at least one executive sponsor supporting the experimentation program.
Strategic Liaison:
The Strategic Liaison communicates between Ground Support, Executive Sponsors, and Platform Sponsors on cross-platform experience issues and opportunities. Effectively a Product Manager type, the Strategic Liaison identifies the problems and issues teams are facing.
The Strategic Liaison works strategically to develop a roadmap of solutions and required investments to make life easier for Ground Support and Business Teams, reducing friction and lowering the human cost of experimentation.
Ground Support:
Ground Support provides day-to-day, hands-on operational support to teams at the coalface of experimentation. Ground Support will be a domain expert (e.g. Marketing), understanding the experimentation-specific context within their business unit / cross-functional team.
Ground Support will assist with education, experiment design and execution, issues resolution, troubleshooting etc.
Summary
The winning recipe for experimentation in your organisation is to get the flywheel spinning faster and faster.
Ensure that you aim to decrease the human cost of experimentation wherever possible in your flywheel to reduce friction.
Think about some of these things:
Increase the motivation of nascent teams to onboard to experimentation
Make experimentation as simple and easy as possible for users
Continually quantify and communicate the value of the experimentation program
Get exceptionally good at performing simple, high-quality A/B tests
Establish multiple layers of Champions to support the program
Identify Micro flywheels that amplify and drive the speed of your Macro flywheel
Need help with your next experiment?
Whether you’ve never run an experiment before, or you’ve run hundreds, I’m passionate about coaching people to run more effective experiments.
Are you struggling with experimentation in any way?
Let’s talk, and I’ll help you.