The Ultimate Checklist For Performing High-Quality Experiments
“A poorly designed test can lead to misleading insights, wasted resources, and flawed business decisions. On the other hand, a well-executed experiment provides clarity, uncovers real opportunities, and builds confidence in your strategy. An experimentation checklist ensures your experiments are rigorous, reliable and trustworthy.”
THIS BLOG POST IS WRITTEN BY A HUMAN :-)
Purpose
The purpose of this document is to outline an information architecture to assist with the accurate and detailed documentation of experimentation planning, design, deployment, analysis and decision-making.
The document aims to highlight key focus areas for experiment design, execution and reporting that need to be addressed to facilitate a high-performance experimentation program, and perform repeatable, reliable and trustworthy experiments.
Data and insights from each detailed master experimentation report can be carved out for organisational communications, reporting, stakeholder presentations and more.
The experimentation reporting document is a live document, evolving and updated during the course of the experiment lifecycle.
Use this document as a template to ensure that experiments are conducted in a systematic and structured manner.
Why documenting experiments is important
Documenting your experiments is important for a number of reasons:
Ensures a consistent and standardised approach to conducting all experiments in an organisation
Provides a common language and terminology around experimentation
There is an audit trail and traceability for all experiments
Drives experimentation process efficiencies across the organisation - prevents individual teams from creating cottage industries
Experimentation documentation can be easily stored and shared across the organisation
Engenders strong alignment around experiment hypothesis - hypothesis setting is a team sport
All experiment metrics are declared and documented upfront
Avoids mistakes during test design and QA (and the need to re-run tests) - reduces the likelihood of human error
Clear direction for post-experimentation analysis
An educational tool to help teams understand all the components required to conduct a high-quality experiment
Overview
The document outlines key considerations for the following focus areas:
Experiment summary
Experiment overview
Experiment owners
Research & evidence
Learning objectives
Hypothesis
Metrics
Statistical planning
Experiment design
Quality assurance
Data analysis & results
Key learnings & next actions
1. Experiment summary
A one-pager summary that provides a snapshot of key inputs into experiment design, outcomes and next steps.
Including:
Customer research
Problem identified
Hypothesis
Test outcomes
Recommendation
Images of test design (Control & Treatment)
2. Experiment overview
A high-level overview of the experiment.
Experiment ID
Test creation date
Title - state the What, not the How (e.g., Improve CTR From Search Results)
Description of test concept
Behavioural heuristic - Clarity, Usability etc.
Growth area - Retention, Acquisition etc.
Audience - All visitors, Spain, Hosts, Drivers etc.
Prioritisation ratings - ICE, PIE etc.
Opportunity size - what is the estimated business opportunity if the solution is successful?
Test location - Checkout, Homepage, Landing Page etc.
Components - Form, Image, Value Proposition etc.
Metric - target primary metric
Risk - Low, Medium, High
Platform - Web, Mobile Android, Mobile iOS etc.
Estimated test time - days / weeks
Estimated design time - hours / days
Estimated build time - hours / days
Status - Idea, Build, Running etc.
Test completion date
Test outcome - Positive, Inconclusive, Negative
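As a rough illustration, the overview fields above could be captured as a structured record in an experiment registry. This is a minimal sketch only; the field names and defaults here are hypothetical, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ExperimentOverview:
    """Hypothetical record for registering an experiment's high-level overview."""
    experiment_id: str
    title: str                       # state the What, not the How
    description: str
    audience: str = "All visitors"
    primary_metric: str = ""
    risk: str = "Low"                # Low / Medium / High
    platform: str = "Web"            # Web / Mobile Android / Mobile iOS
    status: str = "Idea"             # Idea / Build / Running / Complete
    created: date = field(default_factory=date.today)

exp = ExperimentOverview(
    experiment_id="EXP-042",
    title="Improve CTR From Search Results",
    description="Reframe result snippets to surface pricing upfront.",
    primary_metric="search_result_ctr",
)
```

Registering every experiment in one structured form is what makes the audit trail and cross-team sharing described earlier practical.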
3. Experiment owners
Outline the key stakeholders involved in designing and executing the experiment.
Experiment Owner:
Experimentation
Product
Growth
Marketing etc.
Experiment owners are responsible for end-to-end experiment lifecycle management, including sign-off, starting, stopping, monitoring and acting on any alerts.
Key stakeholders:
Design / UX / UI
Data Science
Analytics
Developers
Engineering
Research & Insights
Legal etc.
4. Research & evidence
Provide an overview of the key insights and customer research that have fed into the experiment.
Summarise the key Qualitative and Quantitative insights that have informed the hypothesis
External research
Competitor intelligence
Research sources – links to research / data sources
Past results – are there any priors that inform the experiment (link: analysis, experiment results, research)
Problem statement – This is a [problem or opportunity] because [assumptions about value]
Proposed solution – We believe that [description of testable solution]
Strategic business goals – how does this opportunity link to business strategy, goals and objectives
Checkpoint: Do we need to test this opportunity with an experiment? If not, what other forms of evidence can inform a path forward for decision-making?
5. Hypothesis
Define and declare your hypothesis upfront – the hypothesis should be precise, testable and falsifiable.
Hypothesis - BASED ON [research insight], WE BELIEVE THAT [change X], WILL CAUSE [impact Y]
Prediction – If we [proposed change] to [independent variable/s], then [expected impact] on [dependent variable/s]
6. Learning objectives
What do you hope to learn from performing the experiment? What are your learning objectives?
We want to understand how reframing the Value Proposition on the homepage will impact lead generation
We want to understand which customer segments find the incentive scheme most compelling
We want to understand how adding a new pricing tier impacts conversions to existing plans for new users
Think about the following:
Impacts to customer experience
Impacts to business performance
Unintended consequences or downstream/upstream impacts
How learnings will inform your future work
7. Metrics
Declare and register the key metrics that you will measure for the experiment across the key metrics categories.
Metric categories:
Primary (# of hours viewed)
Secondary (# of previews watched)
Guardrail (# of subscriber cancellations)
Data quality (data loss)
Pre-register your decision criteria prior to performing the experiment to yield higher quality decisions, faster decisions and to save time on data analysis.
If [insert change] improves [insert key metric] and has positive or null effects on [all other metric categories] then [release the change to 100% of users].
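The pre-registered decision criteria above can be made executable so the ship/hold call is mechanical rather than negotiated after the fact. A minimal sketch, assuming a two-sided test on the primary metric; the function and metric names are illustrative only:

```python
def ship_decision(primary_lift, primary_p, guardrail_deltas, alpha=0.05):
    """Apply a pre-registered decision rule: ship only if the primary metric
    improves significantly AND no guardrail metric has degraded."""
    primary_wins = primary_lift > 0 and primary_p < alpha
    guardrails_ok = all(delta >= 0 for delta in guardrail_deltas.values())
    return "ship" if primary_wins and guardrails_ok else "hold"

decision = ship_decision(
    primary_lift=0.021,        # +2.1% relative lift on the primary metric
    primary_p=0.012,           # p-value from the primary test
    guardrail_deltas={"subscriber_cancellations": 0.0, "data_loss": 0.001},
)
print(decision)  # "ship"
```

Writing the rule down (in prose or code) before the experiment starts is what prevents post-hoc rationalisation of ambiguous results.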
8. Statistical planning
Outline and document the key statistical parameters that inform experiment design and execution.
Analyse baseline traffic volumes on target pages to confirm there is sufficient traffic to warrant an A/B test in the first place. Typical parameters include the significance level (alpha), statistical power, minimum detectable effect (MDE), and the resulting sample size and runtime. Ideally, this analysis is conducted before detailed experiment design begins, to avoid wasted time and effort.
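The standard sample-size formula for a two-proportion test ties these parameters together. The sketch below uses only the Python standard library; treat it as a planning approximation, not a substitute for your experimentation platform's calculator:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, mde_relative, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = p1 * (1 + mde_relative)                   # rate implied by the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Detecting a 10% relative lift on a 5% baseline conversion rate:
n = sample_size_per_variant(baseline_rate=0.05, mde_relative=0.10)
print(n)  # roughly 31,000 users per variant
```

Note how the required sample size grows rapidly as the MDE shrinks, which is why detecting a smaller lift requires a longer runtime.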
9. Experiment design
Define and document all the parameters for design and execution of the experiment.
For example:
Audience – target audience / cohorts / allocations
Audience size - how big are the cohorts
Eligibility criteria – New user, Returning user, Language etc.
Platform – web or mobile etc.
Device – Android or iOS etc.
Geo-targeting – Country, State, Location etc.
Duration – the test will run for X Days / X Weeks from DD/MM/YYYY to DD/MM/YYYY (to detect a smaller lift will require a longer runtime)
Experiment dates – Start Date / Finish Date
Start rules – Anytime / After test X ends
Stop rules – Fixed duration / Fixed sample / Sequential
Experiment type – A/B Test, Multivariate Test, A/A Test etc.
Replication – is the test a replication run?
Test type – Superiority / Non-inferiority
Traffic split – 50/50
Unit of randomisation (E.g., User ID, Device ID, Email recipient)
Assignment criteria - at what point a user/device is bucketed into the Control/Treatment group
Metrics - existing metrics / build new metrics
Experiment design – wireframes / designs / copy (footnote and link to relevant materials)
Variants – Control / Variant description
Interaction check – does the test interact with any other Planned / In-Flight tests?
Risks - identify and mitigate risks (could the experiment have adverse effects we're not measuring?)
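The unit of randomisation and traffic split above are usually implemented with deterministic, salted hashing, so the same user always lands in the same variant without any assignment state to store. A minimal sketch (the salt format and bucket granularity here are illustrative, not a specific platform's algorithm):

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, split: float = 0.5) -> str:
    """Deterministically bucket a unit of randomisation (here, a User ID)
    into Control or Treatment using a hash salted with the experiment ID."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000   # uniform in [0, 1)
    return "treatment" if bucket < split else "control"
```

Salting with the experiment ID keeps assignments independent across experiments, which matters for the interaction check above: without a per-experiment salt, the same users would always co-occur in the same arms of overlapping tests.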
10. Quality assurance
This Quality Assurance (QA) testing plan outlines the process for ensuring accurate and reliable deployment of an experiment. The goal is to validate the experiment setup, mitigate risks, and ensure a seamless user experience while maintaining data integrity. Experiment QA will vary between organisations, depending on the resources available for QA activities.
Experiment setup
Check audience targeting
Check experiment duration settings
Check traffic split (50/50)
Check that feature flagging is correctly implemented (if applicable)
Check traffic allocation to Control & Variant groups – correct randomisation and user assignment
Functional testing
Check all variants for design and functionality
Code changes tested for correctness
Test experiment for required user states (New user, Logged in user)
Check cross-browser
Check cross-device
Performance testing
Measure page load speeds for variants – identify any latencies
Check for excessive network / API calls
Data & analytics validation
Metrics are available and can be collected
Event tracking is correctly setup for Control and Variants
Validate data flows into analytics tools – metrics are firing correctly
Security & compliance
Verify compliance with privacy and data regulations
Issue tracking & resolution
Log and track all identified issues and bugs
Implement fixes and retest before full deployment
Sign-off & approvals
QA team signs-off on test results
Experiment owner approves deployment of the test
Deployment & rollout
Gradual rollout – 10%, 50%, 100% (progressive ramp up)
Monitoring and alerting criteria defined
Monitor key metrics and user engagement to identify any anomalies
Have a contingency plan to Rollback / Stop the test (if required)
11. Data analysis & results
Conduct post-hoc analysis of relevant experiment data and results to understand key learnings and insights from the test.
Compute:
Data quality metrics
Statistical markers – Power, Significance
Sample Ratio Mismatch (SRM)
Evaluation and analysis of experiment Metrics set – Primary, Secondary, Guardrail
Deep-dive Segmentation analysis to detect heterogeneous effects
Outliers and skewed data are identified and addressed (if required)
Visualisation of results - Include relevant screenshots, graphs and tables from post-experiment analysis.
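The Sample Ratio Mismatch check above is a chi-square goodness-of-fit test on observed versus expected group sizes. For two groups (1 degree of freedom) it can be sketched with the standard library alone; the 0.001 threshold is a commonly used convention, not a universal rule:

```python
from math import sqrt
from statistics import NormalDist

def srm_check(control_n, treatment_n, expected_split=0.5, threshold=0.001):
    """Chi-square test (1 degree of freedom) for Sample Ratio Mismatch.
    Returns (p_value, srm_detected)."""
    total = control_n + treatment_n
    expected_c = total * expected_split
    expected_t = total * (1 - expected_split)
    chi2 = ((control_n - expected_c) ** 2 / expected_c
            + (treatment_n - expected_t) ** 2 / expected_t)
    # For 1 df, the chi-square p-value equals 2 * P(Z > sqrt(chi2)).
    p_value = 2 * NormalDist().cdf(-sqrt(chi2))
    return p_value, p_value < threshold
```

A detected SRM means the randomisation or data pipeline is broken, and the experiment's metric results should not be trusted until the cause is found.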
KEY QUESTIONS:
What changes were observed across the metrics set?
What is your interpretation of the results?
Were there any issues encountered that may have impacted the experimentation results?
Were there any unintended consequences of performing this experiment? (Upstream or downstream)
If you failed to reject the Null Hypothesis, what data do you have to support this conclusion?
If you rejected the Null Hypothesis in favour of the Alternate Hypothesis, what data do you have to support this claim?
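For a conversion-style primary metric, the reject/fail-to-reject call above typically comes from a two-proportion z-test. A minimal stdlib sketch (pooled-variance, two-sided; your experimentation platform's test may differ):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_c, n_c, conv_t, n_t):
    """Two-sided z-test comparing Treatment vs Control conversion rates.
    Returns (z_statistic, p_value)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    p_pooled = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Control: 500/10,000 converted; Treatment: 600/10,000 converted.
z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
```

Remember that the p-value only speaks to the primary comparison; the guardrail, secondary and data-quality metrics still need their own evaluation before a decision is made.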
12. Key learnings & next actions
Summarise the learnings and insights from the experiment, outlining key decisions and next business actions.
We learned that:
What was our hypothesis?
What was the experiment that we performed?
What did we think was going to happen?
What actually happened?
Why do we think this happened?
Who shares a different perspective? Why?
What are our key learnings?
How does our hypothesis need to be refined and adjusted?
Based on the experiment results, the next actions that we will take are:
How will we apply the learnings from the experiment?
Does the experiment need to be replicated?
What additional experiments are required? How will we iterate?
How is our strategy impacted?
What decisions are required?
Why are our decisions correct?
Insights and learnings from the experiment are stored in a centralised knowledge repository and shared with key stakeholders.
Need help with your next experiment?
Whether you’ve never run an experiment before, or you’ve run hundreds, I’m passionate about coaching people to run more effective experiments.
Are you struggling with experimentation in any way?
Let’s talk, and I’ll help you.