The Ultimate Checklist For Performing High-Quality Experiments
“A poorly designed test can lead to misleading insights, wasted resources, and flawed business decisions. On the other hand, a well-executed experiment provides clarity, uncovers real opportunities, and builds confidence in your strategy. An experimentation checklist ensures your experiments are rigorous, reliable and trustworthy.”
THIS BLOG POST IS WRITTEN BY A HUMAN :-)
Purpose
The purpose of this document is to outline an information architecture to assist with the accurate and detailed documentation of experimentation planning, design, deployment, analysis and decision-making.
The document aims to highlight key focus areas for experiment design, execution and reporting that need to be addressed to facilitate a high-performance experimentation program, and perform repeatable, reliable and trustworthy experiments.
Data and insights from each detailed master experimentation report can be carved out for organisational communications, reporting, stakeholder presentations and more.
The experimentation reporting document is a live document, evolving and updated during the course of the experiment lifecycle.
Use this document as a template to ensure that experiments are conducted in a systematic and structured manner.
Why documenting experiments is important
Documenting your experiments is important for a number of reasons:
Ensures a consistent and standardised approach to conducting all experiments in an organisation
Provides a common language and terminology around experimentation
There is an audit trail and traceability for all experiments
Drives experimentation process efficiencies across the organisation - prevents individual teams from creating cottage industries
Experimentation documentation can be easily stored and shared across the organisation
Engenders strong alignment around experiment hypothesis - hypothesis setting is a team sport
All experiment metrics are declared and documented upfront
Avoids mistakes during test design and QA (and the need to re-run tests) - reduces the likelihood of human error
Clear direction for post-experimentation analysis
An educational tool to help teams understand all the components required to conduct a high-quality experiment
Overview
The document outlines key considerations for the following focus areas:
Experiment summary
Experiment overview
Experiment owners
Research & evidence
Learning objectives
Hypothesis
Metrics
Statistical planning
Experiment design
Quality assurance
Data analysis & results
Key learnings & next actions
1. Experiment summary
A one-pager summary that provides a snapshot of key inputs into experiment design, outcomes and next steps.
Including:
Customer research
Problem identified
Hypothesis
Test outcomes
Recommendation
Images of test design (Control & Treatment)
2. Experiment overview
A high-level overview of the experiment.
Experiment ID
Test creation date
Title - state the What, not the How (e.g., Improve CTR From Search Results)
Description of test concept
Behavioural heuristic - Clarity, Usability etc.
Growth area - Retention, Acquisition etc.
Audience - All visitors, Spain, Hosts, Drivers etc.
Prioritisation ratings - ICE, PIE etc.
Opportunity size - what is the estimated business opportunity if the solution is successful?
Test location - Checkout, Homepage, Landing Page etc.
Components - Form, Image, Value Proposition etc.
Metric - target primary metric
Risk - Low, Medium, High
Platform - Web, Mobile Android, Mobile iOS etc.
Estimated test time - days / weeks
Estimated design time - hours / days
Estimated build time - hours / days
Status - Idea, Build, Running etc.
Test completion date
Test outcome - Positive, Inconclusive, Negative
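As a rough illustration, the overview fields above could be captured as a structured record in an experiment registry. This is a minimal sketch only; the field names and defaults here are hypothetical, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ExperimentOverview:
    """Hypothetical record for registering an experiment's high-level overview."""
    experiment_id: str
    title: str                       # state the What, not the How
    description: str
    audience: str = "All visitors"
    primary_metric: str = ""
    risk: str = "Low"                # Low / Medium / High
    platform: str = "Web"            # Web / Mobile Android / Mobile iOS
    status: str = "Idea"             # Idea / Build / Running / Complete
    created: date = field(default_factory=date.today)

exp = ExperimentOverview(
    experiment_id="EXP-042",
    title="Improve CTR From Search Results",
    description="Reframe result snippets to surface pricing upfront.",
    primary_metric="search_result_ctr",
)
```

Registering every experiment in one structured form is what makes the audit trail and cross-team sharing described earlier practical.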
3. Experiment owners
Outline the key stakeholders involved in designing and executing the experiment.
Experiment Owner:
Experimentation
Product
Growth
Marketing etc.
Experiment owners are responsible for end-to-end experiment lifecycle management, including sign-off, starting, stopping, monitoring and acting on any alerts.
Key stakeholders:
Design / UX / UI
Data Science
Analytics
Developers
Engineering
Research & Insights
Legal etc.
4. Research & evidence
Provide an overview of the key insights and customer research that have fed into the experiment.
Summarise the key Qualitative and Quantitative insights that have informed the hypothesis
External research
Competitor intelligence
Research sources – links to research / data sources
Past results – are there any priors that inform the experiment (link: analysis, experiment results, research)
Problem statement – This is a [problem or opportunity] because [assumptions about value]
Proposed solution – We believe that [description of testable solution]
Strategic business goals – how does this opportunity link to business strategy, goals and objectives
Checkpoint: Do we need to test this opportunity with an experiment? If not, what other forms of evidence can inform a path forward for decision-making?
5. Hypothesis
Define and declare your hypothesis upfront – the hypothesis should be precise, testable and falsifiable.
Hypothesis - BASED ON [research insight], WE BELIEVE THAT [change X], WILL CAUSE [impact Y]
Prediction – If we [proposed change] to [independent variable/s], then [expected impact] on [dependent variable/s]
6. Learning objectives
What do you hope to learn from performing the experiment? What are your learning objectives?
We want to understand how reframing the Value Proposition on the homepage will impact lead generation
We want to understand which customer segments find the incentive scheme most compelling
We want to understand how adding a new pricing tier impacts conversions to existing plans for new users
Think about the following:
Impacts to customer experience
Impacts to business performance
Unintended consequences or downstream/upstream impacts
How learnings will inform your future work
7. Metrics
Declare and register the key metrics that you will measure for the experiment across the key metrics categories.
Metric categories:
Primary (# of hours viewed)
Secondary (# of previews watched)
Guardrail (# of subscriber cancellations)
Data quality (data loss)
Pre-register your decision criteria prior to performing the experiment to yield higher quality decisions, faster decisions and to save time on data analysis.
If [insert change] improves [insert key metric] and has positive or null effects on [all other metric categories] then [release the change to 100% of users].
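The pre-registered decision criteria above can be made executable so the ship/hold call is mechanical rather than negotiated after the fact. A minimal sketch, assuming a two-sided test on the primary metric; the function and metric names are illustrative only:

```python
def ship_decision(primary_lift, primary_p, guardrail_deltas, alpha=0.05):
    """Apply a pre-registered decision rule: ship only if the primary metric
    improves significantly AND no guardrail metric has degraded."""
    primary_wins = primary_lift > 0 and primary_p < alpha
    guardrails_ok = all(delta >= 0 for delta in guardrail_deltas.values())
    return "ship" if primary_wins and guardrails_ok else "hold"

decision = ship_decision(
    primary_lift=0.021,        # +2.1% relative lift on the primary metric
    primary_p=0.012,           # p-value from the primary test
    guardrail_deltas={"subscriber_cancellations": 0.0, "data_loss": 0.001},
)
print(decision)  # "ship"
```

Writing the rule down (in prose or code) before the experiment starts is what prevents post-hoc rationalisation of ambiguous results.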
8. Statistical planning
Outline and document the key statistical parameters that inform experiment design and execution.
Analyse baseline traffic volumes on target pages to confirm there is sufficient traffic to warrant an A/B test in the first place. Typical parameters include the significance level (alpha), statistical power, minimum detectable effect (MDE), and the resulting sample size and runtime. Ideally, this analysis is conducted before detailed experiment design begins, to avoid wasted time and effort.
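The standard sample-size formula for a two-proportion test ties these parameters together. The sketch below uses only the Python standard library; treat it as a planning approximation, not a substitute for your experimentation platform's calculator:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, mde_relative, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = p1 * (1 + mde_relative)                   # rate implied by the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Detecting a 10% relative lift on a 5% baseline conversion rate:
n = sample_size_per_variant(baseline_rate=0.05, mde_relative=0.10)
print(n)  # roughly 31,000 users per variant
```

Note how the required sample size grows rapidly as the MDE shrinks, which is why detecting a smaller lift requires a longer runtime.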
9. Experiment design
Define and document all the parameters for design and execution of the experiment.
For example:
Audience – target audience / cohorts / allocations
Audience size - how big are the cohorts
Eligibility criteria – New user, Returning user, Language etc.
Platform – web or mobile etc.
Device – Android or iOS etc.
Geo-targeting – Country, State, Location etc.
Duration – the test will run for X Days / X Weeks from DD/MM/YYYY to DD/MM/YYYY (to detect a smaller lift will require a longer runtime)
Experiment dates – Start Date / Finish Date
Start rules – Anytime / After test X ends
Stop rules – Fixed duration / Fixed sample / Sequential
Experiment type – A/B Test, Multivariate Test, A/A Test etc.
Replication – is the test a replication run?
Test type – Superiority / Non-inferiority
Traffic split – 50/50
Unit of randomisation (E.g., User ID, Device ID, Email recipient)
Assignment criteria - at what point a user/device is bucketed into the Control/Treatment group
Metrics - existing metrics / build new metrics
Experiment design – wireframes / designs / copy (footnote and link to relevant materials)
Variants – Control / Variant description
Interaction check – does the test interact with any other Planned / In-Flight tests?
Risks - identify and mitigate risks (could the experiment have adverse effects we're not measuring?)
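The unit of randomisation and traffic split above are usually implemented with deterministic, salted hashing, so the same user always lands in the same variant without any assignment state to store. A minimal sketch (the salt format and bucket granularity here are illustrative, not a specific platform's algorithm):

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, split: float = 0.5) -> str:
    """Deterministically bucket a unit of randomisation (here, a User ID)
    into Control or Treatment using a hash salted with the experiment ID."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000   # uniform in [0, 1)
    return "treatment" if bucket < split else "control"
```

Salting with the experiment ID keeps assignments independent across experiments, which matters for the interaction check above: without a per-experiment salt, the same users would always co-occur in the same arms of overlapping tests.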
10. Quality assurance
This Quality Assurance (QA) testing plan outlines the process for ensuring accurate and reliable deployment of an experiment. The goal is to validate the experiment setup, mitigate risks, and ensure a seamless user experience while maintaining data integrity. Experiment QA will vary between organisations, depending on the resources available for QA activities.
Experiment setup
Check audience targeting
Check experiment duration settings
Check traffic split (50/50)
Check that feature flagging is correctly implemented (if applicable)
Check traffic allocation to Control & Variant groups – correct randomisation and user assignment
Functional testing
Check all variants for design and functionality
Code changes tested for correctness
Test experiment for required user states (New user, Logged in user)
Check cross-browser
Check cross-device
Performance testing
Measure page load speeds for variants – identify any latencies
Check for excessive network / API calls
Data & analytics validation
Metrics are available and can be collected
Event tracking is correctly setup for Control and Variants
Validate data flows into analytics tools – metrics are firing correctly
Security & compliance
Verify compliance with privacy and data regulations
Issue tracking & resolution
Log and track all identified issues and bugs
Implement fixes and retest before full deployment
Sign-off & approvals
QA team signs-off on test results
Experiment owner approves deployment of the test
Deployment & rollout
Gradual rollout – 10%, 50%, 100% (progressive ramp up)
Monitoring and alerting criteria defined
Monitor key metrics and user engagement to identify any anomalies
Have a contingency plan to Rollback / Stop the test (if required)
11. Data analysis & results
Conduct post-hoc analysis of relevant experiment data and results to understand key learnings and insights from the test.
Compute:
Data quality metrics
Statistical markers – Power, Significance
Sample Ratio Mismatch (SRM)
Evaluation and analysis of experiment Metrics set – Primary, Secondary, Guardrail
Deep-dive Segmentation analysis to detect heterogeneous effects
Outliers and skewed data are identified and addressed (if required)
Visualisation of results - Include relevant screenshots, graphs and tables from post-experiment analysis.
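The Sample Ratio Mismatch check above is a chi-square goodness-of-fit test on observed versus expected group sizes. For two groups (1 degree of freedom) it can be sketched with the standard library alone; the 0.001 threshold is a commonly used convention, not a universal rule:

```python
from math import sqrt
from statistics import NormalDist

def srm_check(control_n, treatment_n, expected_split=0.5, threshold=0.001):
    """Chi-square test (1 degree of freedom) for Sample Ratio Mismatch.
    Returns (p_value, srm_detected)."""
    total = control_n + treatment_n
    expected_c = total * expected_split
    expected_t = total * (1 - expected_split)
    chi2 = ((control_n - expected_c) ** 2 / expected_c
            + (treatment_n - expected_t) ** 2 / expected_t)
    # For 1 df, the chi-square p-value equals 2 * P(Z > sqrt(chi2)).
    p_value = 2 * NormalDist().cdf(-sqrt(chi2))
    return p_value, p_value < threshold
```

A detected SRM means the randomisation or data pipeline is broken, and the experiment's metric results should not be trusted until the cause is found.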
KEY QUESTIONS:
What changes were observed across the metrics set?
What is your interpretation of the results?
Were there any issues encountered that may have impacted the experimentation results?
Were there any unintended consequences of performing this experiment? (Upstream or downstream)
If you failed to reject the Null Hypothesis, what data do you have to support this conclusion?
If you rejected the Null Hypothesis in favour of the Alternate Hypothesis, what data do you have to support this claim?
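For a conversion-style primary metric, the reject/fail-to-reject call above typically comes from a two-proportion z-test. A minimal stdlib sketch (pooled-variance, two-sided; your experimentation platform's test may differ):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_c, n_c, conv_t, n_t):
    """Two-sided z-test comparing Treatment vs Control conversion rates.
    Returns (z_statistic, p_value)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    p_pooled = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Control: 500/10,000 converted; Treatment: 600/10,000 converted.
z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
```

Remember that the p-value only speaks to the primary comparison; the guardrail, secondary and data-quality metrics still need their own evaluation before a decision is made.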
12. Key learnings & next actions
Summarise the learnings and insights from the experiment, outlining key decisions and next business actions.
We learned that:
What was our hypothesis?
What was the experiment that we performed?
What did we think was going to happen?
What actually happened?
Why do we think this happened?
Who shares a different perspective? Why?
What are our key learnings?
How does our hypothesis need to be refined and adjusted?
Based on the experiment results, the next actions that we will take are:
How will we apply the learnings from the experiment?
Does the experiment need to be replicated?
What additional experiments are required? How will we iterate?
How is our strategy impacted?
What decisions are required?
Why are our decisions correct?
Insights and learnings from the experiment are stored in a centralised knowledge repository and shared with key stakeholders.
Need help with your next experiment?
Whether you’ve never run an experiment before, or you’ve run hundreds, I’m passionate about coaching people to run more effective experiments.
Are you struggling with experimentation in any way?
Let’s talk, and I’ll help you.