A/B testing is a bit like the Olympics. You always go for gold.
Ideally, you’re looking for Michael Phelps-style wins: long-term, consecutive, and (seemingly) effortless. And if you fail to make it to the top-level podium, you can examine your training and performance—that is, strategy and process—to see what went wrong.
Research suggests that bronze Olympic medalists are happier with their achievements than silver medalists because the latter are focused on almost winning gold.
Maybe it’s a similar dynamic in testing: A flat test can be more disappointing than a clear loss because it generates few actionable insights. And repeated flat tests can leave you feeling discouraged … and completely stumped.
So are flat tests always bad? Can you learn anything from them? When you encounter one, what should you do? The Brooks Bell team weighs in.
“When a test comes up flat, I think the best route to take is to revisit the hypothesis of the test as well as the execution of the test.”
-Suzi Tripp, Senior Director, Experimentation Strategy at Brooks Bell
Flat tests are tough because, at least on the surface, they don’t help you determine where to go next. That’s when you go back to the beginning, says Suzi Tripp, our senior director of experimentation strategy.
Looking at strategy and process can help you determine whether customers were indifferent to the idea itself or to its execution.
You also have to consider that sometimes customers are going to buy no matter what you do, she says. Your test, for example, may have zero effect on highly motivated holiday shoppers.
In the end, you shouldn’t get too discouraged, she says. “The pro of a flat test is that it wasn’t bad for your revenue.”
“If you’re using testing to mitigate risk, a flat test is actually a good sign.”
-Jeremy Andrews, Senior Optimization Engineer at Brooks Bell
Senior optimization engineer Jeremy Andrews agrees that flat tests aren’t all bad. If you’re looking to launch a site redesign or add new functionality, A/B testing is indispensable—and a flat test could essentially be a green light for making changes.
“You can be assured that there won’t be major revenue loss as a result of your changes,” he says.
Jeremy also points out that you may not have the flexibility or time to run a test to statistical significance, but your test may still uncover certain trends.
For example, you may make a change in the hero section with the goal of driving revenue. It may not increase sales, but it could end up driving more visitors to the blog or increasing PDP views. Insights like this can help you understand more about customer behavior.
“Flat tests are frustrating, but sometimes you learn that your hypothesis just didn’t bear out—and that’s OK to learn.”
-Dave Rose, Director, Optimization Consulting – Analytics at Brooks Bell
Echoing Jeremy’s point about the variables needed to reach statistical significance, Dave Rose points out that mathematically, there are few truly flat tests. If you run a test for long enough, a winner will emerge.
But rather than waste time waiting for clear answers, “we pull the plug before we find out if there’s a difference,” he says.
It’s an opportunity-cost calculation. Calling off a test that will yield mild results opens up the opportunity to run one that’s more fruitful.
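That opportunity-cost calculation can be made concrete with a standard sample-size estimate for a two-proportion test. Here's a minimal sketch (the function name, baseline rates, and lift values are illustrative, not from Brooks Bell's methodology) showing why a test chasing a tiny lift can tie up traffic for a very long time:

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, min_detectable_effect,
                         alpha=0.05, power=0.80):
    """Approximate visitors needed *per variant* for a two-proportion z-test.

    baseline_rate: control conversion rate (e.g. 0.05 for 5%)
    min_detectable_effect: absolute lift you want to detect (e.g. 0.01)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar)) +
                 z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / min_detectable_effect ** 2)


# Detecting a 1-point lift on a 5% baseline is feasible...
print(required_sample_size(0.05, 0.01))
# ...but a 0.1-point lift needs roughly 100x the traffic per variant,
# which is often when teams decide to pull the plug.
print(required_sample_size(0.05, 0.001))
```

The takeaway: a "flat" result is often a test whose true effect, if any, is smaller than you have the traffic to resolve, so ending it early frees that traffic for a bolder test.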
So how do you learn from flat test results? “Every test has secondary metrics that work as leading indicators of the primary KPI, so we would investigate the hypothesized effect on the leading indicators.”
For example, a financial services firm may set account openings as their KPI. If the test was flat, they would look to indications that visitors wanted to open an account—for example, account starts. This could help the company pinpoint how, exactly, the test failed to deliver.
“We would also investigate if the performance across different large segments varied in a way that was meaningful,” he says. “If so, we might recommend a more targeted test to a particular segment.” It’s an approach echoed elsewhere on the team.
A test can be statistically flat on the main metric, but the secondary metrics may reveal valuable insights—if the test is designed right.
“We always try to pack in as many secondary metrics as we can to explain how the change we made affected user behavior,” explains Reid Bryant, our VP of data science and analytics.
If your test doesn’t affect the KPI as you’d hoped, look at the secondary metrics. Pay particular attention to segments, such as mobile customers or return visitors, that may be underperforming or overperforming. These insights can inform future tests or iterations.
There’s always a chance that the trends you observe are statistical noise, so validate everything in follow-up tests.
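To illustrate what that segment-level check looks like in practice, here's a minimal sketch using a two-proportion z-test per segment. The segment names and conversion counts are hypothetical, and remember the caveat above: the more segments you slice, the more likely one looks "significant" by chance, so treat any standout as a hypothesis for a follow-up test.

```python
from statistics import NormalDist


def segment_lift(control, variant):
    """Two-proportion z-test for one segment.

    control, variant: (conversions, visitors) tuples for that segment.
    Returns (absolute lift, two-sided p-value).
    """
    c_conv, c_n = control
    v_conv, v_n = variant
    p_c, p_v = c_conv / c_n, v_conv / v_n
    pooled = (c_conv + v_conv) / (c_n + v_n)
    se = (pooled * (1 - pooled) * (1 / c_n + 1 / v_n)) ** 0.5
    z = (p_v - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_v - p_c, p_value


# Hypothetical per-segment data: (conversions, visitors)
segments = {
    "mobile":          ((120, 4000), (160, 4000)),
    "desktop":         ((300, 6000), (305, 6000)),
    "return visitors": ((90, 1500), (95, 1500)),
}

for name, (control, variant) in segments.items():
    lift, p = segment_lift(control, variant)
    print(f"{name}: lift={lift:+.2%}, p={p:.3f}")
```

A test can read flat overall while one segment (here, the hypothetical mobile group) responds strongly, which is exactly the signal that suggests a more targeted follow-up test.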
“You call a test on KPI, but secondary metrics give it context.”
-Claire Schmitt, Senior Director, Optimization Consulting at Brooks Bell
Claire Schmitt agrees that even a flat test can yield plenty of valuable insights. Our senior director of optimization consulting, like the others, suggests taking a closer look at the secondary KPIs and segmentation.
“See if there are any insights that can move you in a particular direction for your next test iteration,” she says.
If you have a recurring problem with flat tests, it’s time to change things up, she says. Two possible approaches:
- Go big and try something “upside down and purple.” You won’t know the effect of individual components, but their interaction may get you a dramatic result.
- Focus more on qualitative research; the learnings may lead you down a different path.
What approach do you take when a test is flat? Leave your thoughts in the comments below.