A/B testing: Why companies love it and how data scientists can get it right
Across almost any industry these days, people are focusing on the power of data for decision-making within businesses. A/B testing is a widely-used technique in the field of digital marketing.
The technique has been around for a long time and can be very powerful for revealing weaknesses in a marketing strategy and helping companies find better ways of capturing new customers or increasing conversion rates.
At Faculty, one of the most exciting projects I’ve worked on to date involved helping a marketing company ensure their approach to A/B testing was robust and scalable across different types of clients. One thing was clear: without data or the right kind of analysis, it’s difficult to know how effective your marketing campaigns are. That’s where A/B testing comes in.
A/B testing provides a way of comparing two versions of a marketing asset – such as an advertising campaign, web page, or customer offer. It works by showing different versions of the asset to randomly allocated groups of users and then conducting statistical analysis to find which one leads to the best outcome for the business.
This can be an extremely useful technique, but there are many complexities that arise with A/B testing. To ensure that the conclusions we draw from our data are both correct and robust, we need to follow three key steps.
1. Figure out what data we need, and what data has already been collected
In order to test whether an alternative design of some marketing collateral leads to better business outcomes, you’ll need to decide on a metric for judging performance (conversion rate, amount spent, or whether a purchase was made, for example). Then you need to create the different versions of the collateral and collect data about the people who interact with each version.
Let’s take an example we worked on recently, where an offer is shown to customers after they make a purchase on a website. The customers who come to the website are randomly assigned into one of two groups. In group 1, they are shown a “15% off” discount code, and in group 2, they are shown nothing.
Now, do we care about how much customers spend on average, 30 days after seeing an offer? Or do we just care about whether they make a repeat purchase at all? The most suitable metric is likely to be one that looks at the best long-term outcome for your business.
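To make this concrete, here is a minimal sketch of how such an experiment might be set up. The customer IDs, group labels, and spend figures are all simulated for illustration; in a real test the 30-day spend would come from your own transaction data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Hypothetical customer IDs for illustration only.
customers = pd.DataFrame({"customer_id": range(1_000)})

# Randomly assign each customer to group 1 ("15% off") or group 2 (no offer).
customers["group"] = rng.choice(["discount_15", "no_offer"], size=len(customers))

# Simulated outcome: spend in the 30 days after the offer was (or wasn't) shown.
base_spend = rng.gamma(shape=2.0, scale=20.0, size=len(customers))
uplift = np.where(customers["group"] == "discount_15", 5.0, 0.0)
customers["spend_30d"] = base_spend + uplift

# The metric we have chosen to judge performance: average 30-day spend per group.
print(customers.groupby("group")["spend_30d"].mean())
```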
2. Produce results from a representative sample of people
We want to test the versions on enough people to be confident that the result would hold across our whole population or customer base, but we also don’t want to spend months testing two versions of something if one is clearly not working!
By generating confidence intervals (ranges of plausible values around an estimate), we can quantify how precisely we have measured each version’s performance and use those ranges to compare the different versions.
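As a rough sketch of what this looks like in practice, here is an approximate 95% confidence interval for a conversion rate, using the normal approximation to the binomial. The visitor and conversion counts are made up for illustration.

```python
import numpy as np

def conversion_confidence_interval(conversions, visitors, z=1.96):
    """Approximate 95% confidence interval for a conversion rate
    (normal approximation to the binomial)."""
    rate = conversions / visitors
    std_err = np.sqrt(rate * (1 - rate) / visitors)
    return rate - z * std_err, rate + z * std_err

# Hypothetical counts for illustration only.
group_a = conversion_confidence_interval(conversions=120, visitors=2_000)
group_b = conversion_confidence_interval(conversions=150, visitors=2_000)

print(f"Group A conversion rate lies roughly in {group_a}")
print(f"Group B conversion rate lies roughly in {group_b}")
# If the intervals barely overlap, the difference is unlikely to be due to
# chance alone; if they overlap heavily, we probably need more data before
# drawing a conclusion.
```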
3. Split the groups randomly to be able to draw conclusions about causality
Sometimes, it’s difficult to know whether the differences we detect between groups are due to the difference in what we showed them, or due to chance.
In reality, there are many different reasons why people react the way they do to an advertising campaign or email offer. We need to randomise the assignment of people to each group to reduce the likelihood that other factors will drive the results. Just imagine if you decided to split without randomising your groups, and you had an older group of people and a younger one – when you came to analyse your results, you wouldn’t know if it was someone’s age or the email they were shown that was affecting their click-through rate.
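A small simulation makes the point. The sketch below, with entirely made-up ages, compares a non-random split (customers ordered by age) with a random one: the random split balances age across the groups, so age can no longer masquerade as the effect of the email.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Simulated customers with an age attribute (illustrative only).
customers = pd.DataFrame({"age": rng.normal(loc=45, scale=15, size=10_000)})

# Non-random split: the first half of a list ordered by age
# (a split by sign-up date can easily look like this).
ordered = customers.sort_values("age").reset_index(drop=True)
ordered["group"] = np.where(ordered.index < 5_000, "A", "B")
print("Non-random split, mean age per group:")
print(ordered.groupby("group")["age"].mean())

# Random split: each customer is assigned to A or B with equal probability.
customers["group"] = rng.choice(["A", "B"], size=len(customers))
print("Random split, mean age per group:")
print(customers.groupby("group")["age"].mean())
# With random assignment the two groups have near-identical age profiles, so a
# difference in click-through rate can be attributed to the email, not to age.
```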
If A/B testing has been around for such a long time, how is data science helping us do it better?
Whether it’s credit card companies tracking our spending behaviour or online retailers tracking our mouse movements and clicks on their website, there’s undoubtedly a lot of data out there. Data science helps us take this data and get it into a format suitable for processing and using in experiments, so companies can make decisions based on it.
If you can run A/B tests as part of an automated pipeline, where you can iterate your designs and create new experiments, then you can continue to test and adjust your marketing collateral as seasons, styles, and your customer base change over time. With data science tools, we can model the distributions of your performance metric just once, and then you are free to iterate, derive insights quickly and act upon them.
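One way to support that kind of iteration is to wrap the analysis in a reusable function that is run whenever a new experiment’s data lands. The sketch below assumes hypothetical column names (`group`, `converted`) and uses the same simple confidence-interval approximation as above; it is an illustration of the idea, not a description of any particular pipeline.

```python
import numpy as np
import pandas as pd

def summarise_experiment(df, group_col="group", metric_col="converted", z=1.96):
    """Summarise an A/B test: per-group rate and an approximate 95% interval.
    Column names are assumptions; adapt them to your own experiment data."""
    summary = df.groupby(group_col)[metric_col].agg(["mean", "count"])
    std_err = np.sqrt(summary["mean"] * (1 - summary["mean"]) / summary["count"])
    summary["ci_lower"] = summary["mean"] - z * std_err
    summary["ci_upper"] = summary["mean"] + z * std_err
    return summary

# The same function can be re-run each time a new experiment finishes,
# for example as a step in a scheduled pipeline.
rng = np.random.default_rng(1)
example = pd.DataFrame({
    "group": rng.choice(["control", "variant"], size=5_000),
    "converted": rng.integers(0, 2, size=5_000),
})
print(summarise_experiment(example))
```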
In our next blog post, my colleague Ruth will help you understand how A/B testing works in practice, and dive deep into the Bayesian approach to this problem, which allows us to navigate some of the difficulties, including early peeking!
In the meantime, please get in touch if you are interested in hearing how Faculty can help you run an A/B test successfully.