visitors of an iconic clothing retailer to induce them to sign up for the retailer's mailing list. The rationale for the test was that these visitors were already at the website and knew about the store and its products, so maybe a monetary inducement was unnecessary. If it was indeed unnecessary, then the $10 coupon would simply be giving money away. Visitors to the website are randomly shown one of the two ads. The two groups are typically labeled “A” and “B,” thus the name “A/B testing.” Digital analytics software allows website owners to track the online behavior of visitors in each group, such as what customers click on, what files they download, and whether they make a purchase, allowing comparison between the two groups. In this case, the software tracked whether a visitor signed up for the mailing list or not. A test like this will typically run for a few days or weeks, until enough users have visited the page so that we have a good idea of which version is performing better. Once we have the results of the test, the retailer can deploy the better ad to all visitors. In this case, over a 30‐day period, 400 000 visitors were randomly assigned to see one of the two ads. Do you think a $10 coupon really mattered to people who spent hundreds of dollars on clothes?
In this test, the $10 incentive really did make a difference and resulted in more sign‐ups. While it may not be surprising that the version with the $10 incentive won, the test gives us a quantitative estimate of how much better this version performs: it increased sign‐ups by 300% compared with the version without the incentive. The reason tests like this have become so popular is that they allow us to measure the causal impact of the landing page version on outcomes like sign‐ups or sales. The landing pages were assigned to users at random, and when we average over a large number of users and see a difference between the A users and the B users, the resulting difference must be due to the landing page and not anything else. We'll discuss causality and testing more in Chapter 3.
Figure 1.4 A/B test for mailing list sign‐ups.
Source: courtesy GuessTheTest.com.
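To get a sense of how the numbers behind a test like this are analyzed, here is a minimal sketch in R. The sign‐up counts below are hypothetical (only the 400 000 total visitors and the 300% lift are reported above); base R's prop.test() and a little arithmetic are all that is needed to compare the two sign‐up rates and estimate the lift.

# Hypothetical counts for illustration only; 400 000 visitors split evenly
signups  <- c(A = 600, B = 2400)        # A = no coupon, B = $10 coupon
visitors <- c(A = 200000, B = 200000)

# Two-sample test of equal sign-up proportions
prop.test(signups, visitors)

# Relative lift of B over A (a fourfold rate is a 300% increase)
rates <- signups / visitors
(rates["B"] - rates["A"]) / rates["A"]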
Website A/B testing has become so popular that nearly every large website has an ongoing testing program, often conducting dozens of tests every month on every possible feature of the website: colors, images, fonts, text copy, layouts, rules governing when pop‐ups or banners appear, etc. Organizations such as GuessTheTest.com regularly feature examples of tests and invite the reader to guess which version of a website performed better. (The example in Figure 1.4 was provided by GuessTheTest.com.) In Figures 1.5–1.7, we give three more example website tests where users were randomly assigned to see one of two different versions of a website. As you read through them, try to guess which version performed better or whether the two performed the same.
Website tests can also span multiple pages of a site. For example, an online retailer wanted to know how best to display images of skirts on their website. Should the skirt be shown as part of a complete outfit (left image in Figure 1.5), or should the image of the skirt be shown with the model's torso and face cropped out to better show the details of the skirt? In this test, users were randomly assigned to one of the two treatments and then shown either full or cropped images for every skirt on the product listing pages. (Doing this requires a bit more setup than the simple one‐page tests but is still possible with most testing software.) The website analytics software measured the sales of skirts (total revenue in $) for the two groups. Which images do you think produced more skirt sales?
Figure 1.5 Skirt images test.
Source: photograph by Victoria Borodinova.
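Because the outcome in the skirt test is revenue rather than a yes/no sign‐up, the two groups would typically be compared with a test of means rather than proportions. The sketch below is illustrative only and uses simulated per‐visitor revenue (the actual data are not shown); base R's t.test() does the comparison.

set.seed(1)
# Simulated skirt revenue ($) per visitor, for illustration only
revenue_full    <- rgamma(5000, shape = 0.5, scale = 40)   # full-outfit images
revenue_cropped <- rgamma(5000, shape = 0.5, scale = 44)   # cropped images

# Welch two-sample t-test of mean revenue per visitor
t.test(revenue_cropped, revenue_full)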
As mobile websites and apps have become more popular, website owners have also conducted tests on mobile devices. Figure 1.6 shows two different versions of a mobile webpage where users can find information about storage locations near them. The version on the left in Figure 1.6 directs the user to enter his zip code and then press a button to search for nearby locations. The version on the right lets the user employ his current GPS location to look up locations nearby. The test measured how many customers signed up to visit a location and how many customers actually rented a storage unit. Which version do you think would get more customers to visit a physical location and to rent?
Figure 1.6 Mobile landing page test for storage company.
Our last example shows a test to determine whether it is beneficial to include a video icon on the product listing to indicate that there is a video available for the product. The images in Figure 1.7 show a product listing without the icon (left) and with the icon (right). These images appear on the product listing page that shows all the products in a particular category (e.g. dresses, tops, shoes). In this test, users were assigned either to never see the video icons or to see them for every product that had a video available. The two groups were compared based on the percentage of sessions that viewed a product detail page, which is the page the user sees when she clicks on one of the product listing images. The hypothesis was that the icons would encourage more people to click through to the product details, where they can view the video. The test also measured total sales ($) per session. Do you think the icons would encourage users to click through to the product page?
Figure 1.7 Video icon test.
Source: Elias de Carvalho/Pexel.
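The video icon test tracks two metrics at once: the share of sessions that reach a product detail page and the sales per session. A brief sketch in R, again using simulated session‐level data rather than the retailer's actual numbers, shows how both comparisons might be run.

set.seed(2)
n <- 10000   # sessions per group, for illustration only

# Simulated session-level data: did the session view a product detail page,
# and how much did it spend?
viewed_icon    <- rbinom(n, 1, 0.32)    # sessions shown the video icons
viewed_control <- rbinom(n, 1, 0.30)    # sessions without the icons
sales_icon     <- rgamma(n, shape = 0.3, scale = 25)
sales_control  <- rgamma(n, shape = 0.3, scale = 24)

# Compare the share of sessions that reached a product detail page
prop.test(c(sum(viewed_icon), sum(viewed_control)), c(n, n))

# Compare total sales ($) per session
t.test(sales_icon, sales_control)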
Here we have shown four examples of website tests, but the options for testing websites and other digital platforms like apps or kiosks are nearly limitless. The growth in website testing has been driven largely by software that manages the randomization of users into test conditions. Popular software options include Optimizely, Maxymiser, Adobe Test&Target, Visual Website Optimizer, and Google Experiments. These tools integrate with digital analytics software, such as Google Analytics or Adobe Analytics, which tracks user behavior on websites and provides the data needed to compare the two versions. Most major websites will have testing software installed and often have a testing manager whose job is to plan, conduct, analyze, and report the results of tests. Newer software tools also make it possible to run tests on email, mobile apps, and other digital interfaces like touch‐screen kiosks or smart TV user interfaces. These website tests represent the ideal business experiment: we typically have a large sample of actual users, users are randomly assigned to alternative treatments, user behavior is automatically measured by the web analytics software, and the comparison between the two groups is a good estimate of the causal difference between the treatments. It is also relatively easy to implement