a-b | Perfectly Awesome

6 Statistical Methods for A/B Testing in Data Science and Data Analysis

A/B testing is a cornerstone of data science, essential for making informed business decisions and optimizing customer revenue. Here, we delve into six widely used statistical methods in A/B testing, explaining their purposes and appropriate contexts.
1. Z-Test (Standard Score Test): When to use: this method is ideal when the population variance is known, or when sample sizes are large (typically over 30) so that the sample variance is a reliable estimate of it. Purpose: compares the means of two groups to determine whether they are statistically different. Applications: this technique is frequently employed in conversion rate optimization and click-through rate analysis. It helps identify whether changes in website elements or marketing strategies
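Here is a minimal sketch of the z-test described above, applied to conversion rates. The Python function below runs a two-sided two-proportion z-test with a pooled standard error under the null hypothesis of equal rates. The function name `two_proportion_z_test` and the sample counts are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two conversion rates.

    Uses the pooled proportion under H0 (p_A == p_B) for the standard error.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming no difference between the groups (H0)
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 4.0% vs 5.0% conversion on 5,000 visitors each
    z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up counts the statistic is about z = 2.41 (p ≈ 0.016), so the difference would be called significant at α = 0.05.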

Better A/B testing with survival analysis

When running experiments, don’t forget to bring your survival kit. I’ve already made the case in several blog posts (part 1, part 2, part 3) that using survival analysis can improve churn prediction. In this blog post I’ll ...

Sample Size Calculator

Visual, interactive sample size calculator ideal for planning online experiments and A/B tests.
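The core arithmetic behind such calculators can be sketched as follows. The sketch assumes a two-sided two-proportion z-test and the usual normal approximation with unpooled variance: n per group = (z₁₋α/₂ + z_power)² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₂ − p₁)². The helper `sample_size_per_group` is hypothetical and is not the API of any calculator linked here. Other tools, including Statsmodels, use different approximations (for example an effect size on the arcsine scale), so their numbers may differ slightly.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_baseline: float, mde_abs: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in EACH group for a two-sided
    two-proportion z-test (normal approximation, unpooled variance).

    p_baseline: conversion rate of the control group
    mde_abs:    minimum detectable effect, as an absolute difference in rates
    """
    p1 = p_baseline
    p2 = p_baseline + mde_abs  # smallest lift worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


if __name__ == "__main__":
    print(sample_size_per_group(p_baseline=0.10, mde_abs=0.02))
```

Take a 10% baseline and a minimum detectable lift of 2 percentage points, with α = 0.05 and 80% power. This sketch then asks for roughly 3,840 visitors per group. Halving the detectable lift roughly quadruples the required sample.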

Computing Minimum Sample Size for A/B Tests in Statsmodels: How and Why

A deep-dive into how and why Statsmodels uses numerical optimization instead of closed-form formulas

How to use Causal Inference when A/B testing is not available

Evaluating ad targeting product using causal inference: propensity score matching!
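Propensity score matching can be sketched as follows. The sketch assumes the propensity scores (each unit's estimated probability of being treated, typically from a logistic regression on pre-treatment features) have already been computed. The helper `att_nearest_neighbor` and the `(score, outcome)` data layout are hypothetical. This version does 1:1 nearest-neighbor matching with replacement and no caliper, which real analyses often add.

```python
def att_nearest_neighbor(treated: list[tuple[float, float]],
                         control: list[tuple[float, float]]) -> float:
    """Estimate the average treatment effect on the treated (ATT).

    `treated` and `control` are lists of (propensity_score, outcome) pairs.
    Each treated unit is matched (with replacement) to the control unit whose
    propensity score is closest, and the outcome differences are averaged.
    """
    diffs = []
    for score_t, outcome_t in treated:
        # Closest control unit on the propensity score
        _, outcome_c = min(control, key=lambda c: abs(c[0] - score_t))
        diffs.append(outcome_t - outcome_c)
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical data: (propensity score, clicked = 1.0 / not clicked = 0.0)
    treated = [(0.8, 1.0), (0.3, 0.0)]
    control = [(0.79, 0.0), (0.31, 0.0), (0.5, 1.0)]
    print(att_nearest_neighbor(treated, control))
```

Matching on a single score, rather than on every covariate, is what keeps this tractable. It only removes bias from confounders that went into the score model.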


Caveats and Limitations of A/B Testing at Growth Tech Companies

For non-tech industry folks, an “A/B test” is just a randomized controlled trial where you split users or other things into treatment and control groups, and then later compare key metr…
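Splitting users into treatment and control is often implemented with deterministic hashing rather than a stored coin flip, so a returning user keeps the same assignment. The sketch below assumes that approach. `assign_variant` and the `experiment:user_id` key format are illustrative choices, not any specific platform's API.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically map a user to one of `variants`.

    Hashing "experiment:user_id" yields a stable, pseudo-random bucket: the
    same user always lands in the same group within one experiment, while
    different experiments are randomized independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 32 bits of the hash, reduced to a bucket index
    bucket = int(digest[:8], 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    print(assign_variant("user-123", "checkout-button-test"))
```

Salting the hash with the experiment name matters. Without it, the same users would land in "treatment" in every experiment, and the experiments would be confounded with each other.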

8 annoying A/B testing mistakes every engineer should know

Running experiments is equal parts powerful and terrifying. Powerful because you can validate changes that will transform your product for the better…

When You Should Prefer “Thompson Sampling” Over A/B Tests

An in-depth explanation of “Thompson Sampling”, a more efficient alternative to A/B testing for online learning
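The idea behind Thompson sampling can be sketched for conversion-style outcomes with a Beta-Bernoulli model. The sketch assumes a uniform Beta(1, 1) prior per variant. In each round it draws one sample from every posterior and shows the variant with the highest draw, so traffic shifts toward the better variant as evidence accumulates. The simulated `true_rates` stand in for real user responses and are hypothetical.

```python
import random


def thompson_sampling(true_rates: list[float], n_rounds: int = 10_000,
                      seed: int = 0) -> list[int]:
    """Beta-Bernoulli Thompson sampling over several variants.

    Each variant keeps a Beta(successes + 1, failures + 1) posterior.
    Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(n_rounds):
        # One posterior draw per variant; play the variant with the highest draw
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(k), key=draws.__getitem__)
        # Simulated conversion; in production this is the observed user outcome
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.04, 0.06]))
```

Unlike a fixed 50/50 split, the allocation adapts during the run. That lowers the cost of showing a worse variant, but it also makes a classical fixed-horizon significance test on the result inappropriate.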

25 A/B Testing Concepts — Interview Cheat Sheet

Questions on A/B testing are increasingly being asked in interviews, but reliable resources to prepare for them are still few and far between.

Bayesian AB Testing

Using and choosing priors in randomized experiments.

How to Select the Right Statistical Tests for Different A/B Metrics

A Discussion of the go-to methods for 5 Types of A/B Metrics

The Joy of A/B Testing, Part II: Advanced Topics

Cookies and privacy, interleaving experiments, clean dial-ups, and test metrics

Tests | GoodUI

The Joy of A/B Testing: Theory, Practice, and Pitfalls

How today’s tech companies make data-driven decisions in Machine Learning production

Fully Mastering Fisher’s Exact Test for A/B Testing

While Fisher’s exact test is a convenient tool for A/B testing, the idea and results of the test are often hard to grasp and difficult to…
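The mechanics of the test can be sketched in a few lines. With the row and column totals of a 2x2 table held fixed, the count in one cell follows a hypergeometric distribution. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. The function name and table layout below are assumptions for illustration; SciPy's `scipy.stats.fisher_exact` is the usual library route.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table

                 converted   not converted
        group A      a             b
        group B      c             d
    """
    row_a, row_b = a + b, c + d
    col_conv = a + c
    n = row_a + row_b

    def prob(x: int) -> float:
        # Hypergeometric probability that cell `a` equals x, margins fixed
        return comb(row_a, x) * comb(row_b, col_conv - x) / comb(n, col_conv)

    p_obs = prob(a)
    lo, hi = max(0, col_conv - row_b), min(row_a, col_conv)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so floating-point ties with p_obs are kept
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(total, 1.0)


if __name__ == "__main__":
    print(fisher_exact_two_sided(8, 2, 1, 5))
```

For the table [[8, 2], [1, 5]] this gives p = 400/11440 ≈ 0.035. Because the test is exact, it remains valid for the small counts where the z-test's normal approximation breaks down.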

Conservation of Intent: The hidden reason why A/B tests aren’t as effective as they look

23 Tips on How to A/B Test Like a Badass - Search Engine Watch

A/B testing is hitting the mainstream because it is so effective. And with so many tools available it has become very easy and very inexpensive to run. Here are 23 helpful tips on how you can take your A/B tests from basic to the next level.

The golden rule of A/B testing: look beyond validation

A/B tests provide more than statistical validation of one execution over another. They can and should impact how your team prioritizes projects.

A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments | the morning paper

Start here: Statistics for A/B testing

We’re Agile, we think lean, we’re data-driven. If you live in the new economy and work in some sort of digital product you hear some of…

5 Tricks When AB Testing Is Off The Table

An applied introduction to causal inference in tech

Etsy's A/B Testing Culture Spurs Mobile Innovation | Apptimize

We spoke with Etsy’s iOS Software Engineer, Lacy Rhoades, about their culture of continuous experimentation. Learn about their A/B testing culture.

Understanding CUPED

An in-depth guide to the state-of-the-art variance reduction technique for A/B tests
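CUPED (Controlled-experiment Using Pre-Experiment Data) replaces each metric value Y with Y − θ(X − mean(X)). Here X is a pre-experiment covariate for the same user and θ = cov(X, Y) / var(X). The mean is unchanged, and the variance shrinks by the share explained by X. Below is a minimal sketch: `cuped_adjust` is a hypothetical helper, and it estimates θ on whatever data is passed in. A common practice is to pass the pooled data of all groups so that every arm gets the same adjustment.

```python
from statistics import fmean


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """Return CUPED-adjusted values Y - theta * (X - mean(X)).

    y: metric measured during the experiment
    x: the same users' pre-experiment covariate (e.g. the metric's prior value)
    """
    mx, my = fmean(x), fmean(y)
    n = len(x)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    theta = cov_xy / var_x  # OLS slope of Y on X
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    import random
    from statistics import variance

    rng = random.Random(1)
    x = [rng.gauss(10, 2) for _ in range(2000)]   # hypothetical pre-period values
    y = [xi + rng.gauss(0, 1) for xi in x]        # strongly correlated metric
    adj = cuped_adjust(y, x)
    print(variance(y), variance(adj))
```

The treatment effect is then estimated by comparing group means of the adjusted values. The stronger the correlation between X and Y, the smaller the sample needed for the same power.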

Implications of use of multiple controls in an A/B test

Infographic: 26 Ideas For Split Testing Your Search Ads

If you want to always be closing, then you need to always be testing, a long-standing mantra (and title of a popular book) in the search marketing space.

The ultimate guide to A/B testing. Part 1: experiment design

A/B testing is a very popular technique of checking granular changes in a product without mistakenly taking into account changes that were…

Multivariate vs. A/B Testing: Incremental vs. Radical Changes

Multivariate tests indicate how various UI elements interact with each other and are a tool for making incremental improvements to a design.

8 Rules of A/B Testing – The Art in Marketing Science - Search Engine Watch

Data will tell you the right answer. If you can’t find data somewhere, you should run a test, collect the data, and let it tell you what’s right. A/B testing is one of the core marketing arts a marketer should master and practice.

ANALYZE YOUR A/B TEST RESULTS

The best way to determine what works best for your site is to carry out an A/B test for your landing pages. Check out this A/B significance test calculator.

Reforge

Statistical Significance Calculator - FREE AB Test Calculator

Our A/B test calculator will help you to compare two or three variants to determine whether the difference between them is statistically significant.

Why You Only Need to Test with 5 Users

Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford.

11 A/B Testing Tools to Optimize Conversions

A/B testing, the process of exposing randomized visitors to one or more variations, is among the most effective strategies to optimize user experiences and conversion rates. Here is a list of A/B testing tools.

Evan's Awesome A/B Tools - sample size calculator, A/B test results, and more

Why You Should Switch to Bayesian A/B Testing

Statistics & Business can share the same Language

A/B/C Tests: How to Analyze Results From Multi-Group Experiments

Experimentation is widely used at tech startups to make decisions on whether to roll out new product features, UI design changes, marketing campaigns and more, usually with the goal of improving…

Oliver Palmer | You probably don’t need A/B testing

The best way to optimise your website is usually the simplest.

8 Common Pitfalls of Running A/B Tests

How not to fail your online controlled experimentation

A/B Testing — A complete guide to statistical testing

Optimizing web marketing strategies through statistical testing

AB_Testing/AB_Testing.ipynb at main · bjpcjp/AB_Testing

A/B Testing — A complete guide to statistical testing - bjpcjp/AB_Testing

Why you should try the Bayesian approach of A/B testing

The intuitive way of A/B testing. The advantages of the Bayesian approach and how to do it.

The ultimate guide to A/B testing. Part 4: Bayesian approach (binomial vari

A/B testing is a very popular technique for checking granular changes in a product without mistakenly taking into account changes that…
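For binomial metrics such as conversions, the Bayesian approach can be sketched with a Beta-Binomial model. With a Beta(α₀, β₀) prior, each conversion rate's posterior is Beta(α₀ + conversions, β₀ + non-conversions). P(rate_B > rate_A) can then be estimated by Monte Carlo. The function name, the default uniform prior, and the example counts are assumptions. A more informative prior, as discussed in the Bayesian AB Testing entry above, is passed through `prior_alpha` and `prior_beta`.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   prior_alpha: float = 1.0, prior_beta: float = 1.0,
                   draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under a Beta-Binomial model."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Sample each rate from its Beta posterior
        rate_a = rng.betavariate(prior_alpha + conv_a, prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b, prior_beta + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 4.0% vs 5.0% conversion on 5,000 visitors each
    print(prob_b_beats_a(200, 5000, 250, 5000))
```

The output reads directly as "the probability that B is better". Many people find this easier to communicate than a p-value, though it depends on the chosen prior.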

To Get More Replies, Say Less

This is a story of how a software company was able to start a conversation with 8x more of their users by cutting the length of their emails by 90%. You could set up a test of this method in less than an hour. The Problem One of the most


A/B Test Statistics Made Easy

Part 2: Proportion Metrics

Beyond A/B Testing: Primer on Causal Inference

Making the most out of your experiments and observational data

I've Built Multiple Growth Teams. Here's Why I Won't Do It Again. | CXL

Big success. Bigger failure. And lots of lessons. Learn why building a growth team may be a multi-million dollar mistake.

32,487 A/B tests conducted by Upworthy from January 2013 to April 2015


How Etsy Handles Peeking in A/B Testing - Code as Craft

Etsy relies heavily on experimentation to improve our decision-making process. We leverage our internal A/B testing tool when...

25 Ecommerce A/B Testing Ideas For Your 5 Top Store Pages

The biggest question in ecommerce A/B testing is not “how.”

Comparing A/B and Multivariate Testing

A/B tests are controlled experiments comparing two variants of an attribute to measure which one is more popular with users. You can apply A/B testing to just about anything that you can measure. Multivariate testing allows you to measure multiple variables simultaneously.
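A full-factorial multivariate test runs every combination of element variants as its own test cell. The short sketch below makes that concrete. The page elements and their variants are hypothetical.

```python
from itertools import product

# Hypothetical page elements and their variants, for illustration only
factors = {
    "headline": ["original", "benefit-led"],
    "button_color": ["green", "orange"],
    "hero_image": ["photo", "illustration", "none"],
}

# Every combination of variants becomes one cell of the multivariate test
cells = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(cells))  # 2 * 2 * 3 = 12 cells
```

Each of the 12 cells receives only 1/12 of the traffic, against 1/2 per group in a simple A/B test. This is why multivariate tests need far more visitors, even though they can reveal how elements interact.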

Tips for A/B Testing with R

Which layout of an advertisement leads to more clicks? Would a different color or position of the purchase button lead to a higher conversion rate? Does a special offer really attract more customers, and which of two phrasings would be better? For a long time, people trusted their gut feeling to answer these questions. Today, all of these questions can be answered by conducting an A/B test. For this purpose, website visitors are randomly assigned to one of two groups, and the target metric (e.g. click-through rate or conversion rate) is then compared between them. Because of this randomization, the groups do not differ systematically in any other relevant dimension. This means that if your target metric takes a significantly higher value in one group, you can be quite sure that this is due to your treatment and not to some other variable. Compared with other methods, conducting an A/B test does not require extensive statistical knowledge. Nevertheless, some caveats have to be taken into account.

When making a statistical decision, there are two possible errors (see Table 1). A Type I error means that we observe a significant result although there is no real difference between the groups. A Type II error means that we do not observe a significant result although there is in fact a difference. The Type I error rate can be controlled and fixed in advance, e.g. at 5%; it is usually denoted α and called the significance level. The Type II error rate, in contrast, cannot be controlled directly. It decreases as the sample size and the magnitude of the actual effect increase. When, for example, one design performs much better than the other, the test is more likely to detect the difference than when there is only a small difference in the target metric.
Therefore, the required sample size can be computed in advance from α, the desired power (1 − β, where β is the Type II error rate), and the minimum effect size you want to be able to detect (statistical power analysis). Knowing the average traffic on the website, you can get a rough idea of how long you will have to wait for the test to complete. Setting the rule for ending the test in advance is often called “fixed-horizon testing”.

Table 1: Overview of possible errors and correct decisions in statistical tests

                                      Effect really exists: No         Effect really exists: Yes
Statistical test significant: No      True negative                    Type II error (false negative)
Statistical test significant: Yes     Type I error (false positive)    True positive

Statistical tests generally provide a p-value, which is the probability of obtaining the observed result (or a more extreme one) just by chance, given that there is no effect. If the p-value is smaller than α, the result is called “significant”. When running an A/B test, you may not want to wait until the end but instead check from time to time how the test is performing. What if you suddenly observe that your p-value has already fallen below your significance level? Doesn’t that mean the winner has already been identified and you could stop the test? Although this conclusion is very appealing, it can be very wrong. The p-value fluctuates strongly during the experiment. Even if the p-value at the end of the fixed horizon is substantially larger than α, it can dip below α at some point during the experiment. This is why looking at your p-value several times is a bit like cheating: it makes your actual probability of a Type I error substantially larger than the α you chose in advance. This is called “α inflation”. At best, you change the color or position of a button although it has no impact. At worst, your company runs a special offer that incurs costs but brings no gain.
The more often you check your p-value during data collection, the more likely you are to draw wrong conclusions. In short: as attractive as it may seem, don’t stop your A/B test early just because you observe a significant result. In fact, one can prove that if you extend your time horizon to infinity and keep checking, you are guaranteed (with probability 1) to get a significant p-value at some point, even when there is no true effect. The following code simulates some data and plots the course of the p-value during the test. (For the first, still very small samples, R warns that the chi-squared approximation may be incorrect.)
library(timeDate)
library(ggplot2)
# Choose parameters:
pA
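The α inflation described above can also be demonstrated with a separate Python sketch; it is not the R code from the article. It simulates A/A tests, where both groups share the same true conversion rate, so every significant result is a false positive. It then compares two stopping rules: a single look at the fixed horizon, and stopping at the first of many interim looks with p < α. It uses a pooled two-proportion z-test, which is equivalent to a chi-squared test without continuity correction. All parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variation yet (all or no conversions): nothing to test
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(n_experiments: int = 500, n_per_arm: int = 2000,
                         peek_every: int = 100, rate: float = 0.1,
                         alpha: float = 0.05, seed: int = 42) -> tuple[float, float]:
    """Return (false-positive rate with one look at the fixed horizon,
    false-positive rate when stopping at the first peek with p < alpha)."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = 0
        stopped_early = False
        for i in range(1, n_per_arm + 1):
            # Both arms have the same true rate: any "winner" is a false positive
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % peek_every == 0 and not stopped_early:
                if z_test_p_value(conv_a, i, conv_b, i) < alpha:
                    stopped_early = True
        fixed_hits += z_test_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha
        peeking_hits += stopped_early
    return fixed_hits / n_experiments, peeking_hits / n_experiments


if __name__ == "__main__":
    fixed, peeking = false_positive_rates()
    print(f"fixed horizon: {fixed:.3f}, peeking: {peeking:.3f}")
```

The fixed-horizon rate should stay near the chosen α of 5%. With 20 interim looks, the peeking rule declares a false winner several times as often.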

5 Tricks When A/B Testing Is Off The Table

Data Science, Machine Learning, AI & Analytics