A/B Testing

A/B testing is a cornerstone of data science, essential for making informed business decisions and optimizing customer revenue. Here, we delve into six widely used statistical methods in A/B testing, explaining their purposes and appropriate contexts. 1. Z-Test (Standard Score Test): When to use: ideal for large sample sizes (typically over 30) when the population variance is known. Purpose: compares the means of two groups to determine whether they are statistically different. Applications: frequently employed in conversion rate optimization and click-through rate analysis; it helps identify whether changes in website elements or marketing strategies…
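For context on that first method, here is a minimal two-proportion z-test sketch in R; the conversion counts below are made up for illustration and do not come from the linked article.

xA <- 480; nA <- 10000        # assumed conversions / visitors, variant A
xB <- 540; nB <- 10000        # assumed conversions / visitors, variant B
p_pool <- (xA + xB) / (nA + nB)                       # pooled conversion rate
z <- (xB / nB - xA / nA) /
  sqrt(p_pool * (1 - p_pool) * (1 / nA + 1 / nB))     # z statistic
p_value <- 2 * pnorm(-abs(z))                         # two-sided p-value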

When running experiments, don’t forget to bring your survival kit. I’ve already made the case in several blog posts (part 1, part 2, part 3) that using survival analysis can improve churn prediction. In this blog post I’ll ...

Visual, interactive sample size calculator ideal for planning online experiments and A/B tests.

A deep-dive into how and why Statsmodels uses numerical optimization instead of closed-form formulas

Evaluating ad targeting product using causal inference: propensity score matching!

For non-tech industry folks, an “A/B test” is just a randomized controlled trial where you split users or other things into treatment and control groups, and then later compare key metr…

Running experiments is equal parts powerful and terrifying. Powerful because you can validate changes that will transform your product for the better…

An in-depth explanation of “Thompson Sampling”, a more efficient alternative to A/B testing for online learning

Questions on A/B testing are increasingly being asked in interviews, but reliable resources to prepare for them are still few and far between…

Using and choosing priors in randomized experiments.

A Discussion of the go-to methods for 5 Types of A/B Metrics

Cookies and privacy, interleaving experiments, clean dial-ups, and test metrics

How today’s tech companies make data-driven decisions in Machine Learning production

While Fisher’s exact test is a convenient tool for A/B testing, the idea and results of the test are often hard to grasp and difficult to…

A/B testing is hitting the mainstream because it is so effective. And with so many tools available it has become very easy and very inexpensive to run. Here are 23 helpful tips on how you can take your A/B tests from basic to the next level.

A/B tests provide more than statistical validation of one execution over another. They can and should impact how your team prioritizes projects.

We’re Agile, we think lean, we’re data-driven. If you live in the new economy and work in some sort of digital product you hear some of…

An applied introduction to causal inference in tech

We spoke with Etsy’s iOS Software Engineer, Lacy Rhoades, about their culture of continuous experimentation. Learn about their A/B testing culture.

An in-depth guide to the state-of-the-art variance reduction technique for A/B tests

If you want to always be closing, then you need to always be testing, a long-standing mantra (and title of a popular book) in the search marketing space.

A/B testing is a very popular technique of checking granular changes in a product without mistakenly taking into account changes that were…

Multivariate tests indicate how various UI elements interact with each other and are a tool for making incremental improvements to a design.

Data will tell you the right answer. If you can’t find data somewhere, you should run a test, collect the data, and let it tell you what’s right. A/B testing is one of the core marketing arts a marketer should master and practice.

The best way to determine what works for your site is to run an A/B test on your landing pages. Check out this A/B significance test calculator.

Our A/B test calculator will help you compare two or three variants and determine whether the differences between them are statistically significant.

Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford.

A/B testing, the process of exposing randomized visitors to one or more variables, is among the most effective strategies to optimize user experiences and conversion rates. Here is a list of A/B testing tools.

Statistics & Business can share the same Language

Experimentation is widely used at tech startups to make decisions on whether to roll out new product features, UI design changes, marketing campaigns and more, usually with the goal of improving…

The best way to optimise your website is usually the simplest.

How not to fail at online controlled experimentation

Optimizing web marketing strategies through statistical testing

A/B Testing — A complete guide to statistical testing - bjpcjp/AB_Testing

The intuitive way of A/B testing. The advantages of the Bayesian approach and how to do it.

A/B testing is a very popular technique for checking granular changes in a product without mistakenly taking into account changes that…

This is a story of how a software company was able to start a conversation with 8x more of their users by cutting the length of their emails by 90%. You could set up a test of this method in less than an hour. The Problem: One of the most…

Part 2: Proportion Metrics

Making the most out of your experiments and observational data

Big success. Bigger failure. And lots of lessons. Learn why building a growth team may be a multi-million dollar mistake.

197K subscribers in the datasets community. A place to share, find, and discuss Datasets.

Etsy relies heavily on experimentation to improve our decision-making process. We leverage our internal A/B testing tool when...

The biggest question in ecommerce A/B testing is not “how.”

A/B tests are controlled experiments that compare two variants to measure which one performs better with users. You can apply A/B testing to just about anything you can measure. Multivariate testing lets you measure multiple variables simultaneously.

Which layout of an advertisement leads to more clicks? Would a different color or position of the purchase button lead to a higher conversion rate? Does a special offer really attract more customers – and which of two phrasings would be better? For a long time, people have trusted their gut feeling to answer these questions. Today all these questions can be answered by conducting an A/B test. For this purpose, visitors of a website are randomly assigned to one of two groups, between which the target metric (e.g., click-through rate, conversion rate) can then be compared. Due to this randomization, the groups do not systematically differ in any other relevant dimension. This means: if your target metric takes a significantly higher value in one group, you can be quite sure that it is because of your treatment and not because of any other variable.

In comparison to other methods, conducting an A/B test does not require extensive statistical knowledge. Nevertheless, some caveats have to be taken into account. When making a statistical decision, there are two possible errors (see also Table 1): a Type I error means that we observe a significant result although there is no real difference between our groups; a Type II error means that we do not observe a significant result although there is in fact a difference. The Type I error can be controlled and set to a fixed number in advance, e.g. at 5%, often denoted as α or the significance level. The Type II error, in contrast, cannot be controlled directly. It decreases with the sample size and the magnitude of the actual effect. When, for example, one of the designs performs far better than the other, it is more likely that the test actually detects the difference than when there is only a small difference in the target metric. Therefore, the required sample size can be computed in advance, given α and the minimum effect size you want to be able to detect (statistical power analysis). Knowing the average traffic on the website, you can get a rough idea of how long you will have to wait for the test to complete. Setting the rule for the end of the test in advance is often called “fixed-horizon testing”.

Table 1: Possible errors and correct decisions in statistical tests

                           | Effect really exists: No        | Effect really exists: Yes
  Test not significant     | True negative                   | Type II error (false negative)
  Test significant         | Type I error (false positive)   | True positive

Statistical tests generally provide the p-value, which reflects the probability of obtaining the observed result (or an even more extreme one) just by chance, given that there is no effect. If the p-value is smaller than α, the result is called “significant”.

When running an A/B test you may not always want to wait until the end, but take a look from time to time to see how the test is performing. What if you suddenly observe that your p-value has already fallen below your significance level – doesn’t that mean that the winner has already been identified and you could stop the test? Although this conclusion is very appealing, it can also be very wrong. The p-value fluctuates strongly during the experiment, and even if the p-value at the end of the fixed horizon is substantially larger than α, it can dip below α at some point during the experiment. This is why looking at your p-value several times is a little bit like cheating: it makes your actual probability of a Type I error substantially larger than the α you chose in advance. This is called “α inflation”.

At best, you only change the color or position of a button although it does not have any impact. At worst, your company provides a special offer that causes costs but no actual gain. The more often you check your p-value during data collection, the more likely you are to draw wrong conclusions. In short: as attractive as it may seem, don’t stop your A/B test early just because you are observing a significant result. In fact, you can prove that if you increase your time horizon to infinity, you are guaranteed to get a significant p-value at some point in time.

The following code simulates some data and plots the course of the p-value during the test. (For the first samples, which are still very small, R returns a warning that the chi-square approximation may be incorrect.)

library(timeDate)
library(ggplot2)

# Choose parameters:
pA
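The snippet above is cut off in this excerpt. As an illustration of the two points made in the article, here are two hedged sketches in base R; the conversion rates, sample sizes, and look intervals are assumptions chosen for the example (and base graphics are used instead of ggplot2 for brevity), not values from the original post.

First, the sample size mentioned under “statistical power analysis” can be estimated with power.prop.test, given α, the desired power, and the minimum effect you want to detect:

# Sketch: required sample size per group for a proportion metric.
# Assumed numbers: baseline conversion rate 5%, minimum detectable lift to 6%.
power.prop.test(p1 = 0.05, p2 = 0.06,
                sig.level = 0.05,   # alpha (Type I error rate)
                power     = 0.80)   # 1 - Type II error rate
# The n component of the result is the required number of visitors per variant.

Second, a self-contained simulation of “peeking” when there is no real difference between the variants, in the spirit of the truncated code above; it repeatedly tests the accumulating data and plots the wandering p-value:

set.seed(1)
pA <- 0.05; pB <- 0.05        # assumed true rates: no real difference
n  <- 5000                    # visitors per variant
a  <- rbinom(n, 1, pA)        # simulated conversions in A
b  <- rbinom(n, 1, pB)        # simulated conversions in B
looks <- seq(100, n, by = 100)
p_values <- sapply(looks, function(k)
  prop.test(c(sum(a[1:k]), sum(b[1:k])), c(k, k))$p.value)
plot(looks, p_values, type = "l",
     xlab = "visitors per variant", ylab = "p-value")
abline(h = 0.05, lty = 2)     # significance level alpha
# min(p_values) can dip below 0.05 even though pA == pB,
# which is exactly the alpha inflation described above.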

Data Science, Machine Learning, AI & Analytics