cover image

Here’s how a mathematical paradox distorts our view of news, safety and statistics

cover image

Understand the key differences between standard deviation and standard error with clear examples and practical applications.

cover image

A traditional card deck happens to dodge a tricky poker paradox. Other poker variants aren’t so lucky

cover image

Learn about implementing Softmax from scratch and discover how to avoid the numerical stability trap in deep learning projects.
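
A minimal sketch of the stability trap this entry refers to, assuming the usual fix of subtracting the largest score before exponentiating. The function name `softmax` and the sample scores are illustrative and are not taken from the linked article. With scores near 1000, a naive `math.exp(score)` raises `OverflowError`, whereas the shifted version keeps every exponent at or below zero.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of raw scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    # Subtracting the maximum does not change the result mathematically,
    # but it keeps every exponent <= 0, so math.exp cannot overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Scores this large would overflow a naive exp(); the shifted form is fine.
probs = softmax([1000.0, 1001.0, 1002.0])
```

The returned values are non-negative and sum to one, so they can be read as a probability distribution over the inputs.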

cover image

Learn to calculate and interpret five essential effect size measures with Python examples and clear guidance.

cover image

Gibrat's Law explains why proportional growth processes create lognormal distributions across economics, biology, and social systems.

cover image

Learn how softmax converts raw scores into probability distributions that power AI decision-making.

cover image

In this article, we will explore how to calculate and interpret the covariance matrix using NumPy.

cover image

In this article, we'll explore what p-values really mean, what they do not mean, and how to interpret them correctly.

cover image

Learn why simpler statistical models often outperform complex alternatives and when to apply this principle.

cover image

The Poisson distribution might sound like one of those things you only deal with in a stats class, but it actually shows up in

cover image

In this article, you will learn how to visualize skewness and kurtosis using Python.

cover image

In this article, we'll explore 10 Python one-liners that showcase the progression from basic statistical tests to sophisticated analyses.

cover image

Let’s break down seven statistical concepts that even seasoned machine learning engineers often trip over — and why getting them right matters more than you think.

cover image
Concise Guide to Survival Analysis
18 Apr 2025
statology.org

So, if you’ve ever asked, “How long until X happens?” and wanted to back that up with solid data, you’re in the right place.

cover image
Introduction to statsmodels
14 Apr 2025
statology.org

This article explains its features, installation, and how to use it with examples.

cover image

Let's clarify this important statistical pattern and understand its significance in analysis.

cover image

The Poisson distribution is a discrete probability distribution that expresses the likelihood of a specific number of events occurring within a fixed time or space interval.
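
As a rough illustration of that definition, the probability of seeing exactly k events when the average rate per interval is λ is λ^k · e^(−λ) / k!. The sketch below uses only the standard library; the name `poisson_pmf` and the rate of 3 events per interval are assumptions made for the example.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count per interval is lam."""
    if k < 0 or lam <= 0:
        raise ValueError("k must be >= 0 and lam must be > 0")
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical rate: 3 events per interval on average.
p_two_events = poisson_pmf(2, 3.0)
p_at_most_two = sum(poisson_pmf(k, 3.0) for k in range(3))
```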

cover image

This article is your ultimate guide to understanding them gently and illustratively.

cover image
What is Hellinger distance? - Dataconomy
12 Mar 2025
dataconomy.com

Hellinger Distance is a statistical measure that quantifies the similarity between two probability distributions, useful in data analysis and machine learning applications.
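
For two discrete distributions P and Q over the same outcomes, the standard definition is H(P, Q) = sqrt(Σ (√p_i − √q_i)² / 2), which is 0 when the distributions are identical and 1 when they share no support. The sketch below assumes both inputs are equal-length lists of probabilities that each sum to one; the name `hellinger` is illustrative.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Sum of squared differences between the square roots of the probabilities.
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


same = hellinger([0.5, 0.5], [0.5, 0.5])
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])
```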

cover image

The Kaplan-Meier survival curve estimator is one of the most cited ideas in science

cover image
The Concise Guide to Leverage
3 Mar 2025
statology.org

Leverage helps us identify observations that could significantly influence our regression results, even in ways that aren't immediately obvious.

cover image
The Concise Guide to Heteroscedasticity
17 Feb 2025
statology.org

Heteroscedasticity might seem like just the opposite of homoscedasticity, but understanding it in its own right is crucial for any data analyst.

cover image
The Concise Guide to Homoscedasticity
13 Feb 2025
statology.org

Homoscedasticity stands as one of those statistical terms that can seem unnecessarily complex at first glance.

cover image

Still, the exact application, challenges and shortcuts related to this technique are relatively unknown, and that’s what this article seeks to change.

cover image
The Concise Guide to Statistical Power
6 Feb 2025
statology.org

Statistical power might be the most frequently misunderstood concept in research design. While many researchers know they "need" it, few truly understand

cover image

Degrees of freedom (df) represent the number of independent values in a dataset that are free to vary while still satisfying the statistical constraints imposed on the data.

cover image
Statology's Most Popular Articles of 2024
31 Dec 2024
statology.org

Have a look at Statology's most popular articles of the year!

cover image

Masked diffusion has emerged as a promising alternative to autoregressive models for the generative modeling of discrete data. Despite its potential, existing research has been constrained by overly complex model formulations and ambiguous relationships between different theoretical perspectives. These limitations have resulted in suboptimal parameterization and training objectives, often requiring ad hoc adjustments to address inherent challenges. Diffusion models have rapidly evolved since their inception, becoming a dominant approach for generative media and achieving state-of-the-art performance across various domains. Significant breakthroughs have been particularly notable in image synthesis, audio generation, and video production, demonstrating the transformative potential of this innovative

cover image
Square roots and maxima
1 Dec 2024
leancrew.com

A brief numerical and graphical check on a 3Blue1Brown video.

cover image

The calculators in this guide follow a natural progression, starting with basic probabilities and z-scores, moving through hypothesis testing tools, and concluding with specialized distributions.

cover image

Let's have a closer look at EVT, its applications, and its challenges.

cover image

In this tutorial, we’ll learn more about the Cauchy distribution, visualize its probability density function, and learn how to use it in Python.

cover image

Discover why Welch’s t-Test is the go-to method for accurate statistical comparison, even when variances differ.

cover image

A deep-dive into how and why Statsmodels uses numerical optimization instead of closed-form formulas

cover image

The most common test of statistical significance originated from the Guinness brewery. Here’s how it works

cover image
Mastering Statistical Tests
21 May 2024
towardsdatascience.com

Your Guide to Choosing the Right Test for Your Data

cover image

Understanding probability distributions in data science is crucial. They provide a mathematical framework for modeling and analyzing data.

cover image

Discover the origins, theory and uses behind the famous t-distribution

cover image
Reliability Analysis with Python
6 Aug 2023
towardsdatascience.com

Total Productive Maintenance

cover image

How a Scientist Playing Solitaire Forever Changed the Game of Statistics

cover image

On the method’s advantages and disadvantages, demonstrated with the synthdid package in R

cover image

A primer on the math, logic, and pragmatic application of JS Divergence — including how it is best used in drift monitoring

cover image

Think of your last card game – euchre, poker, Go Fish, whatever it was. Would you believe every time you gave the whole deck a proper shuffle, you were holding a sequence of cards which had never

cover image

A Discussion of the go-to methods for 5 Types of A/B Metrics

cover image
Chi-Square Test to Compare Categorical Variables
1 Oct 2022
towardsdatascience.com

Complete Guideline to Find Dependencies among Categorical Variables with Chi-Square Test

cover image
The statistical magic behind the bootstrap
5 Sep 2022
towardsdatascience.com

How to use the bootstrap for tests or confidence intervals and why it works

cover image
Fully Mastering Fisher’s Exact Test for A/B Testing
30 Jul 2022
towardsdatascience.com

While Fisher’s exact test is a convenient tool for A/B testing, the idea and results of the test are often hard to grasp and difficult to…

cover image
Heuristics That Almost Always Work
11 Jul 2022
astralcodexten.substack.com

...

cover image
Understanding CUPED
4 Jul 2022
towardsdatascience.com

An in-depth guide to the state-of-the-art variance reduction technique for A/B tests

cover image
Home Page of Evan Miller
28 Jun 2022
evanmiller.org

Articles, software, calculators, and opinions.

cover image
DAGs and Control Variables
21 Jun 2022
link.medium.com

How to select control variables for causal inference using Directed Acyclic Graphs

cover image
Sobol Indices to Measure Feature Importance
21 Jun 2022
towardsdatascience.com

Understanding the model’s output plays a major role in business-driven projects, and Sobol can help

cover image
Confidence Intervals Simply Explained
17 Mar 2022
towardsdatascience.com

A concise explanation of confidence intervals.

cover image
Statistical T-Test Simply Explained
17 Mar 2022
towardsdatascience.com

An introduction to the Student’s t-distribution and the Student’s t-test

cover image
Log-normal Distribution — A simple explanation
19 Feb 2022
towardsdatascience.com

How to calculate μ & σ, the mode, mean, median & variance

cover image

Top 30 Probability and Statistics Interview Questions that can help you sharpen your skills to ace your data science interview

cover image
A Simple Interpretation of p-values
29 Nov 2021
towardsdatascience.com

P-values & ice cream consumption simply explained.

cover image
Probability Distributions with Python’s SciPy
23 Oct 2021
towardsdatascience.com

How to Model random Processes with Distributions and Fit them to Observational Data

cover image
The curious case of Simpson’s Paradox
5 May 2021
towardsdatascience.com

In 1996, Appleton, French, and Vanderpump conducted an experiment to study the effect of smoking on a sample of people. The study was conducted over twenty years and included 1314 English women…

cover image
Resource Round-Up: Causal Inference | Emily Riederer
14 Mar 2021
emilyriederer.netlify.app

Free books, lectures, blogs, papers, and more for a causal inference crash course

cover image
Box-Cox transformation is the magic we need
10 Mar 2021
towardsdatascience.com

The smart trick to choose the right model

cover image
8 Common Pitfalls of Running A/B Tests
22 Feb 2021
towardsdatascience.com

How not to fail your online controlled experimentation

cover image

Strip charts are extremely useful to make heads or tails from dozens (and up to several hundred) of time series over very long periods of…

cover image

Part one of a series on how we will measure discrepancies in Airbnb guest acceptance rates using anonymized perceived demographic data.

cover image
Log-Normal Distribution
18 Dec 2020
t.co

A Log-Normal Distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed.

cover image

This article introduces important subcategories of inferential statistical tests and discusses descriptive statistical measures related to the normal distribution.

cover image

Intuitive explanations for the Normal, Bernoulli, Binomial, Poisson, Exponential, Gamma and Weibull distribution — with Python example code

cover image
10 Normality Tests-Python (Step-By-Step Guide 2020)
3 Nov 2020
towardsdatascience.com

Normality tests to check if a variable or sample has a normal distribution.

cover image

It is often desirable to quantify the difference between probability distributions for a given random variable. This occurs frequently in machine learning, when we may be interested in calculating the difference between an actual and observed probability distribution. This can be achieved using techniques from information theory, such as the Kullback-Leibler Divergence (KL divergence), or relative entropy, and the Jensen-Shannon…

cover image

Assumptions, relationships, simulations, and so on

Overview of data distributions
24 Jun 2020
kdnuggets.com

With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.

cover image
A quick refresher of statistical power
24 Jun 2020
towardsdatascience.com

An easy-to-follow guide next time you forget how to do power calculations

NIST Handbook of Engineering Statistics
1 Jun 2020
itl.nist.gov
cover image
Lognormal, Weibull, and Gamma distribution in One Picture

At first glance, the Lognormal, Weibull, and Gamma distributions look quite similar to each other. Selecting between the three models is “quite difficult” (Siswadi & Quesenberry) and the problem of testing which distribution is the best fit for data has been studied by a multitude of researchers. If all the models fit the data fairly well,…

The Student t-Distribution
1 Jun 2020
towardsdatascience.com
The Power-Law Distribution
1 Jun 2020
towardsdatascience.com
cover image
Classic Probability Problem #2: The Coupon Problem
1 Jun 2020
towardsdatascience.com

Data Science & Machine Learning Interviews

cover image
A Complete Guide to Hypothesis Testing
17 May 2020
towardsdatascience.com

From Controlling for Testing Errors to Selecting the Right Test

cover image

A Must Know Topic For Data Scientists Who Work With Data And Statistical Inference

cover image

I got a customer ticket the other day that said they weren’t worried about response time because “New Relic is showing our average response time to be sub 20...

cover image
Hypothesis Testing Explained as Simply as Possible
9 Mar 2020
towardsdatascience.com

One of the most important concepts for Data Scientists

cover image

When to use Beta distribution

cover image
The seven deadly sins of statistical misinterpretation, and how to avoid them

By Winnifred Louis, Associate Professor, Social Psychology, The University of Queensland, and Cassandra Chapman, PhD Candidate in Social Psychology, The University of Queensland. Here are the 7 sins: assuming small differences are meaningful; equating statistical significance with real-world significance; neglecting to look at extremes; trusting coincidence; getting causation backwards; forgetting to consider outside causes; deceptive graphs. To read…

The Little Handbook of Statistical Practice
23 Dec 2019
jerrydallal.com

Data Science, Machine Learning, AI & Analytics

cover image

Solving real-world problems with probabilities

cover image

This post is about various evaluation metrics and how and when to use them.

cover image
29 Statistical Concepts Explained in Simple English – Part 3

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, Tensorflow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more. To keep receiving these articles, sign up on DSC. The full series is accessible here. 29 Statistical Concepts…

cover image
15 Statistical Hypothesis Tests in Python (Cheat Sheet)
12 Feb 2019
machinelearningmastery.com

Quick-reference guide to the 17 statistical hypothesis tests that you need in applied machine learning, with sample code in Python. Although there are hundreds of statistical hypothesis tests that you could use, there is only a small subset that you may need to use in a machine learning project. In this post, you will discover a cheat sheet for the…

This post explains how those numbers were derived in the hope that they can be more interpretable for your future endeavors.

cover image

During my years as a Consultant Data Scientist I have received many requests from my clients to provide frequency distribution

Skewness vs Kurtosis – The Robust Duo
5 May 2018
kdnuggets.com

Kurtosis and Skewness are very close relatives of the “data normalized statistical moment” family – Kurtosis being the fourth and Skewness the third moment, and yet they are often used to detect very different phenomena in data. At the same time, it is typically recommendable to analyse the outputs of…

I recently ran across this bloom filter post by Michael Schmatz and it inspired me to write about a neat variation on the bloom filter that…

cover image

Standard Deviation is one of the most underrated statistical tools out there. It’s an extremely useful metric that most people know how to calculate but very few know how to use effectively.

cover image
Your Guide to Master Hypothesis Testing in Statistics

This article was written by Sunil Ray. Sunil is a Business Analytics and Intelligence professional with deep experience. Introduction – the difference in mindset: I started my career as an MIS professional and then made my way into Business Intelligence (BI) followed by Business Analytics, Statistical modeling and more recently machine learning. Each of these transitions has required…

The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.

cover image
Fisher's method - Wikipedia
4 Oct 2017
en.m.wikipedia.org

In statistics, Fisher's method, also known as Fisher's combined probability test, is a technique for data fusion or "meta-analysis" (analysis of analyses). It was developed by and named for Ronald Fisher. In its basic form, it is used to combine the results from several independence tests bearing upon the same overall hypothesis (H0).
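
In its basic form, the method combines k p-values into the statistic X² = −2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom when every null hypothesis is true and the tests are independent. The sketch below is standard-library only and relies on the closed form of the chi-square tail for an even number of degrees of freedom; the name `fisher_combined_p` and the sample p-values are assumptions for illustration.

```python
import math


def fisher_combined_p(p_values):
    """Return (statistic, combined p-value) for independent p-values."""
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical inputs: two independent tests, each with p = 0.05.
stat, combined = fisher_combined_p([0.05, 0.05])
```

With a single p-value the combined result equals that p-value, which is a quick sanity check on the formula.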

cover image

Nina Zumel prepared an excellent article on the consequences of working with relative error distributed quantities (such as wealth, income, sales, and many more) called “Living in A Lognormal…