prob-stats

The false positive paradox explains why you misjudge risk

17 Mar 2026

scientificamerican.com

Here’s how a mathematical paradox distorts our view of news, safety and statistics

Understanding Standard Deviation vs Standard Error

22 Jan 2026

statology.org

Understand the key differences between standard deviation and standard error with clear examples and practical applications.

Why 52 Cards Is the Perfect Number for Poker—Mathematically

13 Jan 2026

scientificamerican.com

A traditional card deck happens to dodge a tricky poker paradox. Other poker variants aren’t so lucky

Implementing Softmax From Scratch: Avoiding the Numerical Stability Trap - MarkTechPost

7 Jan 2026

marktechpost.com

Learn about implementing Softmax from scratch and discover how to avoid the numerical stability trap in deep learning projects.
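
A minimal sketch of the stability trap this entry refers to, assuming the usual fix of subtracting the maximum score before exponentiating (the function name `softmax` and the sample scores are illustrative, not taken from the linked article):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    # Subtracting the max does not change the result mathematically,
    # but it keeps every exponent <= 0, so math.exp cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# A naive math.exp(1000.0) would raise OverflowError; the shifted version is fine.
probs = softmax([1000.0, 1001.0, 1002.0])
```

The shift works because multiplying numerator and denominator by the same constant `exp(-m)` cancels out.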

Why Mean-Variance Fails: Alternative Portfolio Risk Metrics with Python

14 Nov 2025

statology.org

5 Effect Size Measures You’ll Actually Use

27 Oct 2025

statology.org

Learn to calculate and interpret five essential effect size measures with Python examples and clear guidance.

Explaining Gibrat's Law: Why Growth Creates Lognormal Distributions

9 Oct 2025

statology.org

Gibrat's Law explains why proportional growth processes create lognormal distributions across economics, biology, and social systems.

Softmax in Statistics: Turning Scores into Probabilities

24 Sep 2025

statology.org

Learn how softmax converts raw scores into probability distributions that power AI decision-making.

How to Calculate and Interpret the Covariance Matrix with NumPy

17 Sep 2025

statology.org

In this article, we will explore how to calculate and interpret the covariance matrix using NumPy.

p-values Explained in Plain English (with Visuals)

4 Sep 2025

statology.org

In this article, we'll explore what p-values really mean, what they do not mean, and how to interpret them correctly.

Understanding Occam's Razor: Why Simpler Models Usually Win

26 Aug 2025

statology.org

Learn why simpler statistical models often outperform complex alternatives and when to apply this principle.

5 Real-Life Examples of the Poisson Distribution

7 Jul 2025

statology.org

The Poisson distribution might sound like one of those things you only deal with in a stats class, but it actually shows up in

How to Visualize Skewness and Kurtosis in Python

12 Jun 2025

statology.org

In this article, you will learn how to visualize skewness and kurtosis using Python.

10 Python One-Liners to Run Common Statistical Tests

20 May 2025

statology.org

In this article, we'll explore 10 Python one-liners that showcase the progression from basic statistical tests to sophisticated analyses.

7 Statistical Concepts Machine Learning Engineers Misunderstand

8 May 2025

statology.org

Let’s break down seven statistical concepts that even seasoned machine learning engineers often trip over — and why getting them right matters more than you think.

Concise Guide to Survival Analysis

18 Apr 2025

statology.org

So, if you’ve ever asked, “How long until X happens?” and wanted to back that up with solid data, you’re in the right place.

Introduction to statsmodels

14 Apr 2025

statology.org

This article explains its features, installation, and how to use it with examples.

The Concise Guide to Chi-Square Distribution

8 Apr 2025

statology.org

Let's clarify this important statistical pattern and understand its significance in analysis.

The Concise Guide to Poisson Distribution

1 Apr 2025

statology.org

The Poisson distribution is a discrete probability distribution that expresses the likelihood of a specific number of events occurring within a fixed time or space interval.
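
As a rough illustration of that definition, the probability of exactly k events is λ^k e^(−λ) / k!. The helper name `poisson_pmf` and the rate of 3 events per interval are assumptions made for this sketch:

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with rate lam per interval."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # illustrative average number of events per interval
pmf = [poisson_pmf(k, lam) for k in range(60)]

total = sum(pmf)                              # probabilities sum to ~1
mean = sum(k * p for k, p in enumerate(pmf))  # the mean of a Poisson equals lam
```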

A Complete Guide to Understanding Probability Distributions

25 Mar 2025

statology.org

This article is your ultimate guide to understanding them gently and illustratively.

Statistical Formulas For Programmers – Evan Miller

13 Mar 2025

evanmiller.org

What is Hellinger distance? - Dataconomy

12 Mar 2025

dataconomy.com

Hellinger Distance is a statistical measure that quantifies the similarity between two probability distributions, useful in data analysis and machine learning applications.
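
For two discrete distributions on the same support, the Hellinger distance can be sketched as below. The function name `hellinger` and the example inputs are illustrative assumptions, not code from the linked article:

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Sum of squared differences between the square roots of the probabilities.
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    # The 1/sqrt(2) factor scales the result into [0, 1].
    return math.sqrt(s) / math.sqrt(2)

same = hellinger([0.2, 0.8], [0.2, 0.8])      # identical distributions
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])  # no overlap at all
```

Identical distributions give 0 and distributions with no shared mass give 1, which is what makes the measure easy to read as a bounded score.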

Learning curve: The Kaplan-Meier estimator | The Actuary

9 Mar 2025

theactuary.com

The Kaplan-Meier survival curve estimator is one of the most cited ideas in science

The Concise Guide to Leverage

3 Mar 2025

statology.org

Leverage helps us identify observations that could significantly influence our regression results, even in ways that aren't immediately obvious.

The Concise Guide to Heteroscedasticity

17 Feb 2025

statology.org

Heteroscedasticity might seem like just the opposite of homoscedasticity, but understanding it in its own right is crucial for any data analyst.

The Concise Guide to Homoscedasticity

13 Feb 2025

statology.org

Homoscedasticity stands as one of those statistical terms that can seem unnecessarily complex at first glance.

Time Series Decomposition: Extracting Seasonal, Trend, and Residual Components

7 Feb 2025

statology.org

Still, the exact application, challenges and shortcuts related to this technique are relatively unknown, and that’s what this article seeks to change.

The Concise Guide to Statistical Power

6 Feb 2025

statology.org

Statistical power might be the most frequently misunderstood concept in research design. While many researchers know they "need" it, few truly understand

Understanding Degrees of Freedom in Statistics

9 Jan 2025

statology.org

Degrees of freedom (df) represent the number of independent values in a dataset that are free to vary while still satisfying the statistical constraints imposed on the data.

Statology's Most Popular Articles of 2024

31 Dec 2024

statology.org

Have a look at Statology's most popular articles of the year!

Beyond the Mask: A Comprehensive Study of Discrete Diffusion Models

15 Dec 2024

marktechpost.com

Masked diffusion has emerged as a promising alternative to autoregressive models for the generative modeling of discrete data. Despite its potential, existing research has been constrained by overly complex model formulations and ambiguous relationships between different theoretical perspectives. These limitations have resulted in suboptimal parameterization and training objectives, often requiring ad hoc adjustments to address inherent challenges. Diffusion models have rapidly evolved since their inception, becoming a dominant approach for generative media and achieving state-of-the-art performance across various domains. Significant breakthroughs have been particularly notable in image synthesis, audio generation, and video production, demonstrating the transformative potential of this innovative

Square roots and maxima

1 Dec 2024

leancrew.com

A brief numerical and graphical check on a 3Blue1Brown video.

Distribution Calculator Guide: From Basic Probabilities to Statistical Tests

30 Nov 2024

statology.org

The calculators in this guide follow a natural progression, starting with basic probabilities and z-scores, moving through hypothesis testing tools, and concluding with specialized distributions.

Extreme Value Theory: Understanding and Predicting Rare Events

12 Nov 2024

statology.org

Let's have a closer look at EVT, its applications, and its challenges.

How to Use the Cauchy Distribution in Python

11 Nov 2024

statology.org

In this tutorial, we’ll learn more about the Cauchy distribution, visualize its probability density function, and learn how to use it in Python.

Welch’s t-Test: The Reliable Way to Compare 2 Population Means with Unequal

17 Jun 2024

towardsdatascience.com

Discover why Welch’s t-Test is the go-to method for accurate statistical comparison, even when variances differ.

Computing Minimum Sample Size for A/B Tests in Statsmodels: How and Why

31 May 2024

towardsdatascience.com

A deep-dive into how and why Statsmodels uses numerical optimization instead of closed-form formulas

How the Guinness Brewery Invented the Most Important Statistical Method in

27 May 2024

scientificamerican.com

The most common test of statistical significance originated from the Guinness brewery. Here’s how it works

Mastering Statistical Tests

21 May 2024

towardsdatascience.com

Your Guide to Choosing the Right Test for Your Data

9 key probability distributions in data science: Easy explanation

29 Dec 2023

datasciencedojo.com

Understanding probability distributions in data science is crucial. They provide a mathematical framework for modeling and analyzing data.

Beyond the Bell Curve: An Introduction to the t-distribution

4 Sep 2023

towardsdatascience.com

Discover the origins, theory and uses behind the famous t-distribution

Reliability Analysis with Python

6 Aug 2023

towardsdatascience.com

Total Productive Maintenance

Mastering Monte Carlo: How To Simulate Your Way to Better Machine Learning

3 Aug 2023

towardsdatascience.com

How a Scientist Playing Solitaire Forever Changed the Game of Statistics

SynthDiD 101: A Beginner’s Guide to Synthetic Difference-in-Differences

26 Apr 2023

towardsdatascience.com

On the method’s advantages and disadvantages, demonstrated with the synthdid package in R

How to Understand and Use the Jensen-Shannon Divergence

2 Mar 2023

towardsdatascience.com

A primer on the math, logic, and pragmatic application of JS Divergence — including how it is best used in drift monitoring

The Unreasonable Effectiveness of Conditional Probabilities

24 Feb 2023

two-wrongs.com

There are more ways to arrange a deck of cards than there are atoms

2 Feb 2023

mcgill.ca

Think of your last card game – euchre, poker, Go Fish, whatever it was. Would you believe every time you gave the whole deck a proper shuffle, you were holding a sequence of cards which had never

How to Select the Right Statistical Tests for Different A/B Metrics

10 Dec 2022

towardsdatascience.com

A Discussion of the go-to methods for 5 Types of A/B Metrics

Chi-Square Test to Compare Categorical Variables

1 Oct 2022

towardsdatascience.com

Complete Guideline to Find Dependencies among Categorical Variables with Chi-Square Test

The statistical magic behind the bootstrap

5 Sep 2022

towardsdatascience.com

How to use the bootstrap for tests or confidence intervals and why it works

Fully Mastering Fisher’s Exact Test for A/B Testing

30 Jul 2022

towardsdatascience.com

While Fisher’s exact test is a convenient tool for A/B testing, the idea and results of the test are often hard to grasp and difficult to…

Heuristics That Almost Always Work

11 Jul 2022

astralcodexten.substack.com

...

Understanding CUPED

4 Jul 2022

towardsdatascience.com

An in-depth guide to the state-of-the-art variance reduction technique for A/B tests

Home Page of Evan Miller

28 Jun 2022

evanmiller.org

Articles, software, calculators, and opinions.

Implications of use of multiple controls in an A/B test

28 Jun 2022

blog.twitter.com

DAGs and Control Variables

21 Jun 2022

link.medium.com

How to select control variables for causal inference using Directed Acyclic Graphs

Sobol Indices to Measure Feature Importance

21 Jun 2022

towardsdatascience.com

Understanding the model’s output plays a major role in business-driven projects, and Sobol can help

Confidence Intervals Simply Explained

17 Mar 2022

towardsdatascience.com

A concise explanation of confidence intervals.

Statistical T-Test Simply Explained

17 Mar 2022

towardsdatascience.com

An introduction to the Student’s t-distribution and the Student’s t-test

Log-normal Distribution — A simple explanation

19 Feb 2022

towardsdatascience.com

How to calculate μ & σ, the mode, mean, median & variance

30 Probability and Statistics Interview Questions for Data Scientists

9 Dec 2021

towardsdatascience.com

Top 30 Probability and Statistics Interview Questions that can help you sharpen your skills to ace your data science interview

A Simple Interpretation of p-values

29 Nov 2021

towardsdatascience.com

P-values & ice cream consumption simply explained.

Probability Distributions with Python’s SciPy

23 Oct 2021

towardsdatascience.com

How to Model random Processes with Distributions and Fit them to Observational Data

The curious case of Simpson’s Paradox

5 May 2021

towardsdatascience.com

In 1996, Appleton, French, and Vanderpump conducted an experiment to study the effect of smoking on a sample of people. The study was conducted over twenty years and included 1314 English women…

Resource Round-Up: Causal Inference | Emily Riederer

14 Mar 2021

emilyriederer.netlify.app

Free books, lectures, blogs, papers, and more for a causal inference crash course

Box-Cox transformation is the magic we need

10 Mar 2021

towardsdatascience.com

The smart trick to choose the right model

8 Common Pitfalls of Running A/B Tests

22 Feb 2021

towardsdatascience.com

How not to fail your online controlled experimentation

Using strip charts to visualize dozens of time series at once

19 Jan 2021

towardsdatascience.com

Strip charts are extremely useful to make heads or tails from dozens (and up to several hundred) of time series over very long periods of…

Project Lighthouse — Part 1: P-sensitive k-anonymity

18 Dec 2020

medium.com

Part one of a series on how we will measure discrepancies in Airbnb guest acceptance rates using anonymized perceived demographic data.

Log-Normal Distribution

18 Dec 2020

t.co

A Log-Normal Distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed.
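
A small simulation of that definition, assuming illustrative parameters `mu = 1.0` and `sigma = 0.5` for the underlying normal: exponentiating normal draws gives log-normal samples, and taking logs recovers the normal parameters.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the sketch is repeatable
mu, sigma = 1.0, 0.5  # parameters of the underlying normal (illustrative values)

# The exponential of a normal draw is log-normal by definition.
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(10_000)]

logs = [math.log(x) for x in samples]
log_mean = statistics.fmean(logs)           # should be close to mu
log_sd = statistics.stdev(logs)             # should be close to sigma
sample_median = statistics.median(samples)  # should be close to exp(mu)
```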

Understanding Parametric Tests, Skewness, and Kurtosis

10 Dec 2020

allaboutcircuits.com

This article introduces important subcategories of inferential statistical tests and discusses descriptive statistical measures related to the normal distribution.

7 Statistical Distributions that every Data Scientist should know— with intuitive explanations

3 Nov 2020

towardsdatascience.com

Intuitive explanations for the Normal, Bernoulli, Binomial, Poisson, Exponential, Gamma and Weibull distribution — with Python example code

10 Normality Tests-Python (Step-By-Step Guide 2020)

3 Nov 2020

towardsdatascience.com

Normality tests to check if a variable or sample has a normal distribution.

How to Calculate the KL Divergence for Machine Learning - MachineLearningMastery.com

3 Nov 2020

machinelearningmastery.com

It is often desirable to quantify the difference between probability distributions for a given random variable. This occurs frequently in machine learning, when we may be interested in calculating the difference between an actual and observed probability distribution. This can be achieved using techniques from information theory, such as the Kullback-Leibler Divergence (KL divergence), or relative entropy, and the Jensen-Shannon…
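
A minimal sketch of the discrete KL divergence mentioned here, in nats. The function name `kl_divergence` and the two example distributions are assumptions for illustration:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions on the same support."""
    # Terms with p_i == 0 contribute 0 by convention;
    # q_i must be > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)  # generally differs from kl_pq: KL is not symmetric
```

The asymmetry is the main reason symmetric alternatives such as the Jensen-Shannon divergence are often preferred for comparing two samples.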

Seven Must-Know Statistical Distributions and Their Simulations for Data Sc

16 Oct 2020

towardsdatascience.com

Assumptions, relationships, simulations, and so on

Overview of data distributions

24 Jun 2020

kdnuggets.com

With so many types of data distributions to consider in data science, how do you choose the right one to model your data? This guide will overview the most important distributions you should be familiar with in your work.

A quick refresher of statistical power

24 Jun 2020

towardsdatascience.com

An easy-to-follow guide next time you forget how to do power calculations

NIST Handbook of Engineering Statistics

1 Jun 2020

itl.nist.gov

Lognormal, Weibull, and Gamma distribution in One Picture - DataScienceCentral.com

1 Jun 2020

datasciencecentral.com

At first glance, the Lognormal, Weibull, and Gamma distributions look quite similar to each other. Selecting between the three models is “quite difficult” (Siswadi & Quesenberry) and the problem of testing which distribution is the best fit for data has been studied by a multitude of researchers. If all the models fit the data fairly well,…

The Student t-Distribution

1 Jun 2020

towardsdatascience.com

The Power-Law Distribution

1 Jun 2020

towardsdatascience.com

Classic Probability Problem #2: The Coupon Problem

1 Jun 2020

towardsdatascience.com

Data Science & Machine Learning Interviews

A Complete Guide to Hypothesis Testing

17 May 2020

towardsdatascience.com

From Controlling for Testing Errors to Selecting the Right Test

Understanding Probability And Statistics: Chi-Squared, Student-T, And F Dis

21 Apr 2020

towardsdatascience.com

A Must Know Topic For Data Scientists Who Work With Data And Statistical Inference

Lies, Damned Lies, and Averages: Perc50, Perc95 explained for Programmers

19 Mar 2020

schneems.com

I got a customer ticket the other day that said they weren’t worried about response time because “New Relic is showing our average response time to be sub 20...

Hypothesis Testing Explained as Simply as Possible

9 Mar 2020

towardsdatascience.com

One of the most important concepts for Data Scientists

Beta Distribution — Intuition, Examples, and Derivation

19 Feb 2020

t.co

When to use Beta distribution

https://www.analyticbridge.datasciencecentral.com/profiles/blogs/three-classes-of-metrics-centrality-volatility-and-bumpiness?fbclid=IwAR0iMt0Dcpbzbv_Fn9uVS4A0NQFB0lWxqOXzhwIdPMw9fOqzEAc8hVThaDc

19 Feb 2020

analyticbridge.datasciencecentral.com

The seven deadly sins of statistical misinterpretation, and how to avoid them - DataScienceCentral.com

19 Feb 2020

datasciencecentral.com

By Winnifred Louis, Associate Professor, Social Psychology, The University of Queensland, and Cassandra Chapman, PhD Candidate in Social Psychology, The University of Queensland. Here are the 7 sins: assuming small differences are meaningful; equating statistical significance with real-world significance; neglecting to look at extremes; trusting coincidence; getting causation backwards; forgetting to consider outside causes; deceptive graphs.

The Little Handbook of Statistical Practice

23 Dec 2019

jerrydallal.com

Statistical Thinking for Industrial Problem Solving – a free online course

23 Dec 2019

kdnuggets.com

Data Science, Machine Learning, AI & Analytics

Markov Chain Analysis and Simulation using Python - Towards Data Science

14 Dec 2019

towardsdatascience.com

Solving real-world problems with probabilities

P-value Explained Simply for Data Scientists

14 Dec 2019

mlwhiz.com

This post is about various evaluation metrics and how and when to use them.

https://www.analyticbridge.datasciencecentral.com/group/books/forum/topics/handbook-of-fitting?fbclid=IwAR0TqLZf6SO46CI9bfg1crEEl79gzMDyO08aAIqaKryOxlIgj3nvvbjjREQ

7 Oct 2019

analyticbridge.datasciencecentral.com

29 Statistical Concepts Explained in Simple English - Part 3

30 Aug 2019

datasciencecentral.com

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, Tensorflow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more.

15 Statistical Hypothesis Tests in Python (Cheat Sheet)

12 Feb 2019

machinelearningmastery.com

Quick-reference guide to the 17 statistical hypothesis tests that you need in applied machine learning, with sample code in Python. Although there are hundreds of statistical hypothesis tests that you could use, there is only a small subset that you may need to use in a machine learning project. In this post, you will discover a cheat sheet for the…

Explaining the 68-95-99.7 rule for a Normal Distribution

25 Jul 2018

kdnuggets.com

This post explains how those numbers were derived in the hope that they can be more interpretable for your future endeavors.
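
One way to reproduce those three numbers: for a standard normal Z, P(|Z| < k) = erf(k / √2). The helper name `within_k_sigma` is an assumption for this sketch.

```python
import math

def within_k_sigma(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

# Coverage within 1, 2 and 3 standard deviations of the mean.
coverage = {k: within_k_sigma(k) for k in (1, 2, 3)}
```

Rounded to one decimal place as percentages, the three values give the familiar 68.3, 95.4 and 99.7.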

Frequency Distribution Analysis using Python Data Stack – Part 1

8 Jun 2018

dataconomy.com

During my years as a Consultant Data Scientist I have received many requests from my clients to provide frequency distribution

Skewness vs Kurtosis – The Robust Duo

5 May 2018

kdnuggets.com

Kurtosis and Skewness are very close relatives of the “data normalized statistical moment” family – Kurtosis being the fourth and Skewness the third moment, and yet they are often used to detect very different phenomena in data. At the same time, it is typically recommendable to analyse the outputs of…

Counting Bloom Filter in C – Tony Allen – Medium

27 Dec 2017

medium.com

I recently ran across this bloom filter post by Michael Schmatz and it inspired me to write about a neat variation on the bloom filter that…

Removing Outliers Using Standard Deviation in Python

27 Dec 2017

kdnuggets.com

Standard Deviation is one of the most underrated statistical tools out there. It’s an extremely useful metric that most people know how to calculate but very few know how to use effectively.

Your Guide to Master Hypothesis Testing in Statistics - DataScienceCentral.com

27 Dec 2017

datasciencecentral.com

This article was written by Sunil Ray. Sunil is a Business Analytics and Intelligence professional with deep experience. Introduction: the difference in mindset. I started my career as an MIS professional and then made my way into Business Intelligence (BI) followed by Business Analytics, Statistical modeling and more recently machine learning. Each of these transitions has required…

The 10 Statistical Techniques Data Scientists Need to Master

20 Nov 2017

kdnuggets.com

The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.

Fisher's method - Wikipedia

4 Oct 2017

en.m.wikipedia.org

In statistics, Fisher's method, also known as Fisher's combined probability test, is a technique for data fusion or "meta-analysis" (analysis of analyses). It was developed by and named for Ronald Fisher. In its basic form, it is used to combine the results from several independence tests bearing upon the same overall hypothesis (H0).
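
In its basic form the statistic is X = −2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under H0 when the k tests are independent. A standard-library sketch follows; the function name `fisher_combined` and the example p-values are illustrative, and it uses the closed-form chi-square tail that exists for even degrees of freedom:

```python
import math

def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p).
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For 2k degrees of freedom the chi-square survival function is
    # exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no stats library is needed.
    half = stat / 2.0
    combined = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, combined

stat, combined_p = fisher_combined([0.05, 0.05])
```

With a single p-value the method returns that p-value unchanged, which is a quick sanity check on the formula.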

Relative error distributions, without the heavy tail theatrics

3 Dec 2016

win-vector.com

Nina Zumel prepared an excellent article on the consequences of working with relative error distributed quantities (such as wealth, income, sales, and many more) called “Living in A Lognormal…

prob-stats — my Raindrop.io articles