machine-learning

The Hundred-Page Machine Learning Book by Andriy Burkov

All you need to know about Machine Learning in a hundred pages. Supervised and unsupervised learning, support vector machines, neural networks, ensemble methods, gradient descent, cluster analysis and dimensionality reduction, autoencoders and transfer learning, feature engineering and hyperparameter tuning! Math, intuition, illustrations, all in just a hundred pages!

Simplest backpropagation explainer without chain rule

Neural networks learn to predict through backpropagation. This article aims to help you build a solid intuition for the concept using a simple example. The ideas we learn here can be extended to bigger neural networks. I assume that you already know how a feed-forward neural network works. Before reading further, take a pen and paper. The calculations used in this article can be done in your head, but I still want you to do them by hand.

“Periodic table of machine learning” could fuel AI discovery

After uncovering a unifying algorithm that links more than 20 common machine-learning approaches, MIT researchers organized them into a “periodic table of machine learning” that can help scientists combine elements of different methods to improve algorithms or create new ones.

Cross-entropy and KL divergence - Eli Bendersky's website

Machine Learning Algorithms You Never Knew Existed, But Are Quite Useful

Ever heard of Tsetlin Machines?

Beginner’s Guide to Deploying a Machine Learning API with FastAPI

In this guide, you will learn how to deploy a machine learning model as an API using FastAPI. We will create an API that predicts the species of a penguin based on its bill length and flipper length. Contents: Prerequisites; Step 1: Set Up Your Environment; Step 2: Prepare Your Machine Learning Model; Step 3: Create […]

What are Support Vector Machines (SVM)? - Dataconomy

Support Vector Machines (SVM) are supervised learning algorithms used for classification and regression by finding optimal decision boundaries between data classes.

How to Implement CatBoost in R

This article shows how to implement CatBoost in R.

What is triplet loss? - Dataconomy

Triplet loss is a machine learning function that minimizes distances between similar data points while maximizing distances between dissimilar ones.
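
As a rough illustration of the definition above, here is a minimal sketch of a triplet loss on plain Python lists. The Euclidean distance, the default `margin` of 1.0, and the function names are assumptions made for this example; they are not taken from the linked article.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss.

    The loss is zero once the negative sits at least `margin` farther
    from the anchor than the positive does; otherwise it is positive.
    """
    d_pos = euclidean(anchor, positive)  # distance to the similar point
    d_neg = euclidean(anchor, negative)  # distance to the dissimilar point
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity pulls the positive toward the anchor and pushes the negative away, which is the behavior the description refers to.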

Olivier Grisel - Predictive survival analysis with scikit-learn, scikit-survival and lifelines

50+ Projects to Learn Data Analysis | Aman Kharwal

In this article, I'll take you through a list of 50+ Data Analysis Projects you should try to learn Data Analysis.

10 Little-Known Python Libraries That Will Make You Feel Like a Data Wizard - KDnuggets

In this article, I will introduce you to 10 little-known Python libraries every data scientist should know.

80+ Data Science Projects | Aman Kharwal

In this article, I'll take you through a list of 80+ hands-on Data Science projects you should try to learn everything in Data Science.

A Practical Guide to Survival Analysis

Survival analysis consists of statistical methods that help us understand and predict how long it takes for an event to occur.

Support Vector Machines: A Progression of Algorithms

MMC, SVC, SVM: What’s the difference?

Multi-Head Latent Attention and Other KV Cache Tricks

How a Key-Value (KV) cache reduces Transformer inference time by trading memory for computation

50+ AI & ML Projects with Python | Aman Kharwal

In this article, I'll take you through a list of 50+ AI & ML projects solved & explained with Python that you should try.

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama 3,...

Don't use cosine similarity carelessly - Piotr Migdał

Cosine similarity - the duct tape of AI. Convenient but often misused. Let's find out how to use it better.

Massively Speed-Up your Learning Algorithm, with Stochastic Thinning - Machine Learning Techniques

Dramatically Speed-Up your Learning Algorithm, with Stochastic Thinning. Includes use case, Python code, regression and neural network illustrations.

7 Essential Python Libraries for MLOps - KDnuggets

Popular MLOps Python tools that will make machine learning model deployment a piece of cake.

How to Run a Paper Club (also: LIVE at NeurIPS 2024!)

Your ultimate Paper Club Starter Kit, from your friends at the Latent Space Paper Club, where we have now read 100 papers. Also: Announcing Latent Space Paper Club LIVE! at NeurIPS 2024! Join us!

10 Types of Machine learning Algorithms and Their Use Cases

In today’s world, you’ve probably heard the term “Machine Learning” more than once. It’s a big topic, and if you’re new to it, all the technical words might feel confusing. Let’s start with the basics and make it easy to understand. Machine Learning, a subset of Artificial Intelligence, has emerged as a transformative force, empowering machines to learn from data and make intelligent decisions without explicit programming. At its core, machine learning algorithms seek to identify patterns within data, enabling computers to learn and adapt to new information. Think about how a child learns to recognize a cat. At first,

AI Alone Isn’t Ready for Chip Design

A combination of classical search and machine learning may be the way forward

Transforming Location Retrieval at Airbnb: A Journey from Heuristics to Reinforcement Learning

How Airbnb leverages machine learning and reinforcement learning techniques to solve a unique information retrieval task in order to…

Difference Between a Batch and an Epoch in a Neural Network - MachineLearningMastery.com

Stochastic gradient descent is a learning algorithm that has a number of hyperparameters. Two hyperparameters that often confuse beginners are the batch size and number of epochs. They are both integer values and seem to do the same thing. In this post, you will discover the difference between batches and epochs in stochastic gradient descent. After reading this post, you…
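
The relationship between the two hyperparameters can be sketched with a little arithmetic. The helper names below are hypothetical, and the sketch assumes the final, smaller batch of an epoch is still used for an update rather than dropped.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the data.

    Assumes a trailing partial batch still triggers an update.
    """
    return math.ceil(n_samples / batch_size)


def total_updates(n_samples, batch_size, n_epochs):
    """Total weight updates over the whole training run."""
    return updates_per_epoch(n_samples, batch_size) * n_epochs
```

The batch size controls how many samples are seen before each update, while the number of epochs controls how many full passes over the dataset are made.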

Marketing Mix Modeling (MMM): How to Avoid Biased Channel Estimates

Learn which variables you should and should not take into account in your model.

10 GitHub Repositories for Advanced Machine Learning Projects

Where can you find projects dealing with advanced ML topics? GitHub is a perfect source with its many repositories. I’ve selected ten to talk about in this article.

The m=√p rule for random forests | R-bloggers

A couple of days ago, in our lab session, we discussed random forests, and, since it was based on the example in ISLR, we had a quick discussion about the random choice of features, and the “m=√p” rule. Interestingly, on that one, we can play a bit, and try all choices, and do it again, on a different train/test split: library(randomForest) library(ISLR2) set.seed(123) sim = function(t){ train = sample(nrow(Boston), size = nrow(Boston)*.7) subsim = function(i){ rf.boston

Calculating the Uncertainty Coefficient (Theil’s U) in Python

A measure of correlation between discrete (categorical) variables

The m=√p rule for random forests

A couple of days ago, in our lab session, we discussed random forests, and, since it was based on the example in ISLR, we had a quick discussion about the random choice of features, and the “m=√p” rule. Interestingly, on that one, we can play a bit, and try all choices, and do it again, …

A/B/C Tests: How to Analyze Results From Multi-Group Experiments

Experimentation is widely used at tech startups to make decisions on whether to roll out new product features, UI design changes, marketing campaigns and more, usually with the goal of improving…

Beyond A/B Testing: Primer on Causal Inference

Making the most out of your experiments and observational data

AI & ML Projects with Python

In this article, I'll take you through a list of guided projects to master AI & ML with Python.

Carl-McBride-Ellis/Compendium-of-free-ML-reading-resources

Compendium of free ML reading resources.

An In-Depth Guide to Contrastive Learning: Techniques Models and Applications

Discover the fundamentals of contrastive learning, including key techniques like SimCLR, MoCo, and CLIP. Learn how contrastive learning improves unsupervised learning and its practical applications.

Gradio Documentation

Documentation, tutorials and guides for the Gradio ecosystem.

Counts Outlier Detector: Interpretable Outlier Detection

An interpretable outlier detector based on multi-dimensional histograms.

Basis Functions: Simple Definition - Statistics How To

Basis functions (called derived features in machine learning) are building blocks for creating more complex functions. In other

Cosine Similarity

Cosine similarity can measure the proximity between two documents by transforming words into vectors within a vector space.
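
A minimal sketch of the idea, assuming a very simple bag-of-words representation: each document becomes a vector of word counts, and the similarity is the cosine of the angle between the two vectors. The lowercase whitespace tokenization and the function name are assumptions for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents using raw word counts."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)
```

Documents with no words in common score 0, and documents with identical word counts score 1 (up to floating-point rounding).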

7 Cool Technical GenAI & LLM Job Interview Questions

Cool LLM and GenAI tech questions covering many modern concepts, including fast vector search, contextual tokens, and augmented structures

Permutation Feature Importance from Scratch

Understanding the importance of permutations in the field of explainable AI

Tips for LLM Pretraining and Evaluating Reward Models

Discussing AI Research Papers in March 2024

SVM and Kernels: The Math that Makes Classification Magic

Imagine you're at a party separating people who love pizza (yum!) from those who...well, have...

A Benchmark and Taxonomy of Categorical Encoders

New. Comprehensive. Extendable.

Algorithm Repository

Customers Prefer to Crowdfund Products They Can Improve

Platforms like Kickstarter and Indiegogo have not only broadened access to funding to companies that might struggle in the capital markets but have also transformed the way companies connect with consumers during product development, replacing focus groups with real customers who have a stake in the final product. Despite crowdfunding’s many benefits, numerous campaigns still fail. To understand why, the authors embarked on an empirical analysis of 18,173 campaigns for physical products in the technology and design categories on Kickstarter. They found that many companies often present initial products that are so fully developed that customers don’t feel that their input will materially change the product and are reluctant to contribute as a result.

Speech and Language Processing

CatBoost - state-of-the-art open-source gradient boosting library with cate

CatBoost - state-of-the-art open-source gradient boosting library with categorical features support,

Master Dispersion Plots in 6 Minutes!

Learn graphical text analysis with NLTK

What Is a Schur Decomposition? – Nick Higham

A Schur decomposition of a matrix $A\in\mathbb{C}^{n\times n}$ is a factorization $A = QTQ^*$, where $Q$ is unitary and $T$ is upper triangular. The diagonal entries of…

Encoding Categorical Variables: A Deep Dive into Target Encoding

Data comes in different shapes and forms. One of those shapes and forms is known as categorical data.

Getting started predicting time series data with Facebook Prophet

This article aims to take away the entry barriers to get started with time series analysis in a hands-on tutorial using Prophet

The Math behind Adam Optimizer

Why is Adam the most popular optimizer in Deep Learning? Let’s understand it by diving into its math, and recreating the algorithm.

3 Key Encoding Techniques for Machine Learning: A Beginner-Friendly Guide

How should we choose between label, one-hot, and target encoding?

Understanding Latent Dirichlet Allocation (LDA) — A Data Scientist’s Guide

LDA explained with a dog pedigree model

An Overview of Contextual Bandits

A dynamic approach to treatment personalization

Pearson vs Spearman Correlation: Find Harmony between the Variables

Which measure of correlation should you use for your task? Learn all you need to know about Pearson and Spearman correlations.

The Perfect Way to Smooth Your Noisy Data

Insanely fast and reliable smoothing and interpolation with the Whittaker-Eilers method.

10 Noteworthy AI Research Papers of 2023

This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been this year. To conclude an eventful 2023 in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read this year. My personal focus has been more on large language models, so you'll find a heavier emphasis on large language model (LLM) papers than computer vision papers this year.

Boosting Algorithms in Machine Learning, Part I: AdaBoost

Understanding the logic behind AdaBoost and implementing it using Python

An unusual introduction to manifolds

This tutorial offers a bridge between the abstract mathematics of manifolds and computational practice.

Market Basket Analysis using Python

In this article, I'll take you through the task of Market Basket Analysis using Python.

The Power of Independent Component Analysis (ICA) on Real-World Application

Independent component analysis (ICA) is a powerful data-driven tool capable of separating linear contributions in the data

Math for Machine Learning: 14 Must-Read Books

It is possible to design and deploy advanced machine learning algorithms that are essentially math-free and stats-free. People working on that are typically professional mathematicians. These algorithms are not necessarily simpler. See for instance a math-free regression technique with prediction intervals, here. Or supervised classification and alternative to t-SNE, here. Interestingly, this latter math-free machine

A Gentle Introduction to Complementary Log-Log Regression

An alternative to logistic regression under special conditions

No sacred masterpieces

Or "that time I built Excel for Uber and they ditched it like a week after launch"

XGBoost: How Deep Learning Can Replace Gradient Boosting and Decision Trees

A world without if

Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning

BookCorpus has helped train at least thirty influential language models (including Google’s BERT, OpenAI’s GPT, and Amazon’s Bort), according to HuggingFace. This is the research question that…

Pearson, Spearman and Kendall Correlation Coefficients, by Hand

Learn how to compute the Pearson, Spearman and Kendall correlation coefficients by hand to evaluate the relationship between two variables
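
The by-hand calculations can be mirrored in a few lines of plain Python. This is a sketch under stated assumptions: the data contain no ties (so simple integer ranks suffice), Kendall's coefficient is the tau-a variant, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks 1..n in ascending order; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)
```

For a monotone but non-linear relationship such as y = x², the two rank-based coefficients equal 1 while Pearson's r stays below 1, which is the usual reason to compare them.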

Machine Learning Using Decision Trees in Ruby

In the era of hyper-sophisticated machine learning models like ChatGPT, it is surprising how effective the classic decision tree model remains, especially when used in conjunction with other techniques, such as bagging, boosting and random forests. In this blog post we demonstrate how to build an effective decision tree model, and train this model on some sample data.

Probabilistic Machine Learning: Advanced Topics

Dynamic Pricing with Multi-Armed Bandit: Learning by Doing!

Applying Reinforcement Learning strategies to real-world use cases, especially in dynamic pricing, can reveal many surprises

Kernel Density Estimation explained step by step

Intuitive derivation of the KDE formula

Why is Feature Scaling Important in Machine Learning? Discussing 6 Feature

Standardization, Normalization, Robust Scaling, Mean Normalization, Maximum Absolute Scaling and Vector Unit Length Scaling
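
Five of the six listed techniques can be sketched for a single feature column in plain Python; robust scaling is left out because quartile conventions vary. The sketch assumes "normalization" means min-max scaling, that standardization uses the population standard deviation, and that the column is not constant. The function names are hypothetical.

```python
import math
import statistics


def standardize(xs):
    """Zero mean, unit variance (population standard deviation assumed)."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


def min_max(xs):
    """Rescale to the [0, 1] interval."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def mean_normalize(xs):
    """Center on the mean, then divide by the range."""
    mu, lo, hi = statistics.fmean(xs), min(xs), max(xs)
    return [(x - mu) / (hi - lo) for x in xs]


def max_abs(xs):
    """Divide by the largest absolute value, giving values in [-1, 1]."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]


def unit_length(xs):
    """Scale the vector to Euclidean norm 1."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]
```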

Evaluation Metrics for Recommendation Systems — An Overview

Understanding the purpose and functionality of common metrics in ML packages

patchy631/machine-learning

Self-Organizing Maps

Learn how Self-Organizing Maps work and why they are a useful unsupervised learning algorithm

Breaking the Data Barrier: How Zero-Shot, One-Shot, and Few-Shot Learning a

Discover the concepts of Zero-Shot, One-Shot, and Few-Shot Learning, which enable machine learning models to classify and recognize objects or patterns with a limited number of examples.

Machine Learning Basics: Polynomial Regression

Learn to build a Polynomial Regression model to predict the values for a non-linear dataset.

Geographic Clustering with HDBSCAN

How to explore geographic data with HDBSCAN, H3, graph theory, and OSM.

Mastering Monte Carlo: How To Simulate Your Way to Better Machine Learning

How a Scientist Playing Solitaire Forever Changed the Game of Statistics

LGBMClassifier: A Getting Started Guide

This tutorial explores the LightGBM library in Python to build a classification model using the LGBMClassifier class.

Hierarchical Navigable Small World (HNSW) is a state-of-the-art algorithm used for an approximate search of nearest neighbours. Under the…

In the first two parts of this series we have discussed two fundamental algorithms in information retrieval: inverted file index and…

Building a Vector Search Engine Using HNSW and Cosine Similarity

Hierarchical Navigable Small World graphs (HNSW) is an algorithm that allows for efficient nearest neighbor search, and the Sentence…

Similarity search is a popular problem where given a query Q we need to find the most similar documents to it among all the documents D.

Learn a powerful technique to effectively compress large data

Explore how similarity information can be incorporated into hash function

Understand how to hash data and reflect its similarity by constructing random hyperplanes
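
The random-hyperplane idea can be sketched as follows: each hyperplane contributes one bit, set according to which side of the plane the vector falls on, so vectors separated by a small angle tend to share most bits. Drawing Gaussian normal vectors, the seed handling, and the function names are assumptions of this sketch rather than details from the linked article.

```python
import random


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw `n_planes` random normal vectors of length `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(v * w for v, w in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def hamming(sig_a, sig_b):
    """Number of differing bits; a larger value suggests a larger angle."""
    return sum(a != b for a, b in zip(sig_a, sig_b))
```

Because only the sign of each dot product is kept, rescaling a vector by a positive factor leaves its signature unchanged, while flipping its direction flips the bits.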

Similarity Search, Part 7: LSH Compositions

Dive into combinations of LSH functions to guarantee a more reliable search

Variational Inference: The Basics

Implementing variational inference from scratch

The Complete Introduction to Survival Analysis in Python | by Marco Peixeir

Understand survival analysis, its use in the industry, and how to apply it in Python

Unbox the Cox: Intuitive Guide to Cox Regressions

How do hazards and maximum likelihood estimates predict event rankings?

A Deep Dive into Autoencoders and Their Relationship to PCA and SVD

An in-depth exploration of autoencoders and dimensionality reduction

Machine Learning in a Non-Euclidean space

Chapter I. Why you should learn about non-Euclidean ML

Creating Incredible Decision Tree Visualizations with dtreeviz

How to visualize decision tree models with this useful library

Unsupervised Learning Series — Exploring Hierarchical Clustering

Let’s explore how hierarchical clustering works and how it builds clusters based on pairwise distances.

A Gentle Introduction to Support Vector Machines

A guide to understanding support vector machines for classification: from theory to scikit-learn implementation.

Feature Transformations: A Tutorial on PCA and LDA

Reducing the dimension of a dataset using methods such as PCA

Uplift Modeling — A Data Scientist’s Guide to Optimizing a Credit Card Rene

Applying causal machine learning to trim the campaign target audience

Introduction to Vector Similarity Search

Learn what vector search is and the metrics pertinent to decide the distance (or similarity) between objects.

The Basics of Anomaly Detection

Basics of anomaly detection, its use cases, and an implementation of a simple yet powerful algorithm in Python

A Gentle Introduction to K-Means Clustering in R (Feat. Tidyclust)

To be successful as a Data Scientist, you’re often put in positions where you need to find groups within your data. One key business use-case is finding clusters of customers that behave similarly. And that’s a powerful skill that I’m going to help you...

Spectral Clustering Algorithm Demystified

Spectral clustering is a method of clustering data points based on their similarity or affinity,...

Diminishing Returns in Machine Learning Part 1

Hardware Development and the Physical Frontier

Sklearn Pipelines for the Modern ML Engineer: 9 Techniques You Can’t Ignore

Master Sklearn pipelines for effortless and efficient machine learning. Discover the art of building, optimizing, and scaling models with ease. Level up your data preprocessing skills and supercharge your ML workflow today

Hidden Data Science Gem: Rainbow Method for Label Encoding | by Anna Arakel

Make stronger and simpler models by leveraging natural order

eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine

The Similarity Engine's use cases include item-to-item similarity for text and image modality and user-to-item personalized recommendations based on a user’s historical behavior data.

Beginner’s Guide to the Must-Know LightGBM Hyperparameters

The most important LightGBM parameters, what they do, and how to tune them

A Guide to Association Rule Mining

Create insights from frequent patterns using market basket analysis with Python

Cycle Detection for Recursive Search in Hierarchical Trees - Database Tip

Recursive queries are a straightforward solution to querying hierarchical trees. However, one loop in the relationship references results in a failing or never ending query when cycle detection is not used.

Master Semantic Search at Scale: Index Millions of Documents with Lightning

Dive into an end-to-end demo of a high-performance semantic search engine leveraging GPU acceleration, efficient indexing techniques, and…

Top Machine Learning Papers to Read in 2023 - KDnuggets

These curated papers would step up your machine-learning knowledge.

Announcing PyCaret 3.0: Open-source, Low-code Machine Learning in Python

Exploring the Latest Enhancements and Features of PyCaret 3.0

Hashing in Modern Recommender Systems: A Primer

Understanding the most underrated trick in applied Machine Learning

The Meaning Behind Logistic Classification, from Physics | by Tim Lou, PhD

Why do we use the logistic and softmax functions? Thermal physics may have an answer.

https://www.uber.com/blog/research/maximum-relevance-and-minimum-redundancy-feature-selection-methods-for-a-marketing-machine-learning-platform

Mixture Models, Latent Variables and the Expectation Maximization Algorithm

Unsupervised learning has always been fascinating to me. It is a way to learn about data without manual labeling effort and allows for the…

12 Ways to Test Your Forecasts like A Pro

How to find the best performance estimation approach for time-series forecasts among 12 strategies proposed in the literature. With Python…

Jaccard index

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It is defined in general as the ratio of two sizes: the intersection size divided by the union size, also called intersection over union (IoU).
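
For finite sets the definition translates directly into code. This is a minimal sketch; returning 1.0 when both sets are empty is a convention assumed here, and the function name is hypothetical.

```python
def jaccard_index(a, b):
    """Intersection size divided by union size for two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)
```

For example, the sets {1, 2, 3} and {2, 3, 4} share two of four distinct elements, giving an index of 0.5.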

How to make 40 interactive plots to analyze your machine learning pipeline

A quick guide on how to make clean-looking, interactive Python plots to validate your data and model

Uplift Modeling with Cost Optimization

How to adjust CATE to consider costs associated with your treatments

Gradient Boosted Linear Regression in Excel

To even better understand Gradient Boosting

2012.03854.pdf 📄

2003.05689.pdf 📄

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning 📄

2108.02497.pdf 📄

Top ML Papers of the Week - by elvis - NLP Newsletter

The top ML Papers of the Week (Mar 6 - Mar 12)

Y Combinator–backed Patterns is building a platform to abstract away data s

Patterns, backed by Y Combinator, is building a platform that allows customers to piece together components to build an AI-powered app.

Write Readable Tests for Your Machine Learning Models with Behave

Use natural language to test the behavior of your ML models

How to Understand and Use the Jensen-Shannon Divergence

A primer on the math, logic, and pragmatic application of JS Divergence — including how it is best used in drift monitoring
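
A minimal sketch of the computation for two discrete distributions over the same bins: the JS divergence averages the KL divergence of each distribution from their midpoint mixture. Using base-2 logarithms (so the result lies between 0 and 1) and the function names are assumptions of this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    # The mixture m is positive wherever p or q is, so KL against m is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike KL divergence, the result is symmetric in its arguments and stays finite when one distribution has empty bins, which is part of why it suits comparing a reference distribution with a production one.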

Introduction of Four Types of Item Similarity Measures

Covers how to choose the similarity measure when item embeddings are available

Probability stats for ds 📄

A practical introduction to sequential feature selection

A gentle dive into this unusual feature selection technique

How to Improve Clustering Accuracy with Bayesian Gaussian Mixture Models

A more advanced clustering technique for real world data

How to Perform Multivariate Outlier Detection in Python PyOD For Machine Le

Discover how to effectively detect multivariate outliers in machine learning with PyOD in Python. Learn to convert anomaly scores to probability confidence, choose the best outlier classifier and determine the right probability threshold for improved model accuracy.

Linear Algebra: LU Decomposition, with Python

Part 4: A comprehensive step-by-step guide to solving a linear system with LU Decomposition

Correlation — When Pearson’s r Is Not Enough

Comparison of various correlation methodologies

skops: a new library to improve scikit-learn in production

There are various challenges in MLOps and model sharing, including security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

PageRank Algorithm for Graph Databases

What is PageRank algorithm? How can it be used in various graph database use cases? How to use it in Memgraph? If these questions are keeping you up at night, here is a blog post that will finally put your mind at ease.

Comparing Different Automatic Image Augmentation Methods in PyTorch

Data augmentation is a key tool in reducing overfitting, whether it's for images or text. This article compares three Auto Image Data Augmentation techniques...

Hyperparameter Optimization: 10 Top Python Libraries

Become familiar with some of the most popular Python libraries available for hyperparameter optimization.

Introducing PyCircular: A Python Library for Circular Data Analysis

Circular data can present unique challenges when it comes to analysis and modeling

What does Entropy Measure? An Intuitive Explanation

Entropy can be thought of as the probability of seeing certain patterns in data. Here’s how it works.

Complete guide to Association Rules (2/2)

Algorithms that help you shop faster and smarter

Brief Introduction to Correspondence Analysis

Learn the basic steps to run a Multiple Correspondence Analysis in R

7 Scikit-Learn Best Practices For Data Scientists

Tips for taking full advantage of this machine learning package

Introduction to Multi-Armed Bandit Problems

Delve deeper into the concept of multi-armed bandits, reinforcement learning, and exploration vs. exploitation dilemma.

Geometric Kernels

A cross-framework package for kernels and Gaussian processes on manifolds, graphs, and meshes

Simple Parquet Tutorial and Best Practices

Hands-on tutorial for starting your Parquet learning

Dense Vectors | Pinecone

Milvus · An Open Source Vector Similarity Search Engine

Open-source vector database built for GenAI applications. Install with pip, perform high-speed searches, and scale to tens of billions of vectors.

PacktPublishing/Python-Feature-Engineering-Cookbook-Second-Edition: Python

Python Feature Engineering Cookbook Second Edition, published by Packt - PacktPublishing/Python-Feature-Engineering-Cookbook-Second-Edition

What Is Survival Analysis? Examples by Hand and in R

Learn more about survival analysis (also called time-to-event analysis) and how it is used, and how to apply it by hand and in R

Zero-shot Learning, Explained - KDnuggets

How can you train a model to learn and predict unseen data?

ChatGPT and the Imagenet moment — Benedict Evans

The wave of enthusiasm around generative networks feels like another Imagenet moment - a step change in what ‘AI’ can do that could generalise far beyond the cool demos. What can it create, and where are the humans in the loop?

Survival Analysis: Optimize the Partial Likelihood of the Cox Model

Finding the coefficients that maximize the log-partial likelihood in Python

Google brings machine learning to online spreadsheets with Simple ML for Sh

Today Google announced a beta release of Simple ML for Sheets, which allows users without ML experience to try ML out on their spreadsheets.

Dual Confidence Regions: A Simple Introduction - DataScienceCentral.com

Simulated confidence regions for machine learning professionals and non-statisticians. Introducing a new concept: dual confidence region.

Machine Learning Dictionary - Machine Learning Techniques

Top entries are in bold, and sub-entries are in italics. This dictionary is from my new book "Intuitive Machine Learning and Explainable AI", available here and used as reference material for the course with the same name (see here). These entries are cross-referenced in the book to facilitate navigation, with backlinks to the pages where

Density-Based Clustering: DBSCAN vs. HDBSCAN

Which algorithm to choose for your data

An Introduction to SMOTE - KDnuggets

Improve the model performance by balancing the dataset using the synthetic minority oversampling technique.

How to Choose the Best Machine Learning Technique: Comparison Table

What Is an Eigenvalue? – Nick Higham

An eigenvalue of a square matrix $A$ is a scalar $\lambda$ such that $Ax = \lambda x$ for some nonzero vector $x$. The vector $x$ is an eigenvector of $A$ and it…
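
The defining relation can be checked numerically on a small case. The sketch below restricts itself to a 2×2 matrix with real eigenvalues, obtained from the characteristic polynomial; the example matrix and the helper names are assumptions chosen for illustration.

```python
import math


def eigenvalues_2x2(a):
    """Real eigenvalues of a 2x2 matrix, largest first.

    Solves lambda^2 - trace * lambda + det = 0 and assumes the
    discriminant is non-negative.
    """
    (p, q), (r, s) = a
    trace, det = p + s, p * s - q * r
    root = math.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def matvec(a, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


A = [[2.0, 1.0], [1.0, 2.0]]
lam_big, lam_small = eigenvalues_2x2(A)  # 3.0 and 1.0 for this matrix

# [1, 1] is an eigenvector for the larger eigenvalue, [1, -1] for the smaller.
print(matvec(A, [1.0, 1.0]), [lam_big * c for c in [1.0, 1.0]])
print(matvec(A, [1.0, -1.0]), [lam_small * c for c in [1.0, -1.0]])
```

Each printed pair shows the two sides of the relation, the product of the matrix with the vector and the eigenvalue times the vector, side by side.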

Last Mile Delivery From Multiple Depots in Python

Mathematical Modeling, Solution, and Visualization Using PuLP and VeRoViz

An Introduction to Topic-Noise Models

Learn how you can use topic-noise models (1/3)

2 Ways to Build Your Own Custom Scikit Learn Transformers

How you can (and why you should) create custom transformers

An Effective Approach for Image Anomaly Detection

Utilize Anomalib from Intel OpenVinoToolkit to benchmark, develop, and deploy deep learning based image anomaly detection

New Book: Approaching (Almost) Any Machine Learning Problem - Machine Learning Techniques

This self-published book is dated July 2020 according to Amazon. But it appears to be an ongoing project. Like many new books, the material is on GitHub, here. The most recent version, dated June 2021, is available in PDF format, here. This is not a traditional book. It feels like a repository of Python code,

5 Essential Qualities of Anomaly Detection Systems

Ensuring your business is proactive and risk-proof.

Scikit-learn 1.1 Comes with an Improved OneHotEncoder

A simple yet highly practical feature

Understanding Logistic Regression — the Odds Ratio, Sigmoid, MLE, et al

Logistic regression is one of the most frequently used machine learning techniques for classification. However, though seemingly simple…

What is ‘Image Super Resolution’, and why do we need it?

An introduction to the field, its applications, and current issues

Image Super-Resolution: An Overview of the Current State of Research

A review of popular techniques and remaining challenges

Logistic Regression: Statistics for Goodness-of-Fit

Statistics in R Series: Deviance, Log-likelihood Ratio, Pseudo R² and AIC/BIC

A New, Transparent AI Tool May Help Detect Blood Poisoning

The algorithm scans electronic records and may reduce sepsis deaths, but widespread adoption could be a challenge.

19 Examples of Merging plots to Maximize your Clustering Scatter plot

Mix and match plots to get more information from a scatter plot

How can you beat XGBoost, CatBoost, and TabNet on tabular data?

Use a cocktail of 13 modern regularization techniques! [1/9] - Sebastian Raschka (@rasbt)

How to Interpret the Odds Ratio with Categorical Variables in Logistic Regr

An explanation of reference categories and picking the right one

Topic Modeling with LSA, pLSA, LDA, NMF, BERTopic, Top2Vec: a Comparison

A comparison between different topic modeling strategies including practical Python examples

Product Quantization for Similarity Search

How to compress and fit a humongous set of vectors in memory for similarity search with asymmetric distance computation (ADC)

Bayesian Hierarchical Marketing Mix Modeling in PyMC

Learn how to build MMMs for different countries the right way

IVFPQ HNSW for Billion-scale Similarity Search | by Peggy Chang | Towards

The best indexing approach for billion-sized vector datasets

NSVQ: Improved Vector Quantization technique for Neural Networks Training

Efficient vector quantization for machine learning optimizations (esp. vector quantized variational autoencoders), better than straight…

The Basics of Object Detection: YOLO, SSD, R-CNN

Overview of how object detection works, and where to get started

7 Techniques to Handle Imbalanced Data - KDnuggets

This blog post introduces seven techniques that are commonly applied in domains like intrusion detection or real-time bidding, because the datasets are often extremely imbalanced.

Chi-Square Test to Compare Categorical Variables

Complete Guideline to Find Dependencies among Categorical Variables with Chi-Square Test

The Mindset Technique to Understand Precision and Recall Like Never Before

Precision and Recall elaborated with sample situations

Pricing at Lyft

By Yanqiao Wang

Principal Component Analysis: Everything You Need To Know

Covariance, eigenvalues, variance and everything …

[P] My co-founder and I quit our engineering jobs at AWS to build “Tensor Search”. Here is why.

530 votes, 63 comments. My co-founder and I, a senior Amazon research scientist and AWS SDE respectively, launched Marqo a little over a week ago - a…

Linear Regression Analysis – Part 1 - DataScienceCentral.com

Who should read this blog: someone who is new to linear regression, or someone who wants to understand the jargon around linear regression. Code repository: https://github.com/DhruvilKarani/Linear-Regression-Experiments. Linear regression is generally the first step in anyone’s data science journey. When you hear the words Linear and Regression, something like this pops up in your mind: X1, X2,…

Introduction to Embedding, Clustering, and Similarity

Introduction to key elements of ML and Autoencoders: Embedding, Clustering, and Similarity.

An Intuitive Explanation of Collaborative Filtering

The post introduces one of the most popular recommendation algorithms, i.e., collaborative filtering. It focuses on building an intuitive understanding of the algorithm illustrated with the help of an example.

How to Use UMAP For Much Faster And Effective Outlier Detection

Let’s catch those high-dimensional outliers

Multi-Objective Ranking for Promoted Auction Items

Determining which promoted auction items to display in a merchandising placement is a multi-sided customer challenge that presents opportunities to both surface amazing auction inventory to buyers and help sellers boost visibility on their auction listings.

Adjacency networks

Finding the adjacency graphs for US states and Texas counties using Mathematica

https://www.einblick.ai/blog/problems-with-notebooks-msftpaper/

Demystifying Object Detection and Instance Segmentation for Data Scientists - MLWhiz

This post explains how permutation importance works and how we can code it using ELI5.

Patterns, Predictions, and Actions

How to Perform Motion Detection Using Python - KDnuggets

In this article, we take a look at motion detection using a laptop or desktop webcam, and write a script that runs on our own computer so we can see it working in real time.

SHAP for Categorical Features with CatBoost

Avoid post-processing the SHAP values of categorical features

9 Visualizations with Python that Catch More Attention than a Bar Chart

Creating eye-catching graphs with Python to use instead of bar charts.

OCR-free document understanding with Donut

Use the recently-released Transformers model to generate JSON representations of your document data

An Introduction to Graph Partitioning Algorithms and Community Detection

Graph partitioning has been a long-lasting problem and has a wide range of applications. This post shares the methodology for graph…

5 Less-Known Python Libraries That Can Help in Your Next Data Science Proje

Reduce time in your data science workflow with these libraries.

https://twitter.com/freakonometrics/status/1550439602594484225?s=12&t=5dAyRh9A5Nw53A3oNf4HPg

Building classifiers with biased classes: AdaSampling comes to the rescue

An algorithmic approach to clean up your dataset and sharpen class assignments.

What is YOLOv7? A Complete Guide.

In this guide, we discuss what YOLOv7 is, how the model works, and the novel model architecture changes in YOLOv7.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for...

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30...

Modeling Marketing Mix Using Smoothing Splines

Capturing non-linear advertising saturation and diminishing returns without explicitly transforming media variables

Linear Algebra for Data Science - KDnuggets

In this article, we discuss the importance of linear algebra in data science and machine learning.

Build Complex Time Series Regression Pipelines with sktime

How to forecast with scikit-learn and XGBoost models with sktime

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize...

Understanding Self-Organising Map Neural Network with Python Code

Brain-inspired unsupervised machine learning through competition, cooperation and adaptation

How to Solve Scheduling Problems in Python

Use linear programming to minimize the difference between required and scheduled resources

what is discriminant analysis at DuckDuckGo

Want to know what discriminant analysis is and how it helps in analyzing data? Read this complete guide on discriminant analysis now.

Topological Data Analysis for Machine Learning

Probabilistic Numerics | Textbooks

Quantifying Uncertainty in Computation.

firmai/financial-machine-learning: A curated list of practical financial machine learning tools and applications.

A curated list of practical financial machine learning tools and applications. - firmai/financial-machine-learning

Essential Math for Data Science: Eigenvectors and Application to PCA - KDnuggets

In this article, you’ll learn about the eigendecomposition of a matrix.

An Introduction to Regularization

This is what makes your trained models actually usable

grahamjenson/list_of_recommender_systems: A List of Recommender Systems and Resources

A List of Recommender Systems and Resources.

Home Page of Evan Miller

Articles, software, calculators, and opinions.

T-LEAF: Taxonomy Learning and EvaluAtion Framework

How we applied qualitative learning, human labeling and machine learning to iteratively develop Airbnb’s Community Support Taxonomy.

FIGS: Attaining XGBoost-level performance with the interpretability and spe

The BAIR Blog

Three Performance Evaluation Metrics of Clustering When Ground Truth Labels

Which metric should be used to evaluate the clustering results if the ground truth labels are not available? In this post, I’m introducing…

Multi-Relevance Ranking Model for Similar Item Recommendation

Buyers reveal a whole range of behaviors and interests when they browse our pages, so we decided to incorporate these additional purchase intent signals into our machine learning model to improve the relevance of our recommended items.

Complete Step-by-step Genetic Algorithm from Scratch for Global Optimization

No need to worry about getting stuck in local minima anymore

The Battle of Choropleths — Part 3 — Folium

Using the Folium Package to Create Stunning Choropleths

Precision, Recall, and F1 Score of Multiclass Classification — Learn in Dep

Manual Calculation From a Confusion Matrix and the Syntax of sklearn Library

Super Study Guides

Illustrated study guides ideal for visual learners.

Neighborhood Analysis, KD-Trees, and Octrees for Meshes and Point Clouds in

How to use Python libraries like Open3D, PyVista, and Vedo for neighborhood analysis of point clouds and meshes through KD-Trees/Octrees

Time Series Forecasting with ARIMA

In this article, I will take you through the task of Time Series Forecasting with ARIMA using the Python programming language.

Essential Math for Data Science: Visual Introduction to Singular Value Deco

This article will cover singular value decomposition (SVD), which is a major topic of linear algebra, data science, and machine learning.

Say Hello To Recommendation Systems

A preview into one of the most prominent data science applications

DAGs and Control Variables

How to select control variables for causal inference using Directed Acyclic Graphs

A Guide To Using The Difference-In-Differences Regression Model

We’ll show how to use the DID model to estimate the effect of hurricanes on house prices

Sobol Indices to Measure Feature Importance

Understanding the model’s output plays a major role in business-driven projects, and Sobol can help

Reproducible ML: Maybe you shouldn’t be using Sklearn’s train_test_split

Reproducibility is critical for robust data science — after all, it is a science.

Machines are haunted by the curse of dimensionality

The curse of dimensionality comes into play when we deal with a lot of data having many dimensions or features.

Survival Analysis in R (in under 10-minutes)

Making a survival analysis can be a challenge even for experienced R users, but the good news is I’ll help you make beautiful, publication-quality survival plots in under 10-minutes. Here’s what WE are going to do: Make your first survival model an...

How to Evaluate Survival Analysis Models

Introduction to the most popular performance evaluation metrics for survival analysis along with practical Python examples

Flip Flop: Why Zillow’s Algorithmic Home Buying Venture Imploded

XGBoost Alternative Base Learners

Introducing dart, gblinear, and XGBoost Random Forests

Similarity-Based Image Search for Visual Art

Evaluating similarity of visual art from both human perceptual & quantitative judgments

Useful Python decorators for Data Scientists

I show toy implementations of Python decorator patterns that may be useful for Data Scientists.

One Line of Code to Accelerate Your Sklearn Algorithms on Big Data

An introduction to the Intel extension for sklearn. Make your Random Forest even faster than XGBoost.

CatBoost vs. LightGBM vs. XGBoost

Which is the best algorithm?

The Big Six Matrix Factorizations – Nick Higham

Six matrix factorizations dominate in numerical linear algebra and matrix analysis: for most purposes one of them is sufficient for the task at hand. We summarize them here. For each factorization …

93 Datasets That Load With A Single Line of Code

How you can pull one of a few dozen example political, sporting, education, and other frames on-the-fly.

Survival Analysis: A Brief Introduction

An initial look into the method best suited for examining time-to-event data

Focal Loss : A better alternative for Cross-Entropy

Focal loss is said to perform better than Cross-Entropy loss in many cases. But why Cross-Entropy loss fails, and how Focal loss addresses…

Data Mining: Market Basket Analysis with Apriori Algorithm

Uncovering the secret behind why breads are always conveniently placed beside butter in groceries

How does Shazam work? Music Recognition Algorithms, Fingerprinting, and Processing | Toptal®

The Shazam music recognition application made it finally possible to put a name to that song on the radio. But how does this magical miracle actually work? In this article, Toptal Freelance Software Engineer Jovan Jovanovic sheds light on the principles of audio signal processing, fingerprinting, and recognition,...

Louvain’s Algorithm for Community Detection in Python

Apply Louvain’s Algorithm in Python for Community Detection

Improving Shopping Recommendations for Customers Through eBay’s Relevance C

Under the new machine learning model, buyers are recommended items that are more aligned to their shopping interests on eBay.

Introduction to SHAP Values and their Application in Machine Learning

Learn how the SHAP library works under the hood

Multi-Armed Bandit Algorithms: Thompson Sampling

Intuition, Bayes, and an example

The Top 10 Algorithms Every Programmer Should Know In Graph Data Structure!

Why learn these graph algorithms? Graph algorithms are a set of instructions that...

Introduction — Machine Learning from Scratch

Evaluating the potential return of a model with Lift, Gain, and Decile Anal

Use these three tools to understand the usefulness of your machine learning models

CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All i...

Natural Language Processing with Transformers Book

Welcome · Advanced R.

Welcome | Data Science at the Command Line, 2e

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools—useful whether you work with Windows, macOS, or Linux.

AI Virtual Assistant Technology Guide 2022

They can help you get an appointment or order a pizza, find the best ticket deals and bring your...

5–10x Faster Hyperparameter Tuning with HalvingGridSearch

How to optimize the hyperparameters of a machine learning model and how to speed up the process

Experiment Tracking with MLflow in 10 Minutes

Managing Machine Learning Lifecycle made easy — explained with Python examples

A Guide To ML Experiment Tracking — With Weights & Biases

Easily learn to track all of your ML experiments with metrics and logs with an example project walkthrough!

Read this before using ROC-AUC as a metric

No metric is perfect.

Text Summarization with NLP: TextRank vs Seq2Seq vs BART

Natural Language Processing with Python, Gensim, Tensorflow, Transformers

SHAP: Explain Any Machine Learning Model in Python

A Comprehensive Guide to SHAP and Shapley Values

Real-world website visitor forecast with Facebook Prophet: A Complete Tutor

As a data analyst at Microsoft, I must investigate and understand time-series data every day. Besides looking at some key performance…

Asset2Vec: Turning 3D Objects into Vectors and Back

How we used NeRF to embed our entire 3D object catalogue to a shared latent space, and what it means for the future of graphics

Interpretable Machine Learning using SHAP — theory and applications

Introduction

Introducing TorchRec, a library for modern production recommendation system

We are excited to announce TorchRec, a PyTorch domain library for Recommendation Systems. This new library provides common sparsity and parallelism primitives, enabling researchers to build state-of-the-art personalization models and deploy them in production.

What is Relational Machine Learning?

A dive into fundamentals of learning representations beyond feature vectors

Machine Learning Algorithms Cheat Sheet — Accel.AI

Machine learning is a subfield of artificial intelligence (AI) and computer science that focuses on using data and algorithms to mimic the way people learn, progressively improving its accuracy. This way, Machine Learning is one of the most interesting methods in Computer Science these days, and it'

Topic Modeling in Python | Toptal

Topic modeling can bring NLP to the next level. Here’s how.

What Color Is This? | Stitch Fix Technology – Multithreaded

We need to know what colors our merch is. But because downstream users include many different people and algorithms, we need to describe colors as a hierarch...

Machine Learning Gets a Quantum Speedup | Quanta Magazine

Two teams have shown how quantum approaches can solve problems faster than classical computers, bringing physics and computer science closer together.

Data Scientists, The 5 Graph Algorithms that you should know

Because Graph Analytics is the future

scikit-and-tensorflow-workbooks/ch03-classification.ipynb at master · bjpcjp/scikit-and-tensorflow-workbooks

based on "Hands-On Machine Learning with Scikit-Learn & TensorFlow" (O'Reilly, Aurelien Geron) - bjpcjp/scikit-and-tensorflow-workbooks

What Internet Search Patterns Can Teach Us About Coping

I analyzed thousands of searches by people who were diagnosed with cancer. Their queries offer valuable lessons that could improve the way doctors treat patients.

3 Reasons Why Data Scientists Should Use LightGBM

There are many great boosting Python libraries for data scientists to reap the benefits of. In this article, the author discusses LightGBM benefits and how they are specific to your data science job.

Survival Analysis in Python: A Quick Guide to The Weibull Analysis

A Quick Guide to The Weibull Analysis

fb-prophet/01_docs.ipynb at master · bjpcjp/fb-prophet

Prophet (FB time series prediction package) docs to Python code. - bjpcjp/fb-prophet

scikit-learn/64_imputation.ipynb at master · bjpcjp/scikit-learn

Updates in progress. Jupyter workbooks will be added as time allows. - bjpcjp/scikit-learn

python-data-science-handbook/scikit/SciKit-Kernel-Density-Estimation.ipynb at master · bjpcjp/python-data-science-handbook

Sourced from O'Reilly ebook of the same name.

Multi-dimensional Decision Boundary : why current approaches fail and how t

The decision boundary is a very important visual tool for model evaluation. See how to get it to work on complex datasets

python-data-science-handbook/scikit/SciKit-Principal-Component-Analysis.ipynb at master · bjpcjp/python-data-science-handbook

Sourced from O'Reilly ebook of the same name.

scikit-and-tensorflow-workbooks/ch05-support-vector-machines.ipynb at master · bjpcjp/scikit-and-tensorflow-workbooks

based on "Hands-On Machine Learning with Scikit-Learn & TensorFlow" (O'Reilly, Aurelien Geron) - bjpcjp/scikit-and-tensorflow-workbooks

The Kaggle Way to Tune Hyperparameters with Optuna

Easily and efficiently optimize your model’s hyperparameters with Optuna with a mini project

Top ten Machine Learning APIs.

Thread 🧵👇🏻 — Rapid (@Rapid_API)

https://www.mapr.com/blog/apache-spark-machine-learning-tutorial

Introduction to Survival Analysis

In Cows using R

thoughtworks/mlops-platforms: Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow... - thoughtworks/mlops-platforms

Essential Guide to Auto Encoders in Data Science (Part 2)

Discussing the type of Auto Encoders

How a Kalman filter works, in pictures | Bzarg

A Comprehensive Guide of Regularization Techniques in Deep Learning | by Eu

Understanding how Regularization can be useful to improve the performance of your model

eugeneyan/applied-ml: 📚 Papers & tech blogs by companies sharing their work

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. - eugeneyan/applied-ml

A Practical Guide to ARIMA Models using PyCaret — Part 4

Combining the “Trend” and “Difference” Terms

11 Different Uses of Dimensionality Reduction

All of ML is full of dimensionality reduction and its applications. Let’s see them in action!

PyTorch vs TensorFlow in 2023

Should you use PyTorch vs TensorFlow in 2023? This guide walks through the major pros and cons of PyTorch vs TensorFlow, and how you can pick the right framework.

Machine-Learning-Tokyo/Interactive_Tools: Interactive Tools for Machine Learning, Deep Learning and Math

Interactive Tools for Machine Learning, Deep Learning and Math - Machine-Learning-Tokyo/Interactive_Tools

Image Kernels explained visually

Learning with not Enough Data Part 1: Semi-Supervised Learning

Drift in Machine Learning

Why is it hard and what to do about it?

3 (and Half) Powerful Tricks To Effectively Read CSV Data In Python

Master usecols, chunksize, parse_dates in pandas read_csv().

Mito: One of the Coolest Python Libraries You Have Ever Seen

Here is my take on this cool Python library and why you should give it a try

A Guide to Dimensionality Reduction in Python

Dimensionality reduction is a vital tool for data scientists across industries. Here is a guide to getting started with it.

Efficient matrix multiplication

Efficient matrix multiplication · GitHub

Why fast, effective data labeling has become a competitive advantage (VB Li

If an AI model can make decisions on the company’s behalf through products and services, that model is essentially their competitive edge.

7 DevOps skills for Machine Learning Operations | by Ricardo Mendes | Nov,

Lessons learned from successful MLOps implementation

Three R Libraries Every Data Scientist Should Know (Even if You Use Python)

Powerful R libraries built by the World’s Biggest Tech Companies

An Introduction to Lagrange Multipliers

9 Distance Measures in data science with algorithms.

via MIT CSAIL (@MIT_CSAIL)

Semi-Supervised Learning — How to Assign Labels with Label Propagation Algo

How does Semi-Supervised Machine Learning work, and how to use it in Python?

A Complete Machine Learning Project From Scratch: Setting Up

In this first post in a series on how to build a complete machine learning product from scratch, I describe how to set up your project and tooling.

MedMNIST v2 Dataset | Papers With Code

MedMNIST v2 is a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28 x 28 (2D) or 28 x 28 x 28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision and machine learning. Description and image from: "MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification". Each subset keeps the same license as that of the source dataset. Please also cite the corresponding paper of source data if you use any subset of MedMNIST.

Applications and Techniques for Fast Machine Learning in Science

In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental...

A Free And Powerful Labelling Tool Every Data Scientist Should Know

One of the best labelling tools I have ever used.

An Introduction to PyTorch Lightning

PyTorch Lightning has opened many new possibilities in deep learning and machine learning with a high level interface that makes it quicker to work with PyTorch.

Benefits of the CatBoost Machine Learning Algorithm

for Data Scientists and ML Engineers

Clustering Made Easy with PyCaret

Low-code Machine Learning with a Powerful Python Library

Kernel Methods: A Simple Introduction

The basics of kernel methods and Radial Basis Functions

Streamlit, which helps data scientists build apps, hits version 1.0

Streamlit releases v1.0 of its DataOps platform for data science apps to make it easier for data scientists to share code and components.

Essential Linux Command-Line Tricks for Computer Vision Researchers

In this post, you will learn some cool command line tricks which can help you to speed up your day-to-day R&D.

A friendly introduction to machine learning compilers and optimizers

[Twitter thread, Hacker News discussion]

graviraja/MLOps-Basics

Optimal Estimation Algorithms: Kalman and Particle Filters

An introduction to the Kalman and Particle Filters and their applications in fields such as Robotics and Reinforcement Learning.

How to Analyze 100-Dimensional Data with UMAP in Breathtakingly Beautiful W

Create breathtaking visuals and “see” your data

Top 38 Python Libraries for Data Science, Data Visualization & Machine Lear

This article compiles the 38 top Python libraries for data science, data visualization & machine learning, as best determined by KDnuggets staff.

A Practical Introduction to 9 Regression Algorithms

Hands-on tutorial to effectively use different Regression Algorithms

[2106.10860v1] Multiplying Matrices Without Multiplying

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix...

Gaussian Belief Propagation

Detecting knee- / elbow points in a graph

Using the “Kneedle” algorithm to detect knees with the Python package “kneed”

Complete guide to understanding Node2Vec algorithm

An in-depth guide to understanding node2vec algorithm and its hyper-parameters

The history of Amazon’s forecasting algorithm

The story of a decade-plus long journey toward a unified forecasting model.

An Introduction to Statistical Learning

Tokenization Algorithms Explained

A one-stop-shop for all your tokenization needs

Hora | Hora Search Everywhere

5 Ultimate Python Libraries for Image Processing

OpenCV is not the only one

An introduction to A* pathfinding (tutorial)

This is part 3 of a series on bot programming originally published on the Coder One blog. Part 1:...

How to Add Uncertainty Estimation to your Models with Conformal Prediction

Why conformal prediction for uncertainty estimation can improve your predictions

8 Dimensionality Reduction Techniques every Data Scientists should know

Essential guide to various dimensionality reduction techniques in Python

scikit-learn-intelex · PyPI

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application.

Papers with Code - Paper with Code Newsletter

Papers With Code highlights trending Machine Learning research and the code to implement it.

Apriori Algorithm for Association Rule Learning — How To Find Clear Links B

Explanation and examples of frequent itemset mining and association rule learning over relational databases in Python

Types of Correlation Coefficients

Different Kinds of Correlation Coefficients in a Deeper Look

Hands-on Survival Analysis with Python

What companies can learn from employee turnover data

Building a VAE Playground with Streamlit

With Streamlit, creating and deploying a web app can be very easy!

Semantic Search: Measuring Meaning From Jaccard to Bert

Similarity search is one of the fastest-growing domains in AI and machine learning. At its core, it is the process of matching relevant pieces of information together.

Read Excel files with Python. 1000x Faster.

In this article, I’ll show you five ways to load data in Python, achieving a speedup of 3 orders of magnitude.

The Methods Corpus | Papers With Code

2284 methods • 143838 papers with code.

Same or Different? The Question Flummoxes Neural Networks. | Quanta Magazine

For all their triumphs, AI systems can’t seem to generalize the concepts of “same” and “different.” Without that, researchers worry, the quest to create truly intelligent machines may be hopeless.

Deep Scatterplots

GPBoost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Mo

Combining tree-boosting with Gaussian process and mixed effects models - fabsig/GPBoost

New Machine Learning Gems for Ruby

In August, I set out to improve the machine learning ecosystem for Ruby and wasn’t sure where it would go. Over the next 5 months, I ended up...

Learn R through examples

This is a draft of a book for learning data analysis with the R language. This book emphasizes hands-on activities. Comments and suggestions are welcome.

Complete Guide to Data Augmentation for Computer Vision

Data Augmentation is one of the most important topics in Deep Computer Vision. When you train your neural network, you should do data augmentation like… ALWAYS. Otherwise, you are not using your…

Supercharge Your Machine Learning Experiments with PyCaret and Gradio - KDnuggets

A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.

Sentiment Analysis — Comparing 3 Common Approaches: Naive Bayes, LSTM, and

Sentiment Analysis, or Opinion Mining, is a subfield of NLP (Natural Language Processing) that aims to extract attitudes, appraisals, opinions, and emotions from text. Inspired by the rapid migration…

Interpreting Scattertext: a seductive tool for plotting text

Scroll down to see how to interpret a plot created by a great tool for comparing two classes and their corpora.

Metric-Based (Ratings-based) Conjoint Analysis

In marketing analytics, conjoint analysis is a technique used to gain specific insights about consumers’ preferences. Often derived from consumer surveys, conjoint analysis can tell us, for instance…

Introduction to Object Detection Model Evaluation

Evaluating object detection models is not straightforward because each image can have many objects and each object can belong to different classes. This means that we need to measure if the model…

10 Must Read ML Blog Posts

A collection of high-impact machine learning blog posts.

https://ruder.io/optimizing-gradient-descent/

NeurIPS 2021 Announcement: The Billion-Scale Approximate Nearest Neighbor Search Challenge

We are excited to announce that this year’s NeurIPS 2021 Conference will host a first-of-its-kind competition in large scale approximate…

Machine learning and recommender systems using your own Spotify data

Creating Spotify recommendations with data science

Causal ML for Data Science: Deep Learning with Instrumental Variables

Combining data science and econometrics for an introduction to the DeepIV framework, including a full Python code tutorial.

An Introduction to PyTorch Lightning

Word on the street is that PyTorch lightning is a much better version of normal PyTorch. But what could it possibly have that it brought such consensus in our world? Well, it helps researchers scale…

Combinatorial Optimization: The Knapsack Problem

In this story, we are going to discuss an application of dynamic programming techniques to an optimization algorithm. Through the process of developing an optimal solution, we get to study a variety…

Algorithm-Assisted Inventory Curation

Building the raw materials for personalization at scale

How image search works at Dropbox - Dropbox

12 Jupyter Notebook Extensions That Will Make Your Life Easier

Essential extensions that will boost your productivity in Jupyter Notebook.

Theoretical Understandings of Product Embedding for E-commerce Machine Lear 📄

Projects

Find out about all of the projects of Meta Open Source.

Artificial Intelligence Develops an Ear for Birdsong - Scientific American

Machine-learning algorithms can quickly process thousands of hours of natural soundscapes

Game theory as an engine for large-scale data analysis | DeepMind

Modern AI systems approach tasks like recognising objects in images and predicting the 3D structure of proteins as a diligent student would prepare for an exam. By training on many example...

Singular value decomposition - Wikipedia

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any m × n matrix. It is related to the polar decomposition.

GLMs Part I: A Rigorous Mathematical Formulation | by Andrew Rothman | Apr,

Intuition for Unifying Theory of GLMs with Derivations in Canonical and Non-Canonical Forms

GLMs Part II: Newton-Raphson, Fisher Scoring, & Iteratively Reweighted Leas

Generalized Linear Models (GLMs) play a critical role in fields including Statistics, Data Science, Machine Learning, and other computational sciences. In Part I of this Series, we provided a…

The 5 Feature Selection Algorithms every Data Scientist should know

Bonus: What makes a good footballer great?

Prophet | Forecasting at scale.

Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.

11 Dimensionality reduction techniques you should know in 2021

Reduce the size of your dataset while keeping as much of the variation as possible

Kolmogorov Complexity: Extensions and Applications

Principal Component Analysis explained visually

https://betterexplained.com/articles/hyperbolic-functions/

Amazon open-sources library for prediction over large output spaces

Framework improves efficiency, accuracy of applications that search for a handful of solutions in a huge space of candidates.

Notebook on nbviewer

Check out this Jupyter notebook!

Advanced forecasting using Bayesian diffusion modeling

Applications from cancer to covid-19

Nine Emerging Python Libraries You Should Add to Your Data Science Toolkit in 2022

As Data Science continues to grow and develop, it’s only natural for new tools to emerge, especially considering the fact that data…

Instacart Market Basket Analysis. Winner’s Interview: 2nd place, Kazuki… |

Winner’s Interview: 2nd place, Kazuki Onodera

Best Practices for Using AI to Develop the Most Accurate Retail Forecasting

A leading global retailer has invested heavily in becoming one of the most competitive technology companies around. Accurate and timely demand forecasting for millions of item-by-store combinations is…

Automate Hyperparameter Tuning for Multiple Models with Hyperopts

Automate your hyperparameter tuning with Sklearn Pipelines and Hyperopts for multiple models in a single python call. Let's dig into the process...

Deep In Singular Value Decomposition

There are often times when working in Data Science where we might come across a feature that is very difficult to interpret by a computer. This is often because the dimensions of the data are much…

nbterm: Jupyter Notebooks in the terminal

Jupyter notebooks are mostly known for their web-based user interface, such as JupyterLab or the Classic Notebook. They offer a great user…

Spotify Genre Classification Algorithm

Supervised Machine Learning — SVM, RANDOM FOREST, LOGISTIC REGRESSION

A Summary of Active Learning Frameworks

If you are dealing with a classification task, I recommend modAL. As for sequence labeling tasks, AlpacaTag is the only choice for you. Active learning could decrease the number of labels…

Gentle introduction to 2D Hand Pose Estimation: Approach Explained

Detailed tutorial on where to find a dataset, how to preprocess data, what model architecture and loss to use, and, finally, how to…

What Really IS a Matrix Determinant?

The geometric intuition behind determinants could change how you think about them.

A Primer on the EM Algorithm

Christian Zuniga, PhD

When and how to use power transform in machine learning

Let’s see this powerful tool of data pre-processing

Time Series Forecasting with PyCaret Regression Module

PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with few lines only. See how to use PyCaret's Regression Module for Time Series Forecasting.

Zero-Shot Learning: Can you classify an object without seeing it before?

Data Science, Machine Learning, AI & Analytics

Using Gaussian Mixture Models to Transform User-Item Embedding and Generate Better User Clusters

Improve clustering of user-item embedding by using GMM to generate new and tighter features

9 Distance Measures in Data Science

The advantages and pitfalls of common distance measures

How to create custom scikit-learn classification and regression models

Scikit-learn is the go-to package for standard machine learning models in Python. It not only provides most of the core algorithms that…

DIY XGBoost library in less than 200 lines of python

XGBoost explained as well as gradient boosting method and HP tuning by building your own gradient boosting library for decision trees.

11 Times Faster Hyperparameter Tuning with HalvingGridSearch

Successive halving completely crushes GridSearch and RandomSearch

CPU-based algorithm trains deep neural nets up to 15 times faster than top

Rice University computer scientists have demonstrated artificial intelligence (AI) software that runs on commodity processors and trains deep neural networks 15 times faster than platforms based on graphics ...

Deploying a basic Streamlit app to Heroku

This article demonstrates the deployment of a basic Streamlit app (that simulates the Central Limit Theorem) to Heroku.

Beginner’s Guide to XGBoost for Classification Problems

Utilize the hottest ML library for state-of-the-art performance in classification

Deploying a basic Streamlit app

This article demonstrates the deployment of a basic Streamlit app (that predicts the Iris’ species) to Streamlit Sharing.

You are underutilizing shap values — feature groups and correlations

Your model is a lens into your data, and shap its telescope

AI Planning using Constraint Satisfaction Problems

Using Constraint Satisfaction Problems to solve AI Planning Problems.

Principal component regression

In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). More specifically, PCR is used for estimating the unknown regression coefficients in a standard linear regression model.

Quadratic Discriminant Analysis

A deep introduction to Quadratic Discriminant Analysis (QDA) with theory and Python implementation

UCI Machine Learning Repository

Discover datasets around the world!

Evaluating Search Algorithms

The three-step framework Shopify's Data Science & Engineering team built for evaluating new search algorithms.

130 Machine Learning Projects Solved and Explained

Machine Learning Projects solved and explained for free

Prediction Intervals for Gradient Boosting Regression

This example shows how quantile regression can be used to create prediction intervals. See Features in Histogram Gradient Boosting Trees for an example showcasing some other features of HistGradien...
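
Quantile regression minimizes the pinball (quantile) loss, and fitting one model at a low quantile level and another at a high level gives the two ends of a prediction interval. The sketch below is not the linked example: it skips gradient boosting entirely and only searches for the best constant prediction on hypothetical data, to show how the loss pulls the prediction toward the requested quantile.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss for a quantile level alpha in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by 1 - alpha.
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


# Hypothetical targets and a grid of candidate constant predictions.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
candidates = [i / 10 for i in range(0, 111)]


def best_constant(alpha):
    """Grid-search the constant that minimizes the pinball loss on y."""
    return min(candidates, key=lambda q: pinball_loss(y, [q] * len(y), alpha))


lower, upper = best_constant(0.05), best_constant(0.95)
print(lower, upper)
```

With levels 0.05 and 0.95, the pair `(lower, upper)` plays the role of a nominal 90% interval.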

Polynomial Regression in Python

Machine Learning from Scratch: Part 4

Three Model Compression Methods You Need To Know in 2021

Creative techniques to make complex models smaller

5 Things You Should Know About Covariance

When dealing with problems on statistics and machine learning, one of the most frequently encountered terms is covariance. While most of…

A Beginner’s Guide to Image Augmentations in Machine Learning

Data Augmentation is one of the most important yet underrated aspects of a machine learning system …

3 Key Pieces of Information About Logistic Regression Algorithm

It is a simple yet very efficient algorithm

4 Machine learning techniques for outlier detection in Python

Machine learning-based outlier detection

UPDATED: Using R and H2O to identify product anomalies during the manufactu

Note: This is an update to the article "Using R and H2O to identify product anomalies during the manufacturing process." It has some updates but also code optimization from Yana Kane-Esrig (https://www.linkedin.com/in/ykaneesrig/), as sh...

Xgboost regression training on CPU and GPU in python

GPU vs CPU training speed comparison for xgboost

VISSL · A library for state-of-the-art self-supervised learning from images

A library for state-of-the-art self-supervised learning from images

Graph Theory Basics

What you need to know as graph theory adoption continues to take off

Conda: essential concepts and tricks

for beginners as well as advanced users

XGBoost: Extreme Gradient Boosting — How to Improve on Regular Gradient Boosting?

A detailed look at differences between the two algorithms and when you should choose one over the other

4 Easy Steps for Implementing CatBoost

an end-to-end tutorial on how to apply an emerging Data Science algorithm

Two outlier detection techniques you should know in 2021

Elliptic Envelope and IQR-based detection

stitchfix/mab: Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy. - stitchfix/mab

Gaussian Process Regression From First Principles

Gaussian Process Regression is a remarkably powerful class of machine learning algorithms. Here, we introduce them from first principles.

Introduction to hierarchical clustering (Part 3 — Spatial clustering)

Introducing a spatial dimension into hierarchical clustering

A Comprehensive Mathematical Approach to Understand AdaBoost

Learn how AdaBoost works from a Math perspective, in a comprehensive and straight-to-the-point manner.

How you can quickly build ML web apps with Streamlit.

The quickest way to embed your models into web apps.

Thompson Sampling using Conjugate Priors

Multi-Armed Bandits: Part 5b

stanfordmlgroup/ngboost: Natural Gradient Boosting for Probabilistic Prediction

Natural Gradient Boosting for Probabilistic Prediction - stanfordmlgroup/ngboost

How to use PyCaret — the library for lazy data scientists

Train, visualize, evaluate, interpret, and deploy models with minimal code.

The Algorithms That Make Instacart Roll

Instacart crunches petabytes daily to predict what will be on grocery shelves and even how long it will take to find parking

State-of-the-Art Image Generation Models

I have aggregated some of the SotA image generative models released recently, with short summaries, visualizations and comments. The overall development is summarized, and the future trends are spe…

Gradient-Free-Optimizers A collection of modern optimization methods in Pyt

Simple and reliable optimization with local, global, population-based and sequential techniques in numerical discrete search spaces. - SimonBlanke/Gradient-Free-Optimizers

PyCaret — pycaret 2.2.0 documentation

Home - PyCaret

Low-code machine learning. Get started: https://pycaret.gitbook.io

A Complete Guide To Survival Analysis In Python, part 3 - KDnuggets

Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.

www-eio.upc.edu/~pau/cms/rdata/datasets.html

An overview of synthetic data types and generation methods

Synthetic data can be used to test new products and services, validate models, or test performances because it mimics the statistical property of production data. Today you'll find different types of structured and unstructured synthetic data.

Affinity Analysis (Market Basket Analysis)

Have you ever wondered how often you buy certain items together? Why do you buy some items together? How likely are you to purchase an item…

Categorical cross-entropy and SoftMax regression

Ever wondered how to implement a simple baseline model for multi-class problems? Here is one example (code included).

Explain Machine Learning Models: Partial Dependence

From black box to no more box

Error Backpropagation Learning Algorithm

The error backpropagation learning algorithm is a supervised learning technique for neural networks that calculates the gradient of the error with respect to each weight, which gradient descent then uses to update the weights.

Why you should always use feature embeddings with structured datasets

A simple technique for boosting accuracy on ANY model you use

Data Science & AI Glossary | DeepAI

The data science and artificial intelligence terms you need while reading the latest research

Generative Graph Models with NetworkX

A comprehensive guide on standard generative graph approaches with implementation in NetworkX

Hacker News

Computer Science, Machine Learning, Programming, Art, Mathematics, Philosophy, and Short Fiction

Hacker News

Data labeling is often the biggest bottleneck in machine learning. Active learning lets you train machine learning models with much less labeled data. The best AI-driven companies, like Tesla, use active learning.

The Ultimate Scikit-Learn Machine Learning Cheatsheet - KDnuggets

With the power and popularity of the scikit-learn for machine learning in Python, this library is a foundation to any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.

Image Processing with Python — Blob Detection using Scikit-Image

How to identify and segregate specific blobs in your image

General Methods | Papers With Code

Browse 1109 deep learning methods for General.

Decades-Old Graph Problem Yields to Amateur Mathematician

By making the first progress on the “chromatic number of the plane” problem in over 60 years, an anti-aging pundit has achieved mathematical immortality.

SVM Classifier and RBF Kernel — How to Make Better Models in Python

A complete explanation of the inner workings of Support Vector Machines (SVM) and Radial Basis Function (RBF) kernel

Using strip charts to visualize dozens of time series at once

Strip charts are extremely useful to make heads or tails from dozens (and up to several hundred) of time series over very long periods of…

Link Prediction and Information Theory: A Tutorial

Using Mutual Information to measure the likelihood of candidate links in a graph.

Forget coding, you can now solve your AI problems with Excel

Microsoft Excel is a powerful tool for learning the basics of data science and machine learning.

Algorithms for Decision Making | Hacker News

Jason's Machine Learning 101

Jason Mayes, Senior Creative Engineer, Google. Machine Learning 101. Feel free to share this deck with others who are learning! Send me feedback here. Dec 2017. Welcome! If you are reading the notes there are a few extra snippets down here from time to time. But more for my own thoughts, feel free to...

Model Compression: A Look into Reducing Model Size

Why is Model Compression important? A significant problem in the arms race to produce more accurate models is complexity, which leads to…

New Features of Scikit-Learn. An Overview of the Most Important… | by Ankit

An Overview of the Most Important Features in Version 0.24

Comparing Binary, Gray, and One-Hot Encoding

This article shows a comparison of the implementations that result from using binary, Gray, and one-hot encodings to implement state machines in an FPGA. These encodings are often evaluated and applied by the synthesis and implementation tools, so it’s important to know why the software makes these decisions.
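
To make the three encodings concrete, here is a small sketch that prints the code word assigned to each state of a hypothetical four-state machine. The function names are assumptions for illustration; the linked article works in an FPGA toolchain, not Python.

```python
def binary_code(state, width):
    """Plain binary: uses the fewest bits, but several bits may flip at once."""
    return format(state, f"0{width}b")


def gray_code(state, width):
    """Reflected Gray code: consecutive states differ in exactly one bit."""
    return format(state ^ (state >> 1), f"0{width}b")


def one_hot_code(state, num_states):
    """One-hot: one bit per state, with exactly one bit set."""
    return format(1 << state, f"0{num_states}b")


for s in range(4):
    print(s, binary_code(s, 2), gray_code(s, 2), one_hot_code(s, 4))
```

The trade-off is visible in the widths: binary and Gray need two bits for four states, while one-hot needs four bits but makes each state a single-bit test.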

Meet whale! The stupidly simple data discovery tool. | by Robert Yi | Dat

And why your data science team needs it.

How to Automate Tasks on GitHub With Machine Learning for Fun and Profit |

A tutorial on how to build a GitHub App that predicts and applies issue labels using Tensorflow and public datasets.

10 Stochastic Gradient Descent Optimisation Algorithms Cheat Sheet | by R

Stochastic gradient descent optimisation algorithms you should know for deep learning

Benchmark functions | BenchmarkFcns

8 New Tools I Learned as a Data Scientist in 2020 | by Ben Weber | Dec, 202

Making the move from Docker to Live Deployments

Practical Graph Theory in Ruby

This is the next installment in the "Practical Computer Science" series, where you will learn how to apply classic computer science concepts to solve real problems using Ruby. Today we are going to talk about Graph

BFGS in a Nutshell: An Introduction to Quasi-Newton Methods

Demystifying the inner workings of BFGS optimization

Lagrange multipliers with visualizations and code | by Rohit Pandey | Towar

In this story, we’re going to take an aerial tour of optimization with Lagrange multipliers. When do we need them? Whenever we have an…

Beyond One-Hot. 17 Ways of Transforming Categorical Features Into Numeric F

All the encodings that are worth knowing — from OrdinalEncoder to CatBoostEncoder — explained and coded from scratch in Python

Particle Swarm Optimization Visually Explained

Learn PSO algorithm as a bedtime story with GIFs and python code

Computer Vision | Papers With Code

Zero-shot learning (ZSL) is a model's ability to detect classes never seen during training. The condition is that the classes are not known during supervised learning. Earlier work in zero-shot learning uses attributes in a two-step approach to infer unknown classes. In the computer vision context, more recent advances learn mappings from image feature space to semantic space. Other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks without fine-tuning. Benchmark datasets for zero-shot learning include aPY, AwA, and CUB, among others. (Image credit: Prototypical Networks for Few shot Learning in PyTorch, https://github.com/orobix/Prototypical-Networks-for-Few-shot-Learning-PyTorch) Further reading: Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly (https://paperswithcode.com/paper/zero-shot-learning-a-comprehensive-evaluation); Zero-Shot Learning in Modern NLP (https://joeddav.github.io/blog/2020/05/29/ZSL.html); Zero-Shot Learning for Text Classification (https://amitness.com/2020/05/zero-shot-text-classification/)

Computer Vision | Papers With Code

Few-Shot Learning is an example of meta-learning, where a learner is trained on several related tasks during the meta-training phase so that it can generalize well to unseen (but related) tasks with just a few examples during the meta-testing phase. An effective approach to the Few-Shot Learning problem is to learn a common representation for various tasks and train task-specific classifiers on top of this representation. Source: Penalty Method for Inversion-Free Deep Bilevel Optimization (https://arxiv.org/abs/1911.03432)

The Sensitivity Analysis: A Powerful Yet Underused Tool for Data Scientists

Quantifying the effects of varying different inputs, applied on a gemstone dataset with over 50K round-cut diamonds

How to Deploy your Custom ML Model with Streamlit and Heroku

A Step-by-Step Guide to Host your Models!

Project Lighthouse — Part 1: P-sensitive k-anonymity

Part one of a series on how we will measure discrepancies in Airbnb guest acceptance rates using anonymized perceived demographic data.

Sensitivity, Specificity and Meaningful Classifiers

The sometimes confusing concepts involved in interpreting coronavirus testing

SVM Kernels: What Do They Actually Do?

An intuitive visual explanation

Log-Normal Distribution

A Log-Normal Distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed.
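
That definition can be checked empirically: draw log-normal samples, take their logarithms, and the result should look normal with the chosen parameters. A minimal sketch using the standard library's `random.lognormvariate`; the values of `mu`, `sigma`, the seed, and the sample size are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameters of the underlying normal distribution.
mu, sigma = 1.0, 0.5
samples = [random.lognormvariate(mu, sigma) for _ in range(20000)]

# Taking logs should recover an approximately normal sample.
logs = [math.log(x) for x in samples]
log_mean = statistics.fmean(logs)
log_std = statistics.stdev(logs)
print(log_mean, log_std)
```

The printed mean and standard deviation of the logs should land close to `mu` and `sigma`, while the raw samples themselves are strictly positive and right-skewed.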

Stop One-Hot Encoding Your Categorical Variables.

There are many better alternatives

Matching of Bipartite Graphs using NetworkX

A simple introduction to matching in bipartite graphs with Python code examples

AI system for high precision recognition of hand gestures

Scientists have developed an Artificial Intelligence (AI) system that recognises hand gestures by combining skin-like electronics with computer vision.

Clustering Using Convex Hulls

How to use convex hulls in data clustering

Getting Started with Random Matrices: A Step-by-Step Guide

In the Deep Learning (DL) age, more and more people have encountered and used (knowingly or not) random matrices. Most of the time this…

10 Graph Algorithms Visually Explained

A quick introduction to 10 basic graph algorithms with examples and visualisations

www.cheatsheets.aqeel-anwar.com

Cheat Sheets for Machine Learning and Data Science

Peregrine: A Pattern-Aware Graph Mining System

Peregrine: A Pattern-Aware Graph Mining System.

5 Categorical Encoding Tricks you need to know today as a data scientist

A Complete Pythonic Encoding Tutorial

Speech Recognition with Python. Learn which of the 9 most prominent… | by S

Learn which of the 9 most prominent automatic speech recognition engines is best for your needs, and how to use it in Python programs.

4 Rarely-Used Yet Very Useful Pandas Tricks

Explained with examples

A radical new technique lets AI learn with practically no data

“Less than one”-shot learning can teach a model to identify more objects than the number of examples it is trained on.

A Comparison of Bandit Algorithms

Multi-Armed Bandits: Part 6

How to Choose a Feature Selection Method For Machine Learning - MachineLearningMastery.com

Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between each input variable and the…

MLWhiz: Helping You Learn Data Science!

In this post, I am going to be talking about some of the most important graph algorithms you should know and how to implement them using Python.

A review of consensus protocols

Algorithms for Collision Detection

An online book about collision detection using Processing.

Cataloging Tools for Data Teams

An introduction to Data Cataloging and major tools that data teams can use for data discovery

Complete Guide to Adam Optimization

The Adam optimization algorithm from definition to implementation

63 Machine Learning Algorithms — Introduction | by Priyanshu Jain | The Sta

In this article I covered 63 Algorithms of Machine Learning in easy to understand manner for business professionals.

All the ~Eigen-stuff they never thought you should know

To Infinity and…Linear Algebra?!

Improving complementary-product recommendations

New modeling approach increases accuracy of recommendations by an average of 7%.

Make kNN 300 times faster than Scikit-learn’s in 20 lines!

Using Facebook faiss library for REALLY fast kNN

5 SMOTE Techniques for Oversampling your Imbalance Data

Know your SMOTE ways to oversample your data

What is Perspective Warping? | OpenCV and Python

A step-by-step guide to apply perspective transformation on images

Helping robots avoid collisions

The startup Realtime Robotics, co-founded by former MIT postdoc George Konidaris, is helping robots solve the motion planning problem by giving them collision avoidance capabilities.

Leveraging Value from Postal Codes, NAICS Codes, Area Codes and Other Funky

A value is worthless unless it tells you something.

Floating-Point Formats and Deep Learning

Floating-point formats are not the most glamorous or (frankly) the most important consideration when working with deep learning models: if your model isn’t working well, then your floating-point format certainly isn’t going to save you! However, past a certain point of model complexity/model size/training time, your choice of floating-point format can have a significant impact on your model training times and even performance.

Z-score for anomaly detection

Small-bites data science

Writing a Production-Level Machine Learning Framework: Lessons Learned

Some of our insights from developing a PyTorch framework for training and running deep learning models …

https://blog.exxactcorp.com/autograd-the-best-machine-learning-library-youre-not-using/

Causal Inference Book | Miguel Hernan | Harvard T.H. Chan School of Public

Jamie Robins and I have written a book that provides a cohesive presentation of concepts of, and methods for, causal inference. Much of this material is currently scattered across journals in sever…

Vectorizing code matters

I come from the world of MATLAB and numerical computing, where for loops are shorn and vectors are king. During my PhD at UVM, Professor…

A Gentle Introduction to Information Entropy - MachineLearningMastery.com

Information theory is a subfield of mathematics concerned with transmitting data across a noisy channel. A cornerstone of information theory is the idea of quantifying how much information there is in a message. More generally, this can be used to quantify the information in an event and a random variable, called entropy, and is calculated using probability. Calculating information and…
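
For a discrete random variable, the entropy described above is H(X) = −Σ p(x) log₂ p(x), measured in bits. A minimal sketch (the function name `entropy` and the example distributions are assumptions for illustration):

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


fair_coin = entropy([0.5, 0.5])
biased_coin = entropy([0.9, 0.1])
four_way = entropy([0.25, 0.25, 0.25, 0.25])
print(fair_coin, biased_coin, four_way)
```

The biased coin has lower entropy than the fair one because its outcome is more predictable, so each flip carries less information on average.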

How to Calculate the KL Divergence for Machine Learning - MachineLearningMastery.com

It is often desirable to quantify the difference between probability distributions for a given random variable. This occurs frequently in machine learning, when we may be interested in calculating the difference between an actual and observed probability distribution. This can be achieved using techniques from information theory, such as the Kullback-Leibler Divergence (KL divergence), or relative entropy, and the Jensen-Shannon…

Kullback–Leibler divergence

In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence), denoted D_KL(P ∥ Q), is a type of statistical distance: a measure of how one reference probability distribution P is different from a second probability distribution Q. Mathematically, it is defined as
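
For discrete distributions over the same outcomes, the standard definition is the sum D_KL(P ∥ Q) = Σ P(x) log(P(x) / Q(x)). A minimal sketch in bits, assuming Q(x) > 0 wherever P(x) > 0 (the function name `kl_divergence` and the example distributions are assumptions for illustration):

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for discrete distributions on the same outcomes."""
    # Terms with P(x) = 0 contribute nothing; Q(x) is assumed positive
    # wherever P(x) is positive, otherwise the divergence is infinite.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.25, 0.75]
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
print(forward, reverse)
```

The two directions give different numbers, which is why the KL divergence is called a divergence rather than a metric.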

Latent Dirichlet Allocation: Intuition, math, implementation and visualisat

A tour of one of the most popular topic modelling techniques and a guide to implementing and visualising it using pyLDAvis

XGBoost, LightGBM, and Other Kaggle Competition Favorites

An Intuitive Explanation and Exploration

10 Hyperparameter optimization frameworks.

Tune your Machine Learning models with open-source optimization libraries

Geospatial Indexing with Quadkeys

Squaring the Earth

Part 7: Fast Pattern Searching with STUMPY

Finding Similar Subsequences for Known Patterns

Anomaly Detection using Benford’s Law

Locating fraudulent transactions with simple theory.

Advanced Ensemble Learning Techniques

Ensemble is an art and science

Machine learning for anomaly detection: Elliptic Envelope

Bite-sized data science

How to peek inside a black box model — Understand Partial Dependence Plots

In this post, we will be learning a tool to reveal the working mechanism of a black box model. But before we start, let's talk about…

The Hundred-Page Machine Learning Book by Andriy Burkov

This is the companion wiki of The Hundred-Page Machine Learning Book by Andriy Burkov, a book that aims to teach machine learning in a concise yet systematic manner.

Why Does No One Use Advanced Hyperparameter Tuning? | by Liam Li | Oct, 202

Takeaways from our experience building state-of-the-art hyperparameter tuning in Determined AI’s integrated deep learning training…

Tutorial: Uncertainty estimation with CatBoost

Understanding why your model is uncertain and how to estimate the level of uncertainty

Python 3.9 New Features & How to Use Them

Python 3.9 New Feature Guide

Histogram Matching

How to generate a histogram for an image, how to equalize the histogram, and finally how to modify your image histogram to be similar to…

Hidden Markov Model (HMM) — simple explanation in high level

HMM is a very powerful statistical modeling tool used in speech recognition, handwriting recognition, etc. I wanted to use them, but when…

An intuitive guide to PCA

Ideas behind Principal Component Analysis

Silhouette Method — Better than Elbow Method to find Optimal Clusters

Deep dive analysis of Silhouette Method to find optimal clusters in k-Means clustering

Handling Outliers in Clusters using Silhouette Analysis

Identify and remove outliers in each cluster from K-Means clustering

Machine Learning Enabled High-Sigma Verification Of Memory Designs

Variation-aware memory verification with brute force Monte Carlo accuracy in much less time.

Seven Must-Know Statistical Distributions and Their Simulations for Data Sc

Assumptions, relationships, simulations, and so on

DBSCAN — a density-based unsupervised algorithm for fraud detection

Bite-sized data science on fraud detection

AI 101: Intro to Evolutionary Algorithms

Polynomial Regression: The Only Introduction You’ll Need

A deep-dive into the theory and application behind this Machine Learning algorithm in Python, by a student

A Novel Approach to Feature Importance — Shapley Additive Explanations

The state-of-the-art in feature importance

New features in scikit-learn

Overview of the latest developments in version 0.23

Entropy, Cross-Entropy, and KL-Divergence Explained!

Let us try to understand the most widely used loss function — Cross-Entropy.

Part 5: Fast Approximate Matrix Profiles with STUMPY

Roughly Accurate Matrix Profiles Computed in a Fraction of the Time

The Jewel of the Matrix: A Deep Dive Into Eigenvalues & Eigenvectors

An intuitive look at the abstract concept

The Singular Value Decomposition without Algebra

Understand the Ultimate Linear Algebra concept with Geometry

Solving a Chicken and Egg Problem: Expectation-Maximization (EM)

The Intuition Behind the Popular Expectation-Maximization Algorithm with Example Code

Deep dive into ROC-AUC

How Sklearn’s “TF-IDF” is different from the standard “TF-IDF”?

Let’s look at the differences and analyze, step by step, the approach taken to compute Sklearn’s TF-IDF

Variational Gaussian Process — What To Do When Things Are Not Gaussian

Learn to use non-Gaussian distributions in Gaussian Process models, and variational inference with Gaussian quadrature to compute…

A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSPro

Why can AdaGrad escape saddle points? Why is Adam usually better? In a race down different terrains, which will win?

Dimensionality Reduction in Hyperspectral Images using Python

Dimensionality Reduction Techniques for Hyperspectral Images.

Multiclass Classification with Support Vector Machines (SVM), Kernel Trick

Understanding the mathematics behind SVM + Implementation in Python via scikit-learn

Ten Eisen features that changed the way I do deep learning

How a simple `pip install eisen` will save days of work and solve (almost) all of your problems.

5 Fabulous Python Packages For Data-Science Nobody Knows About

Do you know about these packages?

Simple Guide to Choropleth Maps

Choropleth Maps using Plotly to track COVID 19 cases.

Visualizing Geospatial Data in Python

Open source tools and techniques for visualizing data on custom maps

Comment Ranking Algorithms: Hacker News vs. YouTube vs. Reddit

A Machine Learning Algorithm Every Data Scientist Needs: Bagged Trees

Eigenfaces — Face Classification in Python

Not enough data for Deep Learning? Try Eigenfaces.

Federated Learning using PyTorch and PySyft | LearnOpenCV

A gentle introduction to federated learning using PyTorch and PySyft with the help of a real life example.

A Product Manager’s Guide to Machine Learning: Cloud Machine Learning

Why obtaining the Amazon Web Services Machine Learning — Specialty (“AWS ML”) certification is one of the best starting points to gaining…

Short technical information about Word2Vec, GloVe and Fasttext

Contrasting contrastive loss functions

A comprehensive guide to four contrastive loss functions for contrastive learning

An Introduction to Optical Character Recognition for Beginners

Your first step towards reading text from unstructured data

Graph Theory | BFS Shortest Path Problem on a Grid

Hi all, welcome back to another post of my brand new series on Graph Theory named Graph Theory: Go Hero. I undoubtedly recommend the…

Using K-Means to detect changes in a retail store | Towards Data Science

Unsupervised techniques to identify changes in the behavior

Factor Analysis — A Complete Tutorial

Covering Eigenvalues, Factor Creation and Cronbach’s Alpha

Five Cool Python Libraries for Data Science - KDnuggets

Check out these 5 cool Python libraries that the author has come across during an NLP project, and which have made their life easier.

Insurance Risk Pricing — Tweedie Approach - Towards Data Science

An illustrative guide to estimate the pure premium using Tweedie models in GLMs and Machine Learning

Dot Product in Linear Algebra for Data Science using Python

Building up the intuition for how matrices help to solve a system of linear equations and thus regression problems

Isolation Forest from Scratch

Implementation of Isolation forest from scratch for further understanding of the algorithm

The Power-Law Distribution

Recursive Feature Elimination (RFE) for Feature Selection in Python

Recursive Feature Elimination, or RFE for short, is a popular feature selection algorithm. RFE is popular because it is easy to configure and use and because it is effective at selecting those features (columns) in a training dataset that are more or most relevant in predicting the target variable. There are two important configuration options when using RFE: the choice…

A Simplified approach using PyCaret for Anomaly Detection

Explaining outlier detection with PyCaret library in python

What is isotonic regression?

Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let’s say we have data (x_1, y_1), …, (x_n, y_n) such that x_1 < x_2 < … < x_n. (We assume no ties among the x_i’s for simplicity.) Informally, isotonic regression looks for β_1 ≤ … ≤ β_n such that the β_i’s approximate …
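
The monotonic fit described in this blurb can be sketched with the pool-adjacent-violators idea: walk through the values in order of x and, whenever a value breaks the non-decreasing order, average it with its neighbours. This is a minimal illustration under stated assumptions, not the linked article's code; the function name `isotonic_fit` and the sample data are hypothetical, and the inputs are assumed to be already sorted by x with equal weights.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    # Each block stores [sum of values, count]; its fitted value is the mean.
    blocks = []
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value per input point.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical y-values, listed in increasing order of x.
fit = isotonic_fit([1, 3, 2, 4, 3, 5])
print(fit)  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each out-of-order pair (3, 2) and (4, 3) is replaced by its mean, which gives the closest non-decreasing sequence in the least-squares sense.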

Text Mining with R: Gathering and Cleaning Data

Case study of tweets from comments on Indonesia’s biggest media

Entropy and Information Gain

Yet another tool used to make Decision Tree splits.

Deploy Machine Learning Applications using Chef

Overview of deploying a model with Chef, a configuration management tool

How to Use Polynomial Feature Transforms for Machine Learning

Often, the input features for a predictive modeling task interact in unexpected and often nonlinear ways. These interactions can be identified and modeled by a learning algorithm. Another approach is to engineer new features that expose these interactions and see if they improve model performance. Additionally, transforms like raising input variables to a power can help to better expose the…

Speeding training of decision trees

New method reduces training time by up to 99%, with no loss in accuracy.

Hierarchical Clustering: An Application to World Currencies

Do Asian currencies move in tandem? What about emerging markets in general? Are commodity currencies like AUD and CAD closely related as…

The Illustrated Guide To Classification Metrics: The Basics

An overview of the fundamentals behind measuring and comparing machine learning solutions

What, Why and How of t-SNE

Dimensionality Reduction using t-SNE in a nutshell

Categorical Feature Encoding in Python

Methods to encode categorical variables using Python

Understanding Associative Embedding

An elegant method to group predictions without labeling

Amazon’s AI tool can plan collision-free paths for 1,000 warehouse robots

Amazon researchers describe a machine learning system that plans the movements and paths of up to 1,000 mobile warehouse robots.

firmai/datagene: DataGene - Identify How Similar Datasets Are to One Anothe

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) - firmai/datagene

Time Series Analysis: Creating Synthetic Datasets

How to create time series datasets with different patterns

Complete guide to machine learning and deep learning in retail

The stores aren’t dead yet

A picture is worth 1,000 false-positive bug reports

How and why we built a custom app for visual debugging of warehouse pick paths.

Open sourcing the AI Model Efficiency Toolkit

Qualcomm open sources the AI Model Efficiency Toolkit on GitHub, providing a simple library plugin for AI developers.

Latent Semantic Analysis: intuition, math, implementation

TL;DR — Text data suffers heavily from high-dimensionality. Latent Semantic Analysis (LSA) is a popular, dimensionality-reduction…

Spatial Autocorrelation: Close Objects Affecting Other Close Objects

Deep dive into spatial autocorrelation and their industry use cases

An Intro to Graph Theory, Centrality Measurements, and NetworkX

Graph Theory is the study of graphs which are mathematical structures used to model pairwise relations between objects. These graphs are…

mlmachine - Clean ML Experiments, Elegant EDA & Pandas Pipelines

This new Python package accelerates notebook-based machine learning experimentation

An Intuitive Explanation of Kernels in Support Vector Machine (SVM)

We will walk through a simple example with basic arithmetic to demystify the concept of a kernel.

Generative vs Discriminative Probabilistic Graphical Models

A Comparison of Naive Bayes and Logistic Regression

Using Q-Learning in Numpy to teach an agent to play a game

Using q-learning for sequential decision making and therefore learning to play a simple game.

7 advanced tricks in pandas for data science

Pandas is the go-to library for data science. These are the shortcuts I use to do repetitive data science tasks faster and simpler.

5 Great New Features in Latest Scikit-learn Release - KDnuggets

From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.

Modeling in Seconds: Using PyCaret as a Tool for Data Science Fast Decision

I came across Pycaret while I was browsing on a slack for data scientists. It's a versatile library in which you can apply/evaluate/tune…

Cross Entropy, Log-Loss And Intuition Behind It

In this blog, you will get an intuition behind the use of cross-entropy and log-loss in machine learning.

How to Deploy your Machine Learning Models on Kubernetes

Deploy, scale and manage your machine learning services with Kubernetes and Terraform on GCP.

SVM and Kernel SVM

Learn about SVM or Support Vector Machine, Kernel Trick, Hyperplanes, Lagrange Multipliers using visual examples and code sections.

PyTorch BentoML Heroku: The simple stack

and how to train and deploy an ML model into production with them.

Handling imbalanced dataset in supervised learning using family of SMOTE algorithm. - DataScienceCentral.com

Consider a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long when you look at the confusion matrix and realize that the majority class is 98% of the total data and all examples are classified as majority…
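
The situation in this blurb is easy to reproduce with a few lines of arithmetic. The sketch below uses made-up labels (98 majority, 2 minority) and a degenerate "model" that always predicts the majority class; it is only an illustration of why accuracy misleads here, and it does not implement SMOTE or anything else from the linked article.

```python
# Hypothetical data: 98 majority-class (0) and 2 minority-class (1) examples.
y_true = [0] * 98 + [1] * 2
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy: fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the minority class: fraction of true 1s that were predicted as 1.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
minority_recall = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}, minority recall = {minority_recall:.2f}")
# accuracy = 0.98, minority recall = 0.00
```

The 98% accuracy comes entirely from the class ratio; the minority class is never detected.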

Feature Engineering: Data scientist's Secret Sauce ! - DataScienceCentral.com

Originally posted by the author on LinkedIn. It is very tempting for data science practitioners to opt for the best-known algorithms for a given problem. However, it is not the algorithm alone that provides the best solution; a model built on carefully engineered and selected features can provide far better results. “Any intelligent…

L1 and L2 Regularization — Explained

How to control the complexity of a model

Netflix Data Science Interview Practice Problems

A walkthrough of some of Netflix’s interview questions!

Detecting Weird Data: Conformal Anomaly Detection

Weird data is important. Often in data science, the goal is to discover trends in the data. However, consider doctors looking at images of…

30 Data Science Interview Questions from FAANG Tech Giants

In-depth Interview Q&A from Facebook, Amazon, Apple, Netflix, and Google

Machine Learning in Industrial Chemicals: Process Quality Optimization

This post is the last in our series of 5 blog posts highlighting use case presentations from the 2nd Edition of Seville Machine Learning School (MLSEV). You may also check out the previous posts ab…

A brief introduction to the beauty of Information Theory

Lambdaclass's blog about distributed systems, machine learning, compilers, operating systems, security and cryptography.

Lines Detection with Hough Transform

An algorithm to find lines in images

A Deep Dive into Lane Detection with Hough Transform

A detailed step-by-step guide to build a Lane Line Detection algorithm in OpenCV.

Getting Started with Spectral Clustering - Dr. Juan Camilo Orduz

Stochastic-, Batch-, and Mini-Batch Gradient Descent Demystified

Why do we need Stochastic, Batch, and Mini Batch Gradient Descent when implementing Deep Neural Networks?

Gaussian Mixture Models(GMM)

Understanding GMM: Idea, Maths, EM algorithm & python implementation

Stacked Auto-encoder as a Recommendation System for Movie Rating Prediction

Introduction on Stacked Auto-encoder and Technical Walk-through on Model Creation using Pytorch

RecSys Series Part 5: Neural Matrix Factorization for Collaborative Filteri

Bringing Neural Architecture into Recommendations

1. Getting started — csvkit 1.0.5 documentation

5 Machine Learning Techniques for Sales Forecasting

Comparing Linear Regression, Random Forest Regression, XGBoost, LSTMs, and ARIMA Time Series Forecasting

Deep Dive into Polynomial Regression and Overfitting - DataScienceCentral.com

In this article, we show that the issue with polynomial regression is not over-fitting, but numerical precision. Even if done right, numerical precision still remains an insurmountable challenge. We focus here on step-wise polynomial regression, which is supposed to be more stable than the traditional model. In step-wise regression, we estimate one coefficient at a…

Optimization Techniques — Simulated Annealing

A popular method for optimizing model parameters

So why the heck are they called Support Vector Machines?

In this piece, I attempt to explain the mathematical reasoning behind this ‘complex’ name.

Layered Label Propagation Algorithm

An algorithm for community finding

Visualizing Three-Dimensional Data — Heatmaps, Contours, and 3D Plots with

Plotting heatmaps, contour plots, and 3D plots with Python

tf.data: Creating data input pipelines

Are you not able to load your NumPy data into memory? Does your model have to wait for data to be loaded after each epoch? Is your Keras…

A Complete Beginners Guide to Matrix Multiplication for Data Science with P

Learn matrix multiplication for machine learning by following along with Python examples

Partial Correlation Vs. Conditional Mutual Information

Finding relationships between different variables/features in a dataset during a data analysis task is one of the key and fundamental…

Build PyTorch Models Easily Using torchlayers

torchlayers aims to do what Keras did for TensorFlow, providing a higher-level model-building API and some handy defaults and add-ons useful for crafting PyTorch neural networks.

Pandas tips I wish I knew before

How does pivot work? What is the main pandas building block? And more …

Co-variance: An intuitive explanation!

A comprehensive but simple guide that focuses more on the idea behind the formula than on the math itself. Start building the block…

Matthews Correlation Coefficient: when to use it and when to avoid it

It’s not a silver-bullet metric for classification problems

t-SNE clearly explained

An intuitive explanation of t-SNE algorithm and why it’s so useful in practice.

Visualizing Gaussian Elimination

The determinant is related to the volume of a parallelepiped spanned by the vectors in a matrix; let’s see how.

Bayesian Inference Algorithms: MCMC and VI

Intuition and diagnostics

Comprehensive Guide on Item Based Recommendation Systems

This guide shows in detail how an item-based recommendation system works and how to implement it in a real work environment.

Matrix Factorization as a Recommender System

An Explanation and Implementation of Matrix Factorization

Machine Learning Benchmarking: You’re Doing It Wrong

I’m not going to bury the lede: Most machine learning benchmarks are bad. And not just kinda-sorta nit-picky bad, but catastrophically and fundamentally flawed. TL;DR: Please, for the love of sta…

Lesser-known pandas tricks (2019)

5 lesser-known pandas tricks that help you be more productive

How to Use DBSCAN Effectively

A complete guide on using the most cited clustering algorithm effectively

[P] PyCM 2.6 released : Multi-class confusion matrix library in Python

https://github.com/sepandhaghighi/pycm https://www.pycm.ir custom_rounder function added #279 complement function added sparse_matrix attribute added…

Learn how to read data into a Pandas DataFrame in 5 minutes

Extract data from different sources

Boosting Showdown: Scikit-Learn vs XGBoost vs LightGBM vs CatBoost in Senti

Which boosting algorithm will reign supreme in this head-to-head competition?

Decision Trees for Classification: ID3 Algorithm Explained

This article explains the ID3 algorithm, one of the many algorithms used to build decision trees, in detail and with calculations.

Optimization — Descent Algorithms

In this post, we will see several basic optimisation algorithms that you can use in various data science problems.

Less Known but Very Useful Pandas Functions

Expedite your data analysis process

Hyperparameter Tuning with Python: Complete Step-by-Step Guide

Why and How to use with examples of Keras/XGBoost

How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok

Imagine having your Friends Working with your Local Jupyter Notebook in a Remote Machine

A Friendly Introduction to Text Clustering

All you need to know about k-means, brown clustering, tf-idf, topic models and LDA.

Local Links Run The World

Networks regulate everything from ant colonies and middle schools to epidemics and the internet. Here’s how they work.

Retail Analytics: A Novel and Intuitive way of finding Substitutes and Comp

Retail Analytics: Data Science for Retail

Test Your Skills: 26 Data Science Interview Questions & Answers

Can you answer them all?

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of...

How exactly does PCA work?

Simplest guide to PCA, ever.

A One-Stop Shop for Principal Component Analysis - Towards Data Science

At the beginning of the textbook I used for my graduate stat theory class, the authors (George Casella and Roger Berger) explained in the…

Semi-Supervised Classification of Unlabeled Data (PU Learning)

How to classify unlabeled data when all you have is just a sample of positive data

Implementing XGBoost from scratch

A step-by-step guide to implementing one of the most popular machine learning algorithms using NumPy

The Most Useful ML Tools 2020

5 sets of tools every lazy full-stack data scientist should use

How to use Residual Plots for regression model validation?

Using residual plots to validate your regression models

An Introduction to Support Vector Regression (SVR)

Using Support Vector Machines (SVMs) for Regression

Building an Incremental Recommender System: Part II

Going above and beyond state-of-the-art with confidence!

[P] pytorch-optimizer -- collections of ready to use optimization algorithm

pytorch-optimizer -- collections of ready to use optimization algorithms for PyTorch, includes: AccSGD, AdaBound, AdaMod…

Self Supervised Depth Estimation: Breaking down the ideas

Learning depth without manual annotation

The Curious Case of Kalman Filters

QR Matrix Factorization

Least Squares and Computation (with R and C++)

Convex Hull: An Innovative Approach to Gift-Wrap your Data

How to Leverage Data Visualization with Wrapping Algorithm

Exploring the fundamentals of multi-armed bandits

Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. “Introduction to Multi-Armed Bandits” by Alex Slivkins provides an accessible, textbook-like treatment of the subject.

An introduction to time series forecasting

DataRobot MLOps is helping to increase AI value by automating the deployment, optimization, and governance of machine learning applications.

How to Develop an Imbalanced Classification Model to Detect Oil Spills

Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. An example of an imbalanced classification problem where a class label is required and both classes are equally important is the detection of oil spills or slicks in satellite images. The detection of a spill requires mobilizing an expensive response,…

Why Is Imbalanced Classification Difficult?

Imbalanced classification is primarily challenging as a predictive modeling task because of the severely skewed class distribution. This is the cause for poor performance with traditional machine learning models and evaluation metrics that assume a balanced class distribution. Nevertheless, there are additional properties of a classification dataset that are not only challenging for predictive modeling but also increase or compound…

Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found

By popular demand, I’ve updated this article with the latest tutorials from the past 12 months. Check it out here

What is a Markov Decision Process Anyways?

Learn about the model that is used in most reinforcement learning problems.

Accuracy vs Speed – what Data Scientists can learn from Search

Delivering accurate insights is the core function of any data scientist. Navigating the development road toward this goal can sometimes be tricky, especially when cross-collaboration is required, and these lessons learned from building a search application will help you negotiate the demands between accuracy and speed.

MCMC Methods: Metropolis-Hastings and Bayesian Inference

This article will introduce you to Markov Chain Monte Carlo (MCMC) methods, namely Metropolis-Hastings and Bayesian inference, and demonstrate how you can harness them for your next project.

Reinforcement Learning, Part 3: The Markov Decision Process

MDP in action: the next step toward solving real-life problems with RL and AI

TinyML Book

MIT Linear Algebra, Lecture 4: A=LU Factorization

This is the fourth post in an article series about MIT's Linear Algebra course. In this post I will review lecture four on factorizing a matrix A into a product of a lower-triangular matrix L and an upper-triangular matrix U, or in other words A=LU. The lecture also shows how to find the inverse of matrix product A·B,...
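
The factorization A = LU mentioned in this blurb can be sketched as plain Gaussian elimination that records its multipliers. This is a minimal illustration, not code from the linked post: the function name `lu_decompose` is hypothetical, the 2×2 matrix is an arbitrary example, and the sketch assumes no row exchanges are needed (every pivot is nonzero).

```python
def lu_decompose(a):
    """Return (L, U) with A = L·U, L unit lower-triangular, U upper-triangular.

    No row exchanges are performed, so every pivot must be nonzero.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]
    for col in range(n):
        pivot = upper[col][col]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: a row exchange would be needed")
        for row in range(col + 1, n):
            factor = upper[row][col] / pivot
            lower[row][col] = factor  # the multiplier used in elimination
            for k in range(col, n):
                upper[row][k] -= factor * upper[col][k]
    return lower, upper


L, U = lu_decompose([[2, 1], [8, 7]])
print(L)  # [[1.0, 0.0], [4.0, 1.0]]
print(U)  # [[2.0, 1.0], [0.0, 3.0]]
```

L holds the elimination multipliers below its unit diagonal, and U is the triangular matrix that elimination leaves behind.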

Polynomial Regression from Scratch in Python

Machine learning is one of the hottest topics in computer science today. And not without a reason: it has helped us do things that couldn’t be done before like image classification, image generation and natural language processing. But all of it boils down to a really simple concept: you give the computer data and the computer then finds patterns in that data. This is called “learning” or “training”, depending on your point of view. These learnt patterns can be extrapolated to make predictions. How? That’s what we are looking at today.

Market Basket Analysis: A Tutorial

This article is about Market Basket Analysis & the Apriori algorithm that works behind it.

Beyond L2 Loss – How We Experiment with Loss Functions

Estimating expected time of arrival (ETA) is crucial to what we do at Lyft. Estimates go directly to riders and drivers using our apps, as…

Classify A Rare Event Using 5 Machine Learning Algorithms

Which algorithm works best for unbalanced data? Are there any tradeoffs?

Survey Segmentation Tutorial

Learn the basics of verifying segmentation, analyzing the data, and creating segments in this tutorial. When reviewing survey data, you will typically be handed Likert questions (e.g., on a scale of 1 to 5), and by using a few techniques, you can verify the quality of the survey and start…

40+ Modern Tutorials Covering All Aspects of Machine Learning - DataScienceCentral.com

This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019…

[OC] Updated version of my recent maze finding algorithm with source code

What are some fast similarity search algorithms and data structures for hig

The Data Science Interview Study Guide

Preparing for a job interview can be a full-time job, and Data Science interviews are no different. Here are 121 resources that can help you study and quiz your way to landing your dream data science job.

https://stepupanalytics.com/beginners-guide-to-statistical-cluster-analysis-in-detail-part-1/

The 5 most useful Techniques to Handle Imbalanced datasets

This post is about explaining the various techniques you can use to handle imbalanced datasets

Adversarial Validation Overview

Learn how to implement adversarial validation that builds a classifier to determine if your data is from the training or testing sets. If you can do this, then your data has issues, and your adversarial validation model can help you diagnose the problem.

Practical Hyperparameter Optimization

An introduction on how to fine-tune Machine and Deep Learning models using techniques such as: Random Search, Automated Hyperparameter Tuning and Artificial Neural Networks Tuning.

Comparing Apples, Oranges and Bananas - ssense-tech - Medium

Behind every recommender system lies a bevy of metrics.

[N] scikit-optimize 0.7 release

Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several…

auto-sklearn — AutoSklearn 0.6.0 documentation

Evolutionary Algorithms

Evolutionary algorithms are an unsupervised learning alternative to neural networks that rely on fitness functions instead of trained nodes for evaluation.

Independent Component Analysis

What is Independent Component Analysis (Statistics)?

Introduction To Machine Learning Deployment Using Docker and Kubernetes

“Machine learning - Clustering, Density based clustering and SOM”

Deep learning

vumaasha/Atlas: Atlas: A Dataset and Benchmark for E-commerce Clothing Prod

Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization - vumaasha/Atlas

Introduction to Stochastic Processes [pdf] 📄

Markov Chain Analysis and Simulation using Python - Towards Data Science

Solving real-world problems with probabilities

A Pirate's Guide to Accuracy, Precision, Recall, and Other Scores

Correlation Coefficients in One Picture - DataScienceCentral.com

Correlation coefficients enable you to find relationships between a wide variety of data. However, the sheer number of options can be overwhelming. This picture sums up the differences between five of the most popular correlation coefficients. Part two covers several less popular correlation coefficients. Further reading: Gamma & Coefficient & Yule’s Q Kendall’s Tau Pearson…

Time Series Prediction - A short introduction for pragmatists · Blog · Liip

The 5 Classification Evaluation metrics every Data Scientist must know

This post is about various evaluation metrics and how and when to use them.

Deep Learning in the Real World: Dealing with Non-Differentiable Loss Funct

Over the past few years, deep learning has been taking many industries by storm. From voice recognition to image analysis and synthesis, neural networks have turned out to be very efficient at solv…

[D] Tools/Techniques for Efficiently Sorting Image Data

I'm planning on sorting ~100,000 images to use as data for a computer vision application. With this much data, shaving a little time off of each…

Density Estimation: MLE, MAP, MOM, KDE, ECDF, Q-Q Plot, GAN

Density estimation is estimating the probability density function of the population from the sample

[R] How UMAP works -- a detailed comparison with t-SNE

A recent blog post How Exactly UMAP Works provides a different perspective on explaining the UMAP dimensionality reduction…

Is Rectified Adam actually *better* than Adam? - PyImageSearch

Is the Rectified Adam (RAdam) optimizer actually better than the standard Adam optimizer? According to my 24 experiments, the answer is no, typically not (but there are cases where you do want to use it instead of Adam).

Clustering Metrics Better Than the Elbow Method

We show what metric to use for visualizing and determining an optimal number of clusters much better than the usual practice — elbow method.

101 Machine Learning Algorithms for Data Science | Data Science Blog

Data Science Dojo blog features the most recent, and relevant articles about data science, analytics, generative AI, large language models, machine learning, and data visualization.

The Simple Math behind 3 Decision Tree Splitting criterions

Rules of Machine Learning: | Google for Developers

https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

Understanding UMAP

UMAP is a new dimensionality reduction technique that offers increased speed and better preservation of global structure.

Inference Results – MLPerf

MLCommons ML benchmarks help balance the benefits and risks of AI through quantitative tools that guide responsible AI development.

Research Guide: Advanced Loss Functions for Machine Learning Models - KDnuggets

This guide explores research centered on a variety of advanced loss functions for machine learning models.

150 successful machine learning models: 6 lessons learned at Booking.com

How UMAP Works — umap 0.3 documentation

What is Hierarchical Clustering?

The article contains a brief introduction to various concepts related to Hierarchical clustering algorithm.

How exactly Stitch Fix’s “Tinder for clothes” learns your style

Each customer has an individualized style map, laying out her feelings about peasant blouses, A-line dresses, or pencil skirts.

Understanding AdaBoost – or how to turn Weakness into Strength

Many of you might have heard of the concept “Wisdom of the Crowd”: when many people independently guess some quantity, e.g. the number of marbles in a glass jar, the average of their guesses is often pretty accurate, even though many of the guesses are totally off. The same principle is at work in …
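
The averaging intuition in this blurb can be checked with a small simulation. The sketch below is not AdaBoost itself, only the "many noisy guesses, one good average" effect the blurb uses as motivation; the true count, the noise level and the crowd size are all made-up assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
true_count = 500  # hypothetical number of marbles in the jar

# Hypothetical crowd: independent, unbiased, but very noisy guesses.
guesses = [random.gauss(true_count, 150) for _ in range(1000)]

# Error of the averaged guess versus the error of a typical individual.
crowd_error = abs(statistics.mean(guesses) - true_count)
median_error = statistics.median(abs(g - true_count) for g in guesses)

print(f"error of the crowd average:     {crowd_error:.1f}")
print(f"median error of one individual: {median_error:.1f}")
```

With independent, unbiased guesses the individual errors largely cancel in the average, so the first number should come out far smaller than the second.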

Knowledge extraction from unstructured texts | Tech Blog

There is an unreasonable amount of information that can be extracted from what people publicly say on the internet. Learn how to do it.

benedekrozemberczki/awesome-gradient-boosting-papers: A curated list of gradient boosting research papers with implementations.

A curated list of gradient boosting research papers with implementations.

Modeling the Unseen

How Instacart uses Machine Learning to spot lost demand in its fulfillment chain

Arima Model – Complete Guide to Time Series Forecasting in Python

Using an ARIMA model, you can forecast a time series from the series’ own past values. In this post, we build an optimal ARIMA model from scratch and extend it to Seasonal ARIMA (SARIMA) and SARIMAX models. You will also see how to build auto-ARIMA models in Python.

Synthetic data generation — a must-have skill for new data scientists

A brief rundown of packages and ideas to generate synthetic data for self-driven data science projects and deep diving into machine…

Kernel density estimation explainer

Matthew Conlen provides a short explainer of how kernel density estimation works. Nifty.

Comparison of the Text Distance Metrics | ActiveWizards: data science and engineering lab

This article discusses and compares different approaches of how to compare two text strings.

https://dhruvonmath.com/2019/04/04/kernels

How to Use t-SNE Effectively

Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading.

Principal component analysis: pictures, code and proofs

The code used to generate the plots for this post can be found here.

Python Data Science Handbook

The Hitchhiker’s Guide to Feature Extraction

Some Tricks and Code for Kaggle and Everyday work. This post is about useful feature engineering methods and tricks that I have learned and end up using often.

What is (Gaussian) curvature?

Algorithms by Jeff Erickson

Distill — Latest articles about machine learning

Articles about Machine Learning

Jacobian matrix and determinant - Wikipedia

In vector calculus, the Jacobian matrix (/dʒəˈkoʊbiən/,[1][2][3] /dʒɪ-, jɪ-/) of a vector-valued function of several variables is the matrix of all its first-order partial derivatives. When this matrix is square, that is, when the function takes the same number of variables as input as the number of vector components of its output, its determinant is referred to as the Jacobian determinant. Both the matrix and (if applicable) the determinant are often referred to simply as the Jacobian in literature.[4] They are named after Carl Gustav Jacob Jacobi.
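
The definition above (the matrix of all first-order partial derivatives, with a determinant when the matrix is square) can be illustrated numerically. This is a rough sketch using central finite differences; the helper `numerical_jacobian`, the example function `f`, and the step size `h` are all illustrative assumptions rather than anything taken from the article.

```python
import math


def numerical_jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x with central differences."""
    m = len(f(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        forward = list(x)
        forward[j] += h
        backward = list(x)
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            # Entry (i, j) approximates the partial derivative of f_i w.r.t. x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def f(v):
    """Example map from R^2 to R^2: (x, y) -> (x^2 * y, 5x + sin y)."""
    x, y = v
    return [x * x * y, 5 * x + math.sin(y)]


J = numerical_jacobian(f, [1.0, 2.0])
# Analytically the Jacobian is [[2xy, x^2], [5, cos y]] = [[4, 1], [5, cos 2]].
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(J)
print(det)  # close to 4*cos(2) - 5
```

Because this example maps two variables to two components, the matrix is square and the Jacobian determinant is defined.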

One-Shot Learning: Learning More with Less Data

The 5 Sampling Algorithms every Data Scientist need to know

This post is about some of the most common sampling techniques one can use while working with data.

Five Command Line Tools for Data Science

You can do more data science than you think from the terminal.

TF-IDF: The best content optimization tool SEOs aren’t using - Search Engin

Term frequency–inverse document frequency uncovers the specific words that top-ranking pages use to give target keywords context.

yzhao062/pyod: A Python Toolbox for Scalable Outlier Detection (Anomaly Det

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques - yzhao062/pyod

A Gentle Introduction to Noise Contrastive Estimation

Find out how to use randomness to learn your data by using Noise Contrastive Estimation with this guide that works through the particulars of its implementation.

Introduction to Genetic Algorithms

The Hitchhiker’s Guide to Feature Extraction

168 votes, 13 comments. 2.2M subscribers in the datascience community. A space for data science professionals to engage in discussions and debates on…

Designing Data-Intensive Applications (DDIA) — an O’Reilly book by Martin Kleppmann (The Wild Boar Book)

The mathematics and Intuitions of Principal Component Analysis (PCA) Using

As data scientists or Machine learning experts, we are faced with tonnes of columns of data to extract insight from, among these features…

Applied Category Theory | Mathematics | MIT OpenCourseWare

Category theory is a relatively new branch of mathematics that has transformed much of pure math research. The technical advance is that category theory provides a framework in which to organize formal systems and by which to translate between them, allowing one to transfer knowledge from one field to another. But this same organizational framework also has many compelling examples outside of pure math. In this course, we will give seven sketches on real-world applications of category theory.

Policy Gradient Algorithms

A Visual Exploration of Gaussian Processes

How to turn a collection of small building blocks into a versatile tool for solving regression problems.

The Illustrated Word2vec

Discussions: Hacker News (347 points, 37 comments), Reddit r/MachineLearning (151 points, 19 comments) Translations: Chinese (Simplified), French, Korean, Portuguese, Russian “There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves. We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains its own fixity. In such perfection, all things move toward death.” ~ Dune (1965) I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (Recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2). Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.
In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?

ROC Curve Explained in One Picture - DataScienceCentral.com

With a ROC curve, you’re trying to find a good model that optimizes the trade-off between the False Positive Rate (FPR) and True Positive Rate (TPR). What counts here is how much area is under the curve (Area under the Curve = AUC). The ideal curve in the left image fills in 100%, which means…
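
To make the FPR/TPR trade-off and the area under the curve concrete, here is a minimal sketch in plain Python. The helper names `roc_points` and `auc`, and the toy labels and scores, are assumptions made for illustration; they are not taken from the linked article.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and collect (FPR, TPR).

    labels are 0/1 ground truth values; scores are predicted scores where
    higher means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every example that shares this score before adding a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
print(list(zip(fpr, tpr)), auc(fpr, tpr))
```

A classifier that ranks every positive above every negative reaches an area of 1.0, which corresponds to the "fills in 100%" case in the description.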

How to Use ROC Curves and Precision-Recall Curves for Classification in Python - MachineLearningMastery.com

It can be more flexible to predict probabilities of an observation belonging to each class in a classification problem rather than predicting classes directly. This flexibility comes from the way that probabilities may be interpreted using different thresholds that allow the operator of the model to trade-off concerns in the errors made by the model, such as the number of…

The why and how of nonnegative matrix factorization

A Visual Guide to Evolution Strategies | 大トロ

http://precisionagricultu.re/how-machine-learning-is-gradually-changing-mod

The Swiss Army Knife of Hashmaps

A while back, there was a discussion comparing the performance of using the hashbrown crate (based on Google’s SwissTable implementation) in the Rust compiler. In the last RustFest, Amanieu was experimenting on integrating his crate into stdlib, which turned out to have some really promising results. As a result, it’s being planned to move the crate into stdlib.

Four Techniques for Outlier Detection

There are many techniques to detect and optionally remove outliers from a dataset. In this blog post, we show an implementation in KNIME Analytics Platform of four of the most frequently used - traditional and novel - techniques for outlier detection.

How to visualize decision tree

Decision trees are the fundamental building block of gradient boosting machines and Random Forests(tm), probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. Unfortunately, current visualization packages are rudimentary and not immediately helpful to the novice. For example, we couldn't find a library that visualizes how decision nodes split up the feature space. So, we've created a general package (part of the animl library) for scikit-learn decision tree visualization and model interpretation.

A Machine Learning Approach to Shipping Box Design. (arXiv:1809.10210v1 [st

Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customer's online orders is an indispensable and integral part of nowadays eCommerce business, as it...

Cookbook — Bayesian Modelling with PyMC3

Recently I’ve started using PyMC3 for Bayesian modelling, and it’s an amazing piece of software! The API only exposes as much of heavy machinery of MCMC as you need — by which I mean, just the pm.sample() method (a.k.a., as Thomas Wiecki puts it, the Magic Inference Button™). This really frees up your mind to think about your data and model, which is really the heart and soul of data science! That being said however, I quickly realized that the water gets very deep very fast: I explored my data set, specified a hierarchical model that made sense to me, hit the Magic Inference Button™, and… uh, what now? I blinked at the angry red warnings the sampler spat out.

A Feature Selection Tool for Machine Learning in Python

Using the FeatureSelector for efficient machine learning workflows

Vertical Spotlight: Machine Learning for Manufacturing

Receiver Operating Characteristic Curves Demystified (in Python)

In this blog, I will reveal, step by step, how to plot an ROC curve using Python. After that, I will explain the characteristics of a basic ROC curve.

Attacks against machine learning — an overview

This blog post surveys the attack techniques that target AI (Artificial Intelligence) systems and how to protect against them.

Model evaluation, model selection, and algorithm selection in machine learn

A single-PDF version of Model Evaluation parts 1-4 is available on arXiv: https://arxiv.org/abs/1811.12808

Feature Engineering with Tidyverse

Stay up-to-date on the latest data science and AI news in the worlds of artificial intelligence, machine learning, deep learning, implementation, and more.

Text Classifier Algorithms in Machine Learning – Stats and Bots

Introduction to Market Basket Analysis in Python

Using mlxtend to perform market basket analysis on online retail data set.

Set Theory Ordered Pairs and Cartesian Product with R

Top speed for top-k queries

Monte Carlo theory, methods and examples (2013)

Modern Machine Learning Algorithms: Strengths and Weaknesses

Get to know the ML landscape through this practical, concise overview of modern machine learning algorithms. Plus, we'll discuss the tradeoffs of each.

Berkeley CS189 Machine Learning: Complete Lecture Notes [pdf] 📄

40 Techniques Used by Data Scientists - Data Science Central

These techniques cover most of what data scientists and related practitioners are using in their daily activities, whether they use solutions offered by a vendor, or whether they design proprietary tools. When you click on any of the 40 links below, you will find a selection of articles related to the entry in question. Most…

https://blog.statsbot.co/introduction-to-imitation-learning-32334c3b1e7a

scikit-surprise 1.0.5 : Python Package Index

An easy-to-use library for recommender systems.

Eecs227at

ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus

Since writing this post back in 2018, I have extended this to a 4-part series on causal inference:
* ➡️️ Part 1: Intro to causal inference and do-calculus [https://www.inference.vc/untitled]
* Part 2: Illustrating Interventions with a Toy Example [https://www.inference.vc/causal-inference-2-illustrating-interventions-in-a-toy-example/]
* Part 3: Counterfactuals [https://www.inference.

Machine Learning Explained: Vectorization and matrix operations

Gaussian Processes for Machine Learning: Contents

Sequence Modeling with CTC

A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.

A Deep Dive into Monte Carlo Tree Search

Why Momentum Really Works

We often think of optimization with momentum as a ball rolling down a hill. This isn't wrong, but there is much more to the story.

Kernel Cookbook

A reference manual for creating covariance functions.

Eigenvectors and Eigenvalues explained visually

LightTag is a text annotation platform for data scientists creating AI trai

LightTag, a newly launched startup from a former NLP researcher at Citi, has built a "text annotation platform" designed to assist data scientists who

A guide to receptive field arithmetic for Convolutional Neural Networks

The receptive field is perhaps one of the most important concepts in Convolutional Neural Networks (CNNs) that deserves more attention from…

Command Line Tricks For Data Scientists

For many data scientists, data manipulation begins and ends with Pandas or the Tidyverse. In theory, there is nothing wrong with this…

How we grew from 0 to 4 million women on our fashion app, with a vertical machine learning approach

My name is Gabi (my bio), and I’m the CEO and co-founder of Chicisimo. We launched three years ago, our goal was to offer automated outfit…

Understanding Feature Engineering (Part 3) — Traditional Methods for Text D

Traditional strategies for taming unstructured, textual data

A Gentle Introduction to Concept Drift in Machine Learning - Machine Learni

Data can change over time. This can result in poor and degrading predictive performance in predictive models that assume a static relationship between input and output variables. This problem of the changing underlying relationships in the data is called concept drift in the field of machine learning. In this post, you will discover the problem of concept drift and ways…

Start With Gradient Boosting, Results from Comparing 13 Algorithms on 165 D

Which machine learning algorithm should you use? It is a central question in applied machine learning. In a recent paper by Randal Olson and others, they attempt to answer it and give you a guide for algorithms and parameters to try on your problem first, before spot checking a broader suite of algorithms. In this post, you will discover a…

Probabilistic Filters By Example: Cuckoo Filter and Bloom Filters

CatBoost vs. Light GBM vs. XGBoost - KDnuggets

Who is going to win this war of predictions and at what cost? Let’s explore.

Multiscale Methods and Machine Learning - KDnuggets

We highlight recent developments in machine learning and Deep Learning related to multiscale methods, which analyze data at a variety of scales to capture a wider range of relevant features. We give a general overview of multiscale methods, examine recent successes, and compare with similar approaches.

Logistic Regression: A Concise Technical Overview

Interested in learning the concepts behind Logistic Regression (LogR)? Looking for a concise introduction to LogR? This article is for you. Includes a Python implementation and links to an R script as well.

Hierarchical Classification – a useful approach when predicting thousands o

Traditionally, most multi-class classification problems (i.e. problems where you want to predict which of a set of possible results a given sample falls into) focus on a small number of possible predictions.

Time Series for Dummies – The 3 Step Process

Time series forecasting is an easy-to-use, low-cost solution that can provide powerful insights. This post will walk through an introduction to the three fundamental steps of building a quality model.

How we grew from 0 to 4 million women on our fashion app, with a vertical m

Three years ago we launched Chicisimo; our goal was to offer automated outfit advice. Today, with over 4 million women on the app, we want to share how our data and machine learning approach helped us grow. It’s been chaotic but it is now under control.

Linear Algebra Cheat Sheet for Machine Learning

All of the Linear Algebra Operations that You Need to Use in NumPy for Machine Learning. The Python numerical computation library called NumPy provides many linear algebra functions that may be useful as a machine learning practitioner. In this tutorial, you will discover the key functions for working with vectors and matrices that you may find useful as a machine…

The Periodic Table of Data Science

This periodic table can serve as a guide to navigate the key players in the data science space. The resources in the table were chosen by looking at surveys taken from data science users, such as the 2016 Data Science Salary Survey by O'Reilly, the 201...

Recommendation System Algorithms: An Overview

This post presents an overview of the main existing recommendation system algorithms, so that data scientists can choose the best one according to a business’s limitations and requirements.

Using Self-Organizing Maps to solve the Traveling Salesman Problem

The Traveling Salesman Problem is a well-known challenge in Computer Science: it consists of finding the shortest possible route that traverses all cities in a given map only once. Despite its simple explanation, this problem is, indeed, NP-Complete. This implies that the difficulty of solving it increases rapidly with the number of cities, and no known general algorithm solves it efficiently.

One-page R: a survival guide to data science with R - DataScienceCentral.com

This article comes from Togaware. These draft chapters weave together a collection of tools for the data scientist, tools that are all part of the R Statistical Software Suite. Each chapter is a collection of one (or more) pages that cover particular aspects of the topic. The chapters can be…

Numenta Anomaly Benchmark: A Benchmark for Streaming Anomaly Detection

Numenta created the open source Numenta Anomaly Benchmark (NAB) to test their own anomaly detection algorithms. Learn more about how Numenta and Domino worked together to develop the NAB.

A Seven Dimensional Analysis of Hashing Methods [pdf] 📄

Topic Modeling with LDA Introduction

Assessing Data with Item Response Theory

How to Handle Imbalanced Classes in Machine Learning

Imbalanced classes put "accuracy" out of business. This is a surprisingly common problem in machine learning, and this guide shows you how to handle it.
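
A small sketch of why accuracy misleads on imbalanced data, assuming a made-up dataset with 1% positives and a trivial classifier that always predicts the majority class. The numbers are illustrative only and do not come from the linked guide.

```python
# Hypothetical dataset: 10 positives (1) and 990 negatives (0).
labels = [1] * 10 + [0] * 990

# A "classifier" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: the share of actual positives that the classifier found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The model looks 99% accurate while finding none of the positives, which is why metrics such as recall, precision, or ROC AUC are usually reported alongside accuracy for imbalanced problems.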

arbox/data-science-with-ruby: Practical Data Science with Ruby based tools.

Practical Data Science with Ruby based tools.

https://news.21.co/quantifying-decentralization-e39db233c28e

Understanding Machine Learning Algorithms

Machine learning algorithms aren’t difficult to grasp if you understand the basic concepts. Here, a SAS data scientist describes the foundations for some of today’s popular algorithms.

Top-100-Data-science-interview-questions

Data science, also known as data-driven decision making, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge from data in various forms, and make decisions based on this knowledge. A data scientist should not be evaluated only on his/her knowledge of machine learning, but should also have good expertise in statistics. I will try to start from the very basics of data science and then slowly move to expert level. So let’s get started.

Why you should read Nina Zumel’s 3 part series on principal components anal

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including…

The often-overlooked random forest kernel · RMarcus

Ryan Marcus, assistant professor at the University of Pennsylvania. Using machine learning to build the next generation of data systems.

Parfit — quick and powerful hyper-parameter optimization with visualization

An introduction to parfit

Inside Flipkart’s monster-cruncher: how it gleans insights from a petabyte

The 10 Statistical Techniques Data Scientists Need to Master

The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.

Why is Kullback-Leibler divergence not a distance?

The Kullback-Leibler divergence between two probability distributions is sometimes called a "distance," but it's not. Here's why.
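
One reason the divergence is not a distance is that it is asymmetric, which a quick calculation can show. The sketch below uses two made-up discrete distributions `p` and `q`; the helper name `kl_divergence` is an assumption for illustration, and the code assumes `q` is nonzero wherever `p` is nonzero.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists.

    Terms with p_i == 0 contribute nothing. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_divergence(p, q)   # KL(p || q)
backward = kl_divergence(q, p)  # KL(q || p)
print(forward, backward)
```

The two directions give different values, so the symmetry requirement of a metric already fails; the divergence also does not satisfy the triangle inequality in general.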

Machine Learning: Handbag Brand and Color Detection using Deep Neural Netwo

How to Perform the Principal Component Analysis in R | Open Data Science

[P] A Visual Guide to Evolution Strategies

380 votes, 20 comments. 2.9M subscribers in the MachineLearning community. Beginners -> /r/mlquestions , AGI -> /r/singularity, career advices ->…

The Arms Race to Leverage Machine Learning in Supply Chain Planning

Artificial intelligence (AI) is hot. Over $4 billion in venture capital has been invested in AI firms just in the US. But supply chain planning software companies, with their cadre of operations research Ph.Ds who have been modeling complex problems for decades, may be better poised to solve many complex business problems than the hot new Silicon Valley firms.

Machine Learning | Google Developers

11 Important Model Evaluation Techniques Everyone Should Know - Data Scienc

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models in the context of model selection, and to predict how accurate predictions (associated with a specific model and data set) are expected to be. Confidence Interval. Confidence intervals are used to assess how reliable a statistical estimate…

Relative error distributions, without the heavy tail theatrics

Nina Zumel prepared an excellent article on the consequences of working with relative error distributed quantities (such as wealth, income, sales, and many more) called “Living in A Lognormal…

Be Wrong the Right Number of Times

Update, Dec 12, 2016: There is a follow up post discussing the outcome of all of this after the election results were known.

delgado14a.pdf 📄

“Shrinking bull’s-eye” algorithm speeds up complex modeling from days to ho