machine-learning

All you need to know about Machine Learning in a hundred pages. Supervised and unsupervised learning, support vector machines, neural networks, ensemble methods, gradient descent, cluster analysis and dimensionality reduction, autoencoders and transfer learning, feature engineering and hyperparameter tuning! Math, intuition, illustrations, all in just a hundred pages!

Neural networks learn to predict by backpropagation. This article aims to help you build a solid intuition about the concept using a simple example. The ideas we learn here can be extended to bigger neural networks. I assume that you already know how a feed-forward neural network works. Before reading further, take a pen and paper. The calculations used in this article can be done in your head, but I still want you to do them by hand.

After uncovering a unifying algorithm that links more than 20 common machine-learning approaches, MIT researchers organized them into a “periodic table of machine learning” that can help scientists combine elements of different methods to improve algorithms or create new ones.

In this guide, you will learn how to deploy a machine learning model as an API using FastAPI. We will create an API that predicts the species of a penguin based on its bill length and flipper length. Prerequisites Step 1: Set Up Your Environment Step 2: Prepare Your Machine Learning Model Step 3: Create […]

Support Vector Machines (SVM) are supervised learning algorithms used for classification and regression by finding optimal decision boundaries between data classes.

This article shows how to implement CatBoost in R.

Triplet loss is a machine learning function that minimizes distances between similar data points while maximizing distances between dissimilar ones.
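
As a rough illustration of that idea, and not the linked article's implementation, a triplet loss over an anchor, a similar (positive) point and a dissimilar (negative) point can be sketched as follows. The Euclidean distance, the margin of 1.0 and the function names are assumptions made for this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin value is an assumption).

    The loss reaches zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

A well-separated triplet gives zero loss, while swapping the positive and negative points gives a positive loss that training would push down.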

In this article, I'll take you through a list of 50+ Data Analysis Projects you should try to learn Data Analysis.

In this article, I will introduce you to 10 little-known Python libraries every data scientist should know.

In this article, I'll take you through a list of 80+ hands-on Data Science projects you should try to learn everything in Data Science.

Survival analysis consists of statistical methods that help us understand and predict how long it takes for an event to occur.

MMC, SVC, SVM: What’s the difference?

How a Key-Value (KV) cache reduces Transformer inference time by trading memory for computation

In this article, I'll take you through a list of 50+ AI & ML projects solved & explained with Python that you should try.

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, Llama 3,...

Cosine similarity - the duct tape of AI. Convenient but often misused. Let's find out how to use it better.

Dramatically Speed-Up your Learning Algorithm, with Stochastic Thinning. Includes use case, Python code, regression and neural network illustrations.

Popular MLOps Python tools that will make machine learning model deployment a piece of cake.

Your ultimate Paper Club Starter Kit, from your friends at the Latent Space Paper Club, where we have now read 100 papers. Also: Announcing Latent Space Paper Club LIVE! at Neurips 2024! Join us!

In today’s world, you’ve probably heard the term “Machine Learning” more than once. It’s a big topic, and if you’re new to it, all the technical words might feel confusing. Let’s start with the basics and make it easy to understand. Machine Learning, a subset of Artificial Intelligence, has emerged as a transformative force, empowering machines to learn from data and make intelligent decisions without explicit programming. At its core, machine learning algorithms seek to identify patterns within data, enabling computers to learn and adapt to new information. Think about how a child learns to recognize a cat. At first,

A combination of classical search and machine learning may be the way forward

How Airbnb leverages machine learning and reinforcement learning techniques to solve a unique information retrieval task in order to…

Stochastic gradient descent is a learning algorithm that has a number of hyperparameters. Two hyperparameters that often confuse beginners are the batch size and number of epochs. They are both integer values and seem to do the same thing. In this post, you will discover the difference between batches and epochs in stochastic gradient descent. After reading this post, you…
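
The distinction can be sketched in a few lines, assuming the usual convention that one epoch is one full pass over the training set and that each batch triggers one parameter update. The helper names below are hypothetical and no model is actually trained.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of at most `batch_size` samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def count_updates(n_samples, batch_size, epochs):
    """Count how many parameter updates a training run would perform."""
    samples = list(range(n_samples))  # stand-in for a real dataset
    updates = 0
    for _ in range(epochs):  # one epoch = one full pass over the data
        for _batch in iterate_minibatches(samples, batch_size):
            updates += 1  # a gradient step on `_batch` would go here
    return updates
```

With 200 samples, a batch size of 5 and 3 epochs, the loop body runs 40 times per epoch, for 120 updates in total.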

Learn which variables you should and should not take into account in your model.

Where can you find projects dealing with advanced ML topics? GitHub is a perfect source with its many repositories. I’ve selected ten to talk about in this article.

A couple of days ago, in our lab session, we discussed random forests, and, since it was based on the example in ISLR, we had a quick discussion about the random choice of features and the “m=√p” rule. Interestingly, on that one, we can play a bit, and try all choices, and do it again on a different train/test split: library(randomForest) library(ISLR2) set.seed(123) sim = function(t){ train = sample(nrow(Boston), size = nrow(Boston)*.7) subsim = function(i){ rf.boston

A measure of correlation between discrete (categorical) variables

A couple of days ago, in our lab session, we discussed random forests, and, since it was based on the example in ISLR, we had a quick discussion about the random choice of features and the “m=√p” rule. Interestingly, on that one, we can play a bit, and try all choices, and do it again on a different train/test split, … Continue reading The m=√p rule for random forests →

Experimentation is widely used at tech startups to make decisions on whether to roll out new product features, UI design changes, marketing campaigns and more, usually with the goal of improving…

Making the most out of your experiments and observational data

In this article, I'll take you through a list of guided projects to master AI & ML with Python. AI & ML Projects with Python.

Compendium of free ML reading resources.

Discover the fundamentals of contrastive learning, including key techniques like SimCLR, MoCo, and CLIP. Learn how contrastive learning improves unsupervised learning and its practical applications.

Documentation, tutorials and guides for the Gradio ecosystem.

An interpretable outlier detector based on multi-dimensional histograms.

Basis functions (called derived features in machine learning) are building blocks for creating more complex functions. In other

Cosine similarity can measure the proximity between two documents by transforming words into vectors within a vector space.
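
A minimal sketch of that idea, assuming the simplest possible vectorization (raw word counts after lowercasing and whitespace splitting) rather than whatever representation the linked article uses; the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_a * norm_b)
```

Documents with the same word distribution score close to 1, and documents with no words in common score 0, regardless of their lengths.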

Cool LLM and GenAI tech questions covering many modern concepts, including fast vector search, contextual tokens, and augmented structures

Understanding the importance of permutations in the field of explainable AI

Discussing AI Research Papers in March 2024

Imagine you're at a party separating people who love pizza (yum!) from those who...well, have...

New. Comprehensive. Extendable.

Platforms like Kickstarter and Indiegogo have not only broadened access to funding to companies that might struggle in the capital markets but have also transformed the way companies connect with consumers during product development, replacing focus groups with real customers who have a stake in the final product. Despite crowdfunding’s many benefits, numerous campaigns still fail. To understand why, the authors embarked on an empirical analysis of 18,173 campaigns for physical products in the technology and design categories on Kickstarter. They found that many companies present initial products that are so fully developed that customers don’t feel that their input will materially change the product and are reluctant to contribute as a result.

#CatBoost - state-of-the-art open-source gradient boosting library with categorical features support,

Learn graphical text analysis with NLTK

A Schur decomposition of a matrix $A\in\mathbb{C}^{n\times n}$ is a factorization $A = QTQ^*$, where $Q$ is unitary and $T$ is upper triangular. The diagonal entries of $LAT…

Data comes in different shapes and forms. One of those shapes and forms is known as categorical data.

This article aims to take away the entry barriers to get started with time series analysis in a hands-on tutorial using Prophet

Why is Adam the most popular optimizer in Deep Learning? Let’s understand it by diving into its math, and recreating the algorithm.

How should we choose between label, one-hot, and target encoding?

A dynamic approach to treatment personalization

Which measure of correlation should you use for your task? Learn all you need to know about Pearson and Spearman correlations.

Insanely fast and reliable smoothing and interpolation with the Whittaker-Eilers method.

This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been this year. To conclude an eventful 2023 in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read this year. My personal focus has been more on large language models, so you'll find a heavier emphasis on large language model (LLM) papers than computer vision papers this year.

Understanding the logic behind AdaBoost and implementing it using Python

This tutorial offers a bridge between the abstract mathematics of manifolds and computational practice.

In this article, I'll take you through the task of Market Basket Analysis using Python. Market Basket Analysis using Python.

Independent component analysis (ICA) is a powerful data-driven tool capable of separating linear contributions in the data

It is possible to design and deploy advanced machine learning algorithms that are essentially math-free and stats-free. People working on that are typically professional mathematicians. These algorithms are not necessarily simpler. See for instance a math-free regression technique with prediction intervals, here. Or supervised classification and alternative to t-SNE, here. Interestingly, this latter math-free machine

An alternative to logistic regression in special conditions

Or "that time I built Excel for Uber and they ditched it like a week after launch"

BookCorpus has helped train at least thirty influential language models (including Google’s BERT, OpenAI’s GPT, and Amazon’s Bort), according to HuggingFace. This is the research question that…

Learn how to compute the Pearson, Spearman and Kendall correlation coefficients by hand to evaluate the relationship between two variables

In the era of hyper-sophisticated machine learning models like ChatGPT, it is surprising how effective the classic decision tree model remains, especially when used in conjunction with other techniques, such as bagging, boosting and random forests. In this blog post we demonstrate how to build an effective decision tree model, and train this model on some sample data.

Applying Reinforcement Learning strategies to real-world use cases, especially in dynamic pricing, can reveal many surprises

Intuitive derivation of the KDE formula

Standardization, Normalization, Robust Scaling, Mean Normalization, Maximum Absolute Scaling and Vector Unit Length Scaling

Understanding the purpose and functionality of common metrics in ML packages

Learn how Self-Organizing Maps work and why they are a useful unsupervised learning algorithm

Discover the concepts of Zero-Shot, One-Shot, and Few-Shot Learning, which enable machine learning models to classify and recognize objects or patterns with a limited number of examples.

Learn to build a Polynomial Regression model to predict the values for a non-linear dataset.

How to explore geographic data with HDBSCAN, H3, graph theory, and OSM.

How a Scientist Playing Solitaire Forever Changed the Game of Statistics

This tutorial explores the LightGBM library in Python to build a classification model using the LGBMClassifier class.

Hierarchical Navigable Small World (HNSW) is a state-of-the-art algorithm used for an approximate search of nearest neighbours. Under the…

In the first two parts of this series we have discussed two fundamental algorithms in information retrieval: inverted file index and…

Hierarchical Navigable Small World graphs (HNSW) is an algorithm that allows for efficient nearest neighbor search, and the Sentence…

Similarity search is a popular problem where given a query Q we need to find the most similar documents to it among all the documents D.

Learn a powerful technique to effectively compress large data

Explore how similarity information can be incorporated into a hash function

Understand how to hash data and reflect its similarity by constructing random hyperplanes

Dive into combinations of LSH functions to guarantee a more reliable search

Implementing variational inference from scratch

Understand survival analysis, its use in the industry, and how to apply it in Python

How do hazards and maximum likelihood estimates predict event rankings?

An in-depth exploration of autoencoders and dimensionality reduction

Chapter I. Why you should learn about non-Euclidean ML

How to visualize decision tree models with this useful library

Let’s explore how hierarchical clustering works and how it builds clusters based on pairwise distances.

A guide to understanding support vector machines for classification: from theory to scikit-learn implementation.

Reducing the dimension of a dataset using methods such as PCA

Applying causal machine learning to trim the campaign target audience

Learn what vector search is and the metrics pertinent to decide the distance (or similarity) between objects.

Basics of anomaly detection, its use cases, and an implementation of a simple yet powerful algorithm in Python

To be successful as a Data Scientist, you’re often put in positions where you need to find groups within your data. One key business use-case is finding clusters of customers that behave similarly. And that’s a powerful skill that I’m going to help you...

Spectral clustering is a method of clustering data points based on their similarity or affinity,...

Hardware Development and the Physical Frontier

Master Sklearn pipelines for effortless and efficient machine learning. Discover the art of building, optimizing, and scaling models with ease. Level up your data preprocessing skills and supercharge your ML workflow today

Make stronger and simpler models by leveraging natural order

The Similarity Engine's use cases include item-to-item similarity for text and image modality and user-to-item personalized recommendations based on a user’s historical behavior data.

The most important LightGBM parameters, what they do, and how to tune them

Create insights from frequent patterns using market basket analysis with Python

Recursive queries are a straightforward solution to querying hierarchical trees. However, one loop in the relationship references results in a failing or never ending query when cycle detection is not used.

Dive into an end-to-end demo of a high-performance semantic search engine leveraging GPU acceleration, efficient indexing techniques, and…

These curated papers would step up your machine-learning knowledge.

Exploring the Latest Enhancements and Features of PyCaret 3.0

Understanding the most underrated trick in applied Machine Learning

Why do we use the logistic and softmax functions? Thermal physics may have an answer.

Unsupervised learning has always been fascinating to me. It is a way to learn about data without manual labeling effort and allows for the…

How to find the best performance estimation approach for time-series forecasts among 12 strategies proposed in the literature. With Python…

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It is defined in general as the ratio of two sizes, the intersection size divided by the union size, also called intersection over union (IoU).
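
The intersection-over-union ratio can be sketched directly on Python sets. The function name is hypothetical, and returning 1.0 when both sets are empty is an assumed convention (the ratio is otherwise undefined there).

```python
def jaccard_index(a, b):
    """Intersection size divided by union size of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(set_a & set_b) / len(union)
```

For example, `jaccard_index([1, 2, 3], [2, 3, 4])` shares two elements out of four distinct ones, so the ratio is 2/4.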

A quick guide on how to make clean-looking, interactive Python plots to validate your data and model

How to adjust CATE to consider costs associated with your treatments

To even better understand Gradient Boosting

The top ML Papers of the Week (Mar 6 - Mar 12)

Patterns, backed by Y Combinator, is building a platform that allows customers to piece together components to build an AI-powered app.

Use natural language to test the behavior of your ML models

A primer on the math, logic, and pragmatic application of JS Divergence — including how it is best used in drift monitoring

Covers how to choose the similarity measure when item embeddings are available

A gentle dive into this unusual feature selection technique

A more advanced clustering technique for real world data

Discover how to effectively detect multivariate outliers in machine learning with PyOD in Python. Learn to convert anomaly scores to probability confidence, choose the best outlier classifier and determine the right probability threshold for improved model accuracy.

Part 4: A comprehensive step-by-step guide to solving a linear system with LU Decomposition

Comparison of various correlation methodologies

There are various challenges in MLOps and model sharing, including security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

What is the PageRank algorithm? How can it be used in various graph database use cases? How do you use it in Memgraph? If these questions are keeping you up at night, here is a blog post that will finally put your mind at ease.

Data augmentation is a key tool in reducing overfitting, whether it's for images or text. This article compares three Auto Image Data Augmentation techniques...

Become familiar with some of the most popular Python libraries available for hyperparameter optimization.

Circular data can present unique challenges when it comes to analysis and modeling

Entropy can be thought of as the probability of seeing certain patterns in data. Here’s how it works.

Algorithms that help you shop faster and smarter

Learn the basic steps to run a Multiple Correspondence Analysis in R

Tips for taking full advantage of this machine learning package

Delve deeper into the concept of multi-armed bandits, reinforcement learning, and exploration vs. exploitation dilemma.

A cross-framework package for kernels and Gaussian processes on manifolds, graphs, and meshes

Hands-on tutorial for starting your Parquet learning

Open-source vector database built for GenAI applications. Install with pip, perform high-speed searches, and scale to tens of billions of vectors.

Python Feature Engineering Cookbook Second Edition, published by Packt - PacktPublishing/Python-Feature-Engineering-Cookbook-Second-Edition

Learn more about survival analysis (also called time-to-event analysis) and how it is used, and how to apply it by hand and in R

How can you train a model to learn and predict on unseen data?

The wave of enthusiasm around generative networks feels like another Imagenet moment - a step change in what ‘AI’ can do that could generalise far beyond the cool demos. What can it create, and where are the humans in the loop?

Finding the coefficients that maximize the log-partial likelihood in Python

Today Google announced a beta release of Simple ML for Sheets, which allows users without ML experience to try ML out on their spreadsheets.

Simulated confidence regions for machine learning professionals and non-statisticians. Introducing a new concept: dual confidence region.

Top entries are in bold, and sub-entries are in italics. This dictionary is from my new book "Intuitive Machine Learning and Explainable AI", available here and used as reference material for the course with the same name (see here). These entries are cross-referenced in the book to facilitate navigation, with backlinks to the pages where

Which algorithm to choose for your data

Improve the model performance by balancing the dataset using the synthetic minority oversampling technique.

How to Choose the Best Machine Learning Technique: Comparison Table

An eigenvalue of a square matrix $A$ is a scalar $\lambda$ such that $Ax = \lambda x$ for some nonzero vector $x$. The vector $x$ is an eigenvector of $A$ and it…
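
The defining relation can be checked numerically for a candidate pair. This is a small sketch with hypothetical helper names, using plain lists for a real matrix and an assumed tolerance of 1e-9; it verifies a given pair rather than computing eigenvalues.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def is_eigenpair(matrix, lam, vector, tol=1e-9):
    """Return True if matrix @ vector equals lam * vector for a nonzero vector."""
    if all(abs(v) <= tol for v in vector):
        return False  # the zero vector is excluded by definition
    product = matvec(matrix, vector)
    return all(abs(p - lam * v) <= tol for p, v in zip(product, vector))
```

For the symmetric matrix `[[2, 1], [1, 2]]`, the vector `[1, 1]` is mapped to `[3, 3]`, so it is an eigenvector with eigenvalue 3, and `[1, -1]` is mapped to itself, giving eigenvalue 1.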

Mathematical Modeling, Solution, and Visualization Using PuLP and VeRoViz

Learn how you can use topic-noise models (1/3)

How you can (and why you should) create custom transformers

Utilize Anomalib from Intel OpenVinoToolkit to benchmark, develop, and deploy deep learning based image anomaly detection

This self-published book is dated July 2020 according to Amazon. But it appears to be an ongoing project. Like many new books, the material is on GitHub, here. The most recent version, dated June 2021, is available in PDF format, here. This is not a traditional book. It feels like a repository of Python code,

Ensuring your business is proactive and risk-proof.

A simple yet highly practical feature

Logistic regression is one of the most frequently used machine learning techniques for classification. However, though seemingly simple…

An introduction to the field, its applications, and current issues

A review of popular techniques and remaining challenges

Statistics in R Series: Deviance, Log-likelihood Ratio, Pseudo R² and AIC/BIC

The algorithm scans electronic records and may reduce sepsis deaths, but widespread adoption could be a challenge.

Mix and match plots to get more information from a scatter plot

Use a cocktail of 13 modern regularization techniques! [1/9] - Sebastian Raschka (@rasbt)

An explanation of reference categories and picking the right one

A comparison between different topic modeling strategies including practical Python examples

How to compress and fit a humongous set of vectors in memory for similarity search with asymmetric distance computation (ADC)

Learn how to build MMMs for different countries the right way

The best indexing approach for billion-sized vector datasets

Efficient vector quantization for machine learning optimizations (esp. vector quantized variational autoencoders), better than straight…

Overview of how object detection works, and where to get started

This blog post introduces seven techniques that are commonly applied in domains like intrusion detection or real-time bidding, because the datasets are often extremely imbalanced.

Complete Guideline to Find Dependencies among Categorical Variables with Chi-Square Test

Precision and Recall elaborated with sample situations

By Yanqiao Wang

Covariance, eigenvalues, variance and everything …

530 votes, 63 comments. My co-founder and I, a senior Amazon research scientist and AWS SDE respectively, launched Marqo a little over a week ago - a…

Who should read this blog: someone who is new to linear regression, or someone who wants to understand the jargon around linear regression (code repository: https://github.com/DhruvilKarani/Linear-Regression-Experiments). Linear regression is generally the first step in anyone’s data science journey. When you hear the words Linear and Regression, something like this pops up in your mind: X1, X2,… Read More » Linear Regression Analysis – Part 1

Introduction to key elements of ML and Autoencoders: Embedding, Clustering, and Similarity.

The post introduces one of the most popular recommendation algorithms, i.e., collaborative filtering. It focuses on building an intuitive understanding of the algorithm illustrated with the help of an example.

Let’s catch those high-dimensional outliers

Determining which promoted auction items to display in a merchandising placement is a multi-sided customer challenge that presents opportunities to both surface amazing auction inventory to buyers and help sellers boost visibility on their auction listings.

Finding the adjacency graphs for US states and Texas counties using Mathematica

This post explains how permutation importance works and how we can code it using ELI5.

In this article, we look specifically at motion detection using a laptop or desktop webcam, and write a script that runs on our own computer so we can see it work in real time.

Avoid post-processing the SHAP values of categorical features

Creating eye-catching graphs with Python to use instead of bar charts.

Use the recently-released Transformers model to generate JSON representations of your document data

Graph partitioning has been a long-lasting problem and has a wide range of applications. This post shares the methodology for graph…

Reduce time in your data science workflow with these libraries.

An algorithmic approach to clean up your dataset and sharpen class assignments.

In this guide, we discuss what YOLOv7 is, how the model works, and the novel model architecture changes in YOLOv7.

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30...

Capturing non-linear advertising saturation and diminishing returns without explicitly transforming media variables

In this article, we discuss the importance of linear algebra in data science and machine learning.

How to forecast with scikit-learn and XGBoost models with sktime

The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize...

Brain-inspired unsupervised machine learning through competition, cooperation and adaptation

Use linear programming to minimize the difference between required and scheduled resources

Want to know what discriminant analysis is and how it helps in analyzing data? Read this complete guide on discriminant analysis now.

Quantifying Uncertainty in Computation.

A curated list of practical financial machine learning tools and applications. - firmai/financial-machine-learning

In this article, you’ll learn about the eigendecomposition of a matrix.

This is what makes your trained models actually usable

Articles, software, calculators, and opinions.

How we applied qualitative learning, human labeling and machine learning to iteratively develop Airbnb’s Community Support Taxonomy.

Which metric should be used to evaluate the clustering results if the ground truth labels are not available? In this post, I’m introducing…

Buyers reveal a whole range of behaviors and interests when they browse our pages, so we decided to incorporate these additional purchase intent signals into our machine learning model to improve the relevance of our recommended items.

No need to worry about getting stuck in local minima anymore

Using the Folium Package to Create Stunning Choropleths

Manual Calculation From a Confusion Matrix and the Syntax of sklearn Library

Illustrated study guides ideal for visual learners.

How to use Python libraries like Open3D, PyVista, and Vedo for neighborhood analysis of point clouds and meshes through KD-Trees/Octrees

In this article, I will take you through the task of Time Series Forecasting with ARIMA using the Python programming language.

This article will cover singular value decomposition (SVD), which is a major topic of linear algebra, data science, and machine learning.

A preview into one of the most prominent data science applications

How to select control variables for causal inference using Directed Acyclic Graphs

We’ll show how to use the DID model to estimate the effect of hurricanes on house prices

Understanding the model’s output plays a major role in business-driven projects, and Sobol can help

Reproducibility is critical for robust data science — after all, it is a science.

The curse of dimensionality comes into play when we deal with a lot of data having many dimensions or features.

Making a survival analysis can be a challenge even for experienced R users, but the good news is I’ll help you make beautiful, publication-quality survival plots in under 10 minutes. Here’s what WE are going to do: Make your first survival model an...

Introduction to the most popular performance evaluation metrics for survival analysis along with practical Python examples

Introducing dart, gblinear, and XGBoost Random Forests

Evaluating similarity of visual art from both human perceptual & quantitative judgments

I show toy implementations of Python decorator patterns that may be useful for Data Scientists.

An introduction to the Intel sklearn extension. Make your Random Forest even faster than XGBoost.

Which is the best algorithm?

Six matrix factorizations dominate in numerical linear algebra and matrix analysis: for most purposes one of them is sufficient for the task at hand. We summarize them here. For each factorization …

cover image

How you can pull one of a few dozen example political, sporting, education, and other frames on-the-fly.

cover image

An initial look into the method best suited for examining time-to-event data

cover image

Focal loss is said to perform better than Cross-Entropy loss in many cases. But why does Cross-Entropy loss fail, and how does Focal loss address…

cover image

Uncovering the secret behind why breads are always conveniently placed beside butter in groceries

cover image

The Shazam music recognition application made it finally possible to put a name to that song on the radio. But how does this magical miracle actually work? In this article, Toptal Freelance Software Engineer Jovan Jovanovic sheds light on the principles of audio signal processing, fingerprinting, and recognition,...

cover image

Apply Louvain’s Algorithm in Python for Community Detection

cover image

Under the new machine learning model, buyers are recommended items that are more aligned to their shopping interests on eBay.

cover image

Learn how the SHAP library works under the hood

cover image

Intuition, Bayes, and an example

cover image

Why learn these graph algorithms? Graph algorithms are a set of instructions that...

cover image

Use these three tools to understand the usefulness of your machine learning models

cover image

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All i...

cover image

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools—useful whether you work with Windows, macOS, or Linux.

cover image

They can help you get an appointment or order a pizza, find the best ticket deals and bring your...

cover image

How to optimize the hyperparameters of a machine learning model and how to speed up the process

cover image

Managing Machine Learning Lifecycle made easy — explained with Python examples

cover image

Easily learn to track all of your ML experiments with metrics and logs with an example project walkthrough!

cover image

Natural Language Processing with Python, Gensim, Tensorflow, Transformers

cover image

A Comprehensive Guide to SHAP and Shapley Values

cover image

As a data analyst at Microsoft, I must investigate and understand time-series data every day. Besides looking at some key performance…

cover image

How we used NeRF to embed our entire 3D object catalogue to a shared latent space, and what it means for the future of graphics

cover image

We are excited to announce TorchRec, a PyTorch domain library for Recommendation Systems. This new library provides common sparsity and parallelism primitives, enabling researchers to build state-of-the-art personalization models and deploy them in production.

cover image

A dive into fundamentals of learning representations beyond feature vectors

cover image

Machine learning is a subfield of artificial intelligence (AI) and computer science that focuses on using data and algorithms to mimic the way people learn, progressively improving its accuracy. This way, Machine Learning is one of the most interesting methods in Computer Science these days, and it'

cover image

Topic modeling can bring NLP to the next level. Here’s how.

cover image

We need to know what colors our merch is. But because downstream users include many different people and algorithms, we need to describe colors as a hierarch...

cover image

Two teams have shown how quantum approaches can solve problems faster than classical computers, bringing physics and computer science closer together.

cover image

Because Graph Analytics is the future

cover image

based on "Hands-On Machine Learning with Scikit-Learn & TensorFlow" (O'Reilly, Aurelien Geron) - bjpcjp/scikit-and-tensorflow-workbooks

cover image

I analyzed thousands of searches by people who were diagnosed with cancer. Their queries offer valuable lessons that could improve the way doctors treat patients.

cover image

There are many great boosting Python libraries for data scientists to reap the benefits of. In this article, the author discusses LightGBM benefits and how they are specific to your data science job.

cover image

Prophet (FB time series prediction package) docs to Python code. - bjpcjp/fb-prophet

cover image

Updates in progress. Jupyter workbooks will be added as time allows. - bjpcjp/scikit-learn

cover image

The decision boundary is a very important visual tool for model evaluation. See how to get it to work on complex datasets

cover image

Easily and efficiently optimize your model’s hyperparameters with Optuna with a mini project

Thread 🧵👇🏻 — Rapid (@Rapid_API)

cover image

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow... - thoughtworks/mlops-platforms

cover image

Discussing the types of autoencoders

cover image

Understanding how Regularization can be useful to improve the performance of your model

cover image

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. - eugeneyan/applied-ml

cover image

Combining the “Trend” and “Difference” Terms

cover image

The whole ML is full of dimensionality reduction and its applications. Let’s see them in action!

cover image

Should you use PyTorch vs TensorFlow in 2023? This guide walks through the major pros and cons of PyTorch vs TensorFlow, and how you can pick the right framework.

cover image

Interactive Tools for Machine Learning, Deep Learning and Math - Machine-Learning-Tokyo/Interactive_Tools

cover image

Why is it hard and what to do about it?

cover image

Master usecols, chunksize, parse_dates in pandas read_csv().

cover image

Here is my take on this cool Python library and why you should give it a try

cover image

Dimensionality reduction is a vital tool for data scientists across industries. Here is a guide to getting started with it.

cover image

Efficient matrix multiplication · GitHub

cover image

If an AI model can make decisions on the company’s behalf through products and services, that model is essentially their competitive edge.

cover image

Lessons learned from successful MLOps implementation

cover image

Powerful R libraries built by the World’s Biggest Tech Companies

cover image

How does Semi-Supervised Machine Learning work, and how to use it in Python?

cover image

In this first post in a series on how to build a complete machine learning product from scratch, I describe how to set up your project and tooling.

cover image

MedMNIST v2 is a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28 x 28 (2D) or 28 x 28 x 28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision and machine learning. Description and image from: MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification Each subset keeps the same license as that of the source dataset. Please also cite the corresponding paper of source data if you use any subset of MedMNIST.

cover image

In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental...

cover image

One of the best labelling tools I have ever used.

cover image

PyTorch Lightning has opened many new possibilities in deep learning and machine learning with a high level interface that makes it quicker to work with PyTorch.

cover image

for Data Scientists and ML Engineers

cover image

Low-code Machine Learning with a Powerful Python Library

cover image

The basics of kernel methods and Radial Basis Functions

cover image

Streamlit releases v1.0 of its DataOps platform for data science apps to make it easier for data scientists to share code and components.

cover image

In this post, you will learn some cool command line tricks which can help you to speed up your day-to-day R&D.

cover image

An introduction to the Kalman and Particle Filters and their applications in fields such as Robotics and Reinforcement Learning.

cover image

Create breathtaking visuals and “see” your data

cover image

This article compiles the 38 top Python libraries for data science, data visualization & machine learning, as best determined by KDnuggets staff.

cover image

Hands-on tutorial to effectively use different Regression Algorithms

cover image

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix...

cover image

Using the “Kneedle” algorithm to detect knees with the Python package “kneed”

cover image

An in-depth guide to understanding node2vec algorithm and its hyper-parameters

cover image

The story of a decade-plus long journey toward a unified forecasting model.

cover image

A one-stop-shop for all your tokenization needs

cover image

This is part 3 of a series on bot programming originally published on the Coder One blog. Part 1:...

cover image

Why conformal prediction for uncertainty estimation can improve your predictions

cover image

Essential guide to various dimensionality reduction techniques in Python

cover image

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application.

cover image

Papers With Code highlights trending Machine Learning research and the code to implement it.

cover image

Explanation and examples of frequent itemset mining and association rule learning over relational databases in Python
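
The two core quantities behind frequent itemset mining and association rules, support and confidence, can be sketched in a few lines. This is a minimal illustration over an in-memory list of made-up transactions, not the linked article's code and not a database-backed implementation; the names `support`, `frequent_pairs`, and `confidence` are assumptions chosen for this example.

```python
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_pairs(min_support):
    """All item pairs whose support reaches `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(set(pair))
        if s >= min_support:
            result[pair] = s
    return result

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

pairs = frequent_pairs(min_support=0.4)
conf = confidence({"bread"}, {"butter"})
```

Here `pairs` holds the pairs that appear in at least 40% of the baskets, and `conf` estimates how often butter is bought given that bread is bought.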

cover image

Different Kinds of Correlation Coefficients in a Deeper Look

cover image

What companies can learn from employee turnover data

cover image

With Streamlit, creating and deploying a web app can be very easy!

cover image

Similarity search is one of the fastest-growing domains in AI and machine learning. At its core, it is the process of matching relevant pieces of information together.

cover image

In this article, I’ll show you five ways to load data in Python, achieving a speedup of 3 orders of magnitude.

cover image

2284 methods • 143838 papers with code.

cover image

For all their triumphs, AI systems can’t seem to generalize the concepts of “same” and “different.” Without that, researchers worry, the quest to create truly intelligent machines may be hopeless.

cover image

Combining tree-boosting with Gaussian process and mixed effects models - fabsig/GPBoost

cover image

In August, I set out to improve the machine learning ecosystem for Ruby and wasn’t sure where it would go. Over the next 5 months, I ended up...

This is a draft of a book for learning data analysis with the R language. This book emphasizes hands-on activities. Comments and suggestions are welcome.

cover image

Data Augmentation is one of the most important topics in Deep Computer Vision. When you train your neural network, you should do data augmentation like… ALWAYS. Otherwise, you are not using your…

cover image

A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.

cover image

Sentiment Analysis, or Opinion Mining, is a subfield of NLP (Natural Language Processing) that aims to extract attitudes, appraisals, opinions, and emotions from text. Inspired by the rapid migration…

cover image

Scroll down to see how to interpret a plot created by a great tool for comparing two classes and their corpora.

cover image

In marketing analytics, conjoint analysis is a technique used to gain specific insights about consumers’ preferences. Often derived from consumer surveys, conjoint analysis can tell us, for instance…

cover image

Evaluating object detection models is not straightforward because each image can have many objects and each object can belong to different classes. This means that we need to measure if the model…

cover image

A collection of high-impact machine learning blog posts.

cover image

We are excited to announce that this year’s NeurIPS 2021 Conference will host a first-of-its-kind competition in large scale approximate…

cover image

Creating Spotify recommendations with data science

cover image

Combining data science and econometrics for an introduction to the DeepIV framework, including a full Python code tutorial.

cover image

Word on the street is that PyTorch lightning is a much better version of normal PyTorch. But what could it possibly have that it brought such consensus in our world? Well, it helps researchers scale…

cover image

In this story, we are going to discuss an application of dynamic programming techniques to an optimization algorithm. Through the process of developing an optimal solution, we get to study a variety…

cover image

Building the raw materials for personalization at scale

cover image

Essential extensions that will boost your productivity in Jupyter Notebook.

cover image

Find out about all of the projects of Meta Open Source.

cover image

Machine-learning algorithms can quickly process thousands of hours of natural soundscapes

cover image

Modern AI systems approach tasks like recognising objects in images and predicting the 3D structure of proteins as a diligent student would prepare for an exam. By training on many example...

cover image

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any m × n matrix. It is related to the polar decomposition.
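
As a small illustration of the "rescaling" part of this factorization, the sketch below computes the singular values of a 2 × 2 real matrix by hand, using the fact that they are the square roots of the eigenvalues of AᵀA. The matrix `A` and the helper name `singular_values_2x2` are made up for this example; real code would normally call a linear algebra library instead.

```python
import math

def singular_values_2x2(a):
    """Singular values of a 2x2 real matrix, largest first.

    They are the square roots of the eigenvalues of A^T A, which for a
    2x2 symmetric matrix follow from the quadratic formula.
    """
    (p, q), (r, s) = a
    # Entries of the symmetric matrix G = A^T A = [[g11, g12], [g12, g22]].
    g11 = p * p + r * r
    g12 = p * q + r * s
    g22 = q * q + s * s
    mean = (g11 + g22) / 2.0
    spread = math.sqrt(((g11 - g22) / 2.0) ** 2 + g12 ** 2)
    # Clamp at zero to guard against tiny negative round-off.
    return math.sqrt(mean + spread), math.sqrt(max(mean - spread, 0.0))

A = [[3.0, 0.0], [4.0, 5.0]]  # hypothetical example matrix
sigma1, sigma2 = singular_values_2x2(A)
```

Two quick sanity checks on the result: the product of the singular values equals |det A|, and the sum of their squares equals the sum of the squared entries of A.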

cover image

Intuition for Unifying Theory of GLMs with Derivations in Canonical and Non-Canonical Forms

cover image

Generalized Linear Models (GLMs) play a critical role in fields including Statistics, Data Science, Machine Learning, and other computational sciences. In Part I of this Series, we provided a…

cover image

Bonus: What makes a good footballer great?

cover image

Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.

cover image

Reduce the size of your dataset while keeping as much of the variation as possible

cover image

Framework improves efficiency, accuracy of applications that search for a handful of solutions in a huge space of candidates.

cover image

Check out this Jupyter notebook!

cover image

Applications from cancer to covid-19

cover image

As Data Science continues to grow and develop, it’s only natural for new tools to emerge, especially considering the fact that data…

cover image

A leading global retailer has invested heavily in becoming one of the most competitive technology companies around. Accurate and timely demand forecasting for millions of item-by-store combinations is…

cover image

Automate your hyperparameter tuning with Sklearn Pipelines and Hyperopt for multiple models in a single Python call. Let's dig into the process...

cover image

There are often times when working in Data Science where we might come across a feature that is very difficult to interpret by a computer. This is often because the dimensions of the data are much…

cover image

Jupyter notebooks are mostly known for their web-based user interface, such as JupyterLab or the Classic Notebook. They offer a great user…

cover image

Supervised Machine Learning — SVM, RANDOM FOREST, LOGISTIC REGRESSION

cover image

If you are dealing with a classification task, I recommend the modAL. As for the sequence labeling task, the AlpacaTag is the only choice for you. Active learning could decrease the number of labels…

cover image

Detailed tutorial on where to find a dataset, how to preprocess data, what model architecture and loss to use, and, finally, how to…

cover image

The geometric intuition behind determinants could change how you think about them.

cover image

Christian Zuniga, PhD

cover image

Let’s see this powerful tool of data pre-processing

PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few. See how to use PyCaret's Regression Module for Time Series Forecasting.

Data Science, Machine Learning, AI & Analytics

cover image

Improve clustering of user-item embedding by using GMM to generate new and tighter features

cover image

The advantages and pitfalls of common distance measures

cover image

Scikit learn is *the* go to package for standard machine learning models in Python. It not only provides most of the core algorithms that…

cover image

XGBoost explained as well as gradient boosting method and HP tuning by building your own gradient boosting library for decision trees.

cover image

Successive halving completely crushes GridSearch and RandomSearch

cover image

Rice University computer scientists have demonstrated artificial intelligence (AI) software that runs on commodity processors and trains deep neural networks 15 times faster than platforms based on graphics ...

cover image

This article demonstrates the deployment of a basic Streamlit app (that simulates the Central Limit Theorem) to Heroku.

cover image

Utilize the hottest ML library for state-of-the-art performance in classification

cover image

This article demonstrates the deployment of a basic Streamlit app (that predicts the Iris’ species) to Streamlit Sharing.

cover image

Your model is a lens into your data, and shap its telescope

cover image

Using Constraint Satisfaction Problems to solve AI Planning Problems.

In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). More specifically, PCR is used for estimating the unknown regression coefficients in a standard linear regression model.

cover image

A deep introduction to Quadratic Discriminant Analysis (QDA) with theory and Python implementation

Discover datasets around the world!

cover image

The three-step framework Shopify's Data Science & Engineering team built for evaluating new search algorithms.

cover image

Machine Learning Projects solved and explained for free

cover image

This example shows how quantile regression can be used to create prediction intervals. See Features in Histogram Gradient Boosting Trees for an example showcasing some other features of HistGradien...
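
Quantile regression is usually trained with the pinball (quantile) loss. The sketch below is not the scikit-learn example referred to above; it only illustrates the idea on made-up data with the simplest possible model, a constant prediction. The constant that minimises the pinball loss at level q is an empirical q-quantile, so fitting a low and a high quantile gives an interval. The names `pinball_loss` and `best_constant` are assumptions for this example.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss at quantile level q."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)

def best_constant(y, q):
    """Constant prediction minimising the pinball loss, searched over the data values."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))

y = [float(v) for v in range(1, 100)]  # hypothetical targets 1..99
lower = best_constant(y, 0.05)          # 5% quantile
upper = best_constant(y, 0.95)          # 95% quantile
coverage = sum(lower <= v <= upper for v in y) / len(y)
```

With a real regressor the constants become functions of the features, but the loss and the way the two quantiles bracket the target stay the same.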

cover image

Machine Learning from Scratch: Part 4

cover image

Creative techniques to make complex models smaller

cover image

When dealing with problems on statistics and machine learning, one of the most frequently encountered terms is covariance. While most of…

cover image

Data Augmentation is one of the most important yet underrated aspects of a machine learning system …

cover image

It is a simple yet very efficient algorithm

cover image

Machine learning-based outlier detection

cover image

Note: This is an update to the article "Using R and H2O to identify product anomalies during the manufacturing process." It has some updates but also code optimization from Yana Kane-Esrig (https://www.linkedin.com/in/ykaneesrig/), as sh...

cover image

GPU vs CPU training speed comparison for xgboost

cover image

A library for state-of-the-art self-supervised learning from images

cover image

What you need to know as graph theory adoption continues to take off

cover image

for beginners as well as advanced users

cover image

A detailed look at differences between the two algorithms and when you should choose one over the other

cover image

an end-to-end tutorial on how to apply an emerging Data Science algorithm

cover image

Elliptic Envelope and IQR-based detection

cover image

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy. - stitchfix/mab
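
For readers new to the selection strategies named here, this is a minimal textbook sketch of epsilon-greedy on simulated Bernoulli arms. It is not the API of the library above and it uses ordinary random exploration rather than a deterministic variant; the function name, the reward rates, and the parameters are all assumptions for illustration.

```python
import random

def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy arm selection on Bernoulli arms."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                         # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)    # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the mean reward for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = epsilon_greedy([0.2, 0.5, 0.8])
```

Over enough rounds the arm with the highest true rate should receive most of the pulls, while the small `epsilon` keeps the other estimates from going stale.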

cover image

Gaussian Process Regression is a remarkably powerful class of machine learning algorithms. Here, we introduce them from first principles.

cover image

Introducing a spatial dimension into hierarchical clustering

cover image

Learn how AdaBoost works from a Math perspective, in a comprehensive and straight-to-the-point manner.

cover image

The quickest way to embed your models into web apps.

cover image

Multi-Armed Bandits: Part 5b

cover image

Natural Gradient Boosting for Probabilistic Prediction - stanfordmlgroup/ngboost

cover image

Train, visualize, evaluate, interpret, and deploy models with minimal code.

cover image

Instacart crunches petabytes daily to predict what will be on grocery shelves and even how long it will take to find parking

cover image

I have aggregated some of the SotA image generative models released recently, with short summaries, visualizations and comments. The overall development is summarized, and the future trends are spe…

cover image

Simple and reliable optimization with local, global, population-based and sequential techniques in numerical discrete search spaces. - SimonBlanke/Gradient-Free-Optimizers

PyCaret: low-code machine learning. Get started at https://pycaret.gitbook.io

Concluding this three-part series covering a step-by-step review of statistical survival analysis, we look at a detailed example implementing the Kaplan-Meier fitter based on different groups, a Log-Rank test, and Cox Regression, all with examples and shared code.

cover image

Synthetic data can be used to test new products and services, validate models, or test performances because it mimics the statistical property of production data. Today you'll find different types of structured and unstructured synthetic data.

cover image

Have you ever wondered how often you buy certain items together? Why do you buy some items together? How likely are you to purchase an item…

cover image

Ever wondered how to implement a simple baseline model for multi-class problems? Here is one example (code included).

cover image

The error backpropagation learning algorithm is a supervised learning technique for neural networks that calculates the gradient of the loss with respect to each weight, so that gradient descent can adjust the weights.
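
To make the chain rule behind backpropagation concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss, checked against a finite-difference estimate. The network, the input values, and the function names are assumptions chosen for illustration; a real network repeats the same step layer by layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one example."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def backprop(w, b, x, y):
    """Gradient of the loss with respect to w and b via the chain rule."""
    a = sigmoid(w * x + b)      # forward pass
    dloss_da = a - y            # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)       # derivative of the sigmoid
    delta = dloss_da * da_dz    # error signal at the pre-activation z
    return delta * x, delta     # dz/dw = x, dz/db = 1

w, b, x, y = 0.5, -0.2, 1.5, 1.0    # hypothetical weight, bias, input, target
grad_w, grad_b = backprop(w, b, x, y)

# Central finite differences as an independent check.
eps = 1e-6
num_grad_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
num_grad_b = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
```

The analytic and numerical gradients should agree closely; a gradient descent step would then be `w -= learning_rate * grad_w`.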

cover image

A simple technique for boosting accuracy on ANY model you use

The data science and artificial intelligence terms you need while reading the latest research

cover image

A comprehensive guide on standard generative graph approaches with implementation in NetworkX

Computer Science, Machine Learning, Programming, Art, Mathematics, Philosophy, and Short Fiction

cover image

Data labeling is often the biggest bottleneck in machine learning. Active learning lets you train machine learning models with much less labeled data. The best AI-driven companies, like Tesla, use active learning.

With the power and popularity of scikit-learn for machine learning in Python, this library is a foundation of any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.

cover image

How to identify and segregate specific blobs in your image

cover image

Browse 1109 deep learning methods for General.

cover image

By making the first progress on the “chromatic number of the plane” problem in over 60 years, an anti-aging pundit has achieved mathematical immortality.

cover image

A complete explanation of the inner workings of Support Vector Machines (SVM) and Radial Basis Function (RBF) kernel

cover image

Strip charts are extremely useful to make heads or tails from dozens (and up to several hundred) of time series over very long periods of…

cover image

Using Mutual Information to measure the likelihood of candidate links in a graph.

cover image

Microsoft Excel is a powerful tool for learning the basics of data science and machine learning.

cover image

Machine Learning 101, a slide deck by Jason Mayes, Senior Creative Engineer, Google (Dec 2017). Feel free to share this deck with others who are learning! If you are reading the notes there are a few extra snippets down here from time to time. But more for my own thoughts, feel free to...

cover image

Why is Model Compression important?  A significant problem in the arms race to produce more accurate models is complexity, which leads to…

cover image

An Overview of the Most Important Features in Version 0.24

cover image

This article shows a comparison of the implementations that result from using binary, Gray, and one-hot encodings to implement state machines in an FPGA. These encodings are often evaluated and applied by the synthesis and implementation tools, so it’s important to know why the software makes these decisions.
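
The three encodings can be compared quickly in software before looking at synthesis results. The sketch below generates binary, Gray, and one-hot codes for a hypothetical four-state machine; the function names are assumptions for this example, and it says nothing about what a particular FPGA tool will choose.

```python
def binary_code(i, n_states):
    """Plain binary encoding using the minimum number of bits."""
    width = max(1, (n_states - 1).bit_length())
    return format(i, f"0{width}b")

def gray_code(i, n_states):
    """Reflected Gray code: consecutive states differ in exactly one bit."""
    width = max(1, (n_states - 1).bit_length())
    return format(i ^ (i >> 1), f"0{width}b")

def one_hot_code(i, n_states):
    """One bit (one flip-flop) per state, with exactly one bit set."""
    return format(1 << i, f"0{n_states}b")

def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

n_states = 4
table = [
    (binary_code(i, n_states), gray_code(i, n_states), one_hot_code(i, n_states))
    for i in range(n_states)
]
```

Binary and Gray need the fewest state bits, Gray changes a single bit between consecutive states, and one-hot spends one bit per state in exchange for very simple state decoding.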

cover image

A tutorial on how to build a GitHub App that predicts and applies issue labels using Tensorflow and public datasets.

cover image

Stochastic gradient descent optimisation algorithms you should know for deep learning

cover image

Making the move from Docker to Live Deployments

cover image

This is the next installment in the "Practical Computer Science" series, where you will learn how to apply classic computer science concepts to solve real problems using Ruby. Today we are going to talk about Graph

cover image

Demystifying the inner workings of BFGS optimization

cover image

In this story, we’re going to take an aerial tour of optimization with Lagrange multipliers. When do we need them? Whenever we have an…

cover image

All the encodings that are worth knowing — from OrdinalEncoder to CatBoostEncoder — explained and coded from scratch in Python

cover image

Learn PSO algorithm as a bedtime story with GIFs and python code

cover image

**Zero-shot learning (ZSL)** is a model's ability to detect classes never seen during training. The condition is that the classes are not known during supervised learning. Earlier work in zero-shot learning uses attributes in a two-step approach to infer unknown classes. In the computer vision context, more recent advances learn mappings from image feature space to semantic space. Other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks without fine-tuning. Benchmark datasets for zero-shot learning include [aPY](/dataset/apy), [AwA](/dataset/awa2-1), and [CUB](/dataset/cub-200-2011), among others. (Image credit: [Prototypical Networks for Few shot Learning in PyTorch](https://github.com/orobix/Prototypical-Networks-for-Few-shot-Learning-PyTorch))

Further readings:
- [Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly](https://paperswithcode.com/paper/zero-shot-learning-a-comprehensive-evaluation)
- [Zero-Shot Learning in Modern NLP](https://joeddav.github.io/blog/2020/05/29/ZSL.html)
- [Zero-Shot Learning for Text Classification](https://amitness.com/2020/05/zero-shot-text-classification/)

cover image

**Few-Shot Learning** is an example of meta-learning, where a learner is trained on several related tasks during the meta-training phase, so that it can generalize well to unseen (but related) tasks with just a few examples during the meta-testing phase. An effective approach to the Few-Shot Learning problem is to learn a common representation for various tasks and train task-specific classifiers on top of this representation. Source: [Penalty Method for Inversion-Free Deep Bilevel Optimization](https://arxiv.org/abs/1911.03432)
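
One simple way to picture "a shared representation plus a task-specific classifier" is a nearest-prototype classifier: average the embeddings of the few labelled examples per class, then assign a query to the closest average. The sketch below is an assumption-laden illustration, not the method of the cited paper; `embed` is a stand-in identity function where a learned representation would go, and the data points are invented.

```python
import math

def embed(x):
    """Stand-in for a learned, shared representation (here: the identity)."""
    return x

def prototypes(support_set):
    """Mean embedding per class, computed from a handful of labelled examples."""
    sums, counts = {}, {}
    for x, label in support_set:
        z = embed(x)
        acc = sums.setdefault(label, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(x, protos):
    """Assign x to the class whose prototype is nearest in embedding space."""
    z = embed(x)
    return min(protos, key=lambda label: math.dist(z, protos[label]))

# Two labelled examples per class: a hypothetical 2-way, 2-shot task.
support_set = [
    ((0.0, 0.1), "cat"), ((0.2, 0.0), "cat"),
    ((1.0, 0.9), "dog"), ((0.8, 1.1), "dog"),
]
protos = prototypes(support_set)
prediction = classify((0.9, 0.8), protos)
```

The task-specific part is only the set of prototypes, which is why a few examples per class can be enough once the representation is good.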

cover image

Quantifying the effects of varying different inputs, applied on a gemstone dataset with over 50K round-cut diamonds

cover image

A Step-by-Step Guide to Host your Models!

cover image

Part one of a series on how we will measure discrepancies in Airbnb guest acceptance rates using anonymized perceived demographic data.

cover image

The sometimes confusing concepts involved in interpreting coronavirus testing

cover image

An intuitive visual explanation

cover image

A Log-Normal Distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed.
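
The definition can be checked empirically: draw log-normal samples, take their logarithms, and the result should look normal with the chosen mean and standard deviation. The parameter values, seed, and sample size below are arbitrary choices for this sketch.

```python
import math
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
mu, sigma = 1.0, 0.5            # parameters of the underlying normal distribution

# lognormvariate(mu, sigma) returns exp(N), where N is normal(mu, sigma).
samples = [rng.lognormvariate(mu, sigma) for _ in range(20000)]

logs = [math.log(s) for s in samples]
log_mean = statistics.fmean(logs)
log_std = statistics.stdev(logs)
```

`log_mean` and `log_std` should land close to `mu` and `sigma`, while the raw samples are strictly positive and right-skewed.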

cover image

There are many better alternatives

cover image

A simple introduction to matching in bipartite graphs with Python code examples

cover image

Scientists have developed an Artificial Intelligence (AI) system that recognises hand gestures by combining skin-like electronics with computer vision.

cover image

How to use convex hulls in data clustering

cover image

In the Deep Learning (DL) age, more and more people have encountered and used (knowingly or not) random matrices. Most of the time this…

cover image

A quick introduction to 10 basic graph algorithms with examples and visualisations

cover image

Cheat Sheets for Machine Learning and Data Science

cover image

Peregrine: A Pattern-Aware Graph Mining System.

cover image

Learn which of the 9 most prominent automatic speech recognition engines is best for your needs, and how to use it in Python programs.

cover image

“Less than one”-shot learning can teach a model to identify more objects than the number of examples it is trained on.

cover image

Multi-Armed Bandits: Part 6

cover image

Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. Statistical-based feature selection methods involve evaluating the relationship between each input variable and the…
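
A minimal sketch of a statistical filter method: score each input variable by the absolute Pearson correlation with the target and keep the top k. The feature names and values are invented, and `select_k_best` here is a hand-written helper for illustration, not a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_k_best(features, target, k):
    """Rank features by |correlation with the target| and keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical data: one informative feature, one noisy one, one constant.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, -1.0, 2.0, -1.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]
selected, scores = select_k_best(features, target, k=1)
```

Correlation only captures linear relationships with a numeric target, so other scores (for example chi-squared or mutual information) are often used for other variable types.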

cover image

In this post, I am going to be talking about some of the most important graph algorithms you should know and how to implement them using Python.

cover image

An online book about collision detection using Processing.

cover image

An introduction to Data Cataloging and major tools that data teams can use for data discovery

cover image

The Adam optimization algorithm from definition to implementation

cover image

In this article, I cover 63 machine learning algorithms in an easy-to-understand manner for business professionals.

cover image

To Infinity and…Linear Algebra?!

cover image

New modeling approach increases accuracy of recommendations by an average of 7%.

cover image

Using Facebook's faiss library for REALLY fast kNN

cover image

Know your SMOTE ways to oversample your data

cover image

A step-by-step guide to apply perspective transformation on images

cover image

The startup Realtime Robotics, co-founded by former MIT postdoc George Konidaris, is helping robots solve the motion planning problem by giving them collision avoidance capabilities.

cover image

A value is worthless unless it tells you something.

cover image

Floating-point formats are not the most glamorous or (frankly) the most important consideration when working with deep learning models: if your model isn’t working well, then your floating-point format certainly isn’t going to save you! However, past a certain point of model complexity/model size/training time, your choice of floating-point format can have a significant impact on your model training times and even performance. Here’s how the rest of this post is structured:

cover image

Small-bites data science

cover image

Some of our insights from developing a PyTorch framework for training and running deep learning models …

cover image

Jamie Robins and I have written a book that provides a cohesive presentation of concepts of, and methods for, causal inference. Much of this material is currently scattered across journals in sever…

cover image

I come from the world of MATLAB and numerical computing, where for loops are shorn and vectors are king. During my PhD at UVM, Professor…

cover image

Information theory is a subfield of mathematics concerned with transmitting data across a noisy channel. A cornerstone of information theory is the idea of quantifying how much information there is in a message. More generally, this can be used to quantify the information in an event and a random variable, called entropy, and is calculated using probability. Calculating information and…
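
A minimal sketch of the entropy calculation mentioned above, assuming a discrete distribution given as a list of probabilities and measuring information in bits. The helper name `entropy` is chosen for this example.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])    # maximally uncertain two-outcome event
biased_coin = entropy([0.9, 0.1])  # more predictable, so lower entropy
print(fair_coin, round(biased_coin, 3))
```

The fair coin carries exactly one bit per flip, while the biased coin carries less because its outcome is easier to guess.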

cover image

It is often desirable to quantify the difference between probability distributions for a given random variable. This occurs frequently in machine learning, when we may be interested in calculating the difference between an actual and observed probability distribution. This can be achieved using techniques from information theory, such as the Kullback-Leibler Divergence (KL divergence), or relative entropy, and the Jensen-Shannon…

In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence[1]), denoted $D_{\text{KL}}(P \parallel Q)$, is a type of statistical distance: a measure of how one reference probability distribution P is different from a second probability distribution Q.[2][3] Mathematically, for discrete distributions it is defined as $D_{\text{KL}}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$.
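
For discrete distributions, the KL divergence is the sum over outcomes of P(x) times the log of P(x)/Q(x). The sketch below computes it in nats, assuming both distributions are lists over the same outcomes and that Q is nonzero wherever P is. The function name `kl_divergence` is an assumption for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions of equal length."""
    # Terms with P(x) = 0 contribute zero by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
print(round(forward, 4), round(reverse, 4))
```

The two directions give different values, which is why the KL divergence is a divergence rather than a true distance metric.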

cover image

A tour of one of the most popular topic modelling techniques and a guide to implementing and visualising it using pyLDAvis

cover image

An Intuitive Explanation and Exploration

cover image

Tune your Machine Learning models with open-source optimization libraries

cover image

Finding Similar Subsequences for Known Patterns

cover image

Locating fraudulent transactions with simple theory.

cover image

Ensemble is an art and science

cover image

In this post, we will be learning a tool to reveal the working mechanism of a black box model. But before we start, let's talk about…

cover image

This is the companion wiki of The Hundred-Page Machine Learning Book by Andriy Burkov. The book aims to teach machine learning in a concise yet systematic manner.

cover image

Takeaways from our experience building state-of-the-art hyperparameter tuning in Determined AI’s integrated deep learning training…

cover image

Understanding why your model is uncertain and how to estimate the level of uncertainty

cover image

Python 3.9 New Feature Guide

cover image

How to generate a histogram for an image, how to equalize the histogram, and finally how to modify your image histogram to be similar to…

cover image

HMM is a very powerful statistical modeling tool used in speech recognition, handwriting recognition, etc. I wanted to use them, but when…

cover image

Ideas behind Principal Component Analysis

cover image

Deep dive analysis of Silhouette Method to find optimal clusters in k-Means clustering

cover image

Identify and remove outliers in each cluster from K-Means clustering

cover image

Variation-aware memory verification with brute force Monte Carlo accuracy in much less time.

cover image

Assumptions, relationships, simulations, and so on

cover image

A deep-dive into the theory and application behind this Machine Learning algorithm in Python, by a student

cover image

Overview of the latest developments in version 0.23

cover image

Let us try to understand the most widely used loss function — Cross-Entropy.

cover image

Roughly Accurate Matrix Profiles Computed in a Fraction of the Time

cover image

Understand the Ultimate Linear Algebra concept with Geometry

cover image

The Intuition Behind the Popular Expectation-Maximization Algorithm with Example Code

cover image

Deep dive into ROC-AUC

cover image

Let's see the differences and analyze, step by step, the approach taken to compute sklearn's TF-IDF

cover image

Learn to use non-Gaussian distributions in Gaussian Process models, and variational inference with Gaussian quadrature to compute…

cover image

Why can AdaGrad escape saddle points? Why is Adam usually better? In a race down different terrains, which will win?

cover image

Dimensionality Reduction Techniques for Hyperspectral Images.

cover image

Understanding the mathematics behind SVM + Implementation in Python via scikit-learn

cover image

How a simple `pip install eisen` will save days of work and solve (almost) all of your problems.

cover image

Choropleth Maps using Plotly to track COVID-19 cases.

cover image

Open source tools and techniques for visualizing data on custom maps

cover image

Not enough data for Deep Learning? Try Eigenfaces.

cover image

A gentle introduction to federated learning using PyTorch and PySyft with the help of a real life example.

cover image

Why obtaining the Amazon Web Services Machine Learning — Specialty (“AWS ML”) certification is one of the best starting points to gaining…

cover image

A comprehensive guide to four contrastive loss functions for contrastive learning

cover image

Your first step towards reading text from unstructured data

cover image

Hi all, welcome back to another post of my brand new series on Graph Theory named Graph Theory: Go Hero. I undoubtedly recommend the…

cover image

Unsupervised techniques to identify changes in the behavior

cover image

Covering Eigenvalues, Factor Creation and Cronbach’s Alpha

Check out these 5 cool Python libraries that the author has come across during an NLP project, and which have made their life easier.

cover image

An illustrative guide to estimate the pure premium using Tweedie models in GLMs and Machine Learning

cover image

Building up the intuition for how matrices help to solve a system of linear equations and thus regression problems

cover image

Implementation of Isolation forest from scratch for further understanding of the algorithm

cover image

Recursive Feature Elimination, or RFE for short, is a popular feature selection algorithm. RFE is popular because it is easy to configure and use and because it is effective at selecting those features (columns) in a training dataset that are more or most relevant in predicting the target variable. There are two important configuration options when using RFE: the choice…

cover image

Explaining outlier detection with the PyCaret library in Python

cover image

Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let’s say we have data $(x_1, y_1), \dots, (x_n, y_n)$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$’s for simplicity.) Informally, isotonic regression looks for $\beta_1, \dots, \beta_n$ such that the $\beta_i$’s approximate …
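
One common way to compute such a monotonic fit is the pool-adjacent-violators approach, sketched below. This is an illustrative implementation, not code from the linked post. It assumes the observations are already sorted by their x values and that a non-decreasing least-squares fit is wanted; `isotonic_fit` is a name chosen for the example.

```python
def isotonic_fit(y):
    """Non-decreasing least-squares fit to y via pool adjacent violators."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every point in a block is fitted by the block mean.
        fitted.extend([total / count] * count)
    return fitted

fit = isotonic_fit([1.0, 3.0, 2.0, 4.0, 3.5, 5.0])
print(fit)
```

Neighbouring values that violate the ordering are averaged together, so the output is flat over those stretches and unchanged elsewhere.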

cover image

Case study of tweets from comments on Indonesia’s biggest media

cover image

Yet another tool used to make Decision Tree splits.

cover image

Overview of deploying a model with Chef, a configuration management tool

cover image

Often, the input features for a predictive modeling task interact in unexpected and often nonlinear ways. These interactions can be identified and modeled by a learning algorithm. Another approach is to engineer new features that expose these interactions and see if they improve model performance. Additionally, transforms like raising input variables to a power can help to better expose the…

cover image

New method reduces training time by up to 99%, with no loss in accuracy.

cover image

Do Asian currencies move in tandem? What about emerging markets in general? Are commodity currencies like AUD and CAD closely related as…

cover image

An overview of the fundamentals behind measuring and comparing machine learning solutions

cover image

Dimensionality Reduction using t-SNE in a nutshell

cover image

Methods to encode categorical variables using Python

cover image

An elegant method to group predictions without labeling

cover image

Amazon researchers describe a machine learning system that plans the movements and paths of up to 1,000 mobile warehouse robots.

cover image

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai) - firmai/datagene

cover image

How to create time series datasets with different patterns

cover image

How and why we built a custom app for visual debugging of warehouse pick paths.

cover image

Qualcomm open sources the AI Model Efficiency Toolkit on GitHub, providing a simple library plugin for AI developers.

cover image

TL;DR — Text data suffers heavily from high-dimensionality. Latent Semantic Analysis (LSA) is a popular, dimensionality-reduction…

cover image

Deep dive into spatial autocorrelation and their industry use cases

cover image

Graph Theory is the study of graphs which are mathematical structures used to model pairwise relations between objects. These graphs are…

cover image

This new Python package accelerates notebook-based machine learning experimentation

cover image

We will walk through a simple example with basic arithmetic to demystify the concept of a kernel.

cover image

A Comparison of Naive Bayes and Logistic Regression

cover image

Using q-learning for sequential decision making and therefore learning to play a simple game.

cover image

Pandas is the go-to library for data science. These are the shortcuts I use to do repetitive data science tasks faster and simpler.

cover image

From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.

cover image

I came across Pycaret while I was browsing on a slack for data scientists. It's a versatile library in which you can apply/evaluate/tune…

cover image

In this blog, you will get an intuition behind the use of cross-entropy and log-loss in machine learning.

cover image

Deploy, scale and manage your machine learning services with Kubernetes and Terraform on GCP.

cover image

Learn about SVM or Support Vector Machine, Kernel Trick, Hyperplanes, Lagrange Multipliers using visual examples and code sections.

cover image

and how to train and deploy an ML model into production with them.

cover image

Suppose you are working on a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long when you look at the confusion matrix and realize that the majority class is 98% of the total data and all examples are classified as majority… From “Handling imbalanced dataset in supervised learning using family of SMOTE algorithm”.
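
The scenario above is easy to reproduce. The sketch below uses made-up labels (98 majority examples, 2 minority examples) and a hypothetical model that always predicts the majority class. It shows high accuracy alongside zero recall on the minority class.

```python
# Made-up data: class 0 is the majority (98%), class 1 the minority (2%).
y_true = [0] * 98 + [1] * 2
# A hypothetical model that labels every example as the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
minority_recall = true_positives / sum(y_true)

print(accuracy, minority_recall)
```

Accuracy alone hides the fact that not a single minority example was found, which is the motivation for resampling methods such as SMOTE and for metrics like recall.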

cover image

Originally posted by the author on LinkedIn. It is very tempting for data science practitioners to opt for the best-known algorithms for a given problem. However, it is not the algorithm alone that provides the best solution; a model built on carefully engineered and selected features can provide far better results. “Any intelligent… From “Feature Engineering: Data scientist's Secret Sauce!”.

cover image

How to control the complexity of a model

cover image

A walkthrough of some of Netflix’s interview questions!

cover image

Weird data is important. Often in data science, the goal is to discover trends in the data. However, consider doctors looking at images of…

cover image

In-depth Interview Q&A from Facebook, Amazon, Apple, Netflix, and Google

cover image

This post is the last in our series of 5 blog posts highlighting use case presentations from the 2nd Edition of Seville Machine Learning School (MLSEV). You may also check out the previous posts ab…

Lambdaclass's blog about distributed systems, machine learning, compilers, operating systems, security and cryptography.

cover image

An algorithm to find lines in images

cover image

A detailed step-by-step guide to build a Lane Line Detection algorithm in OpenCV.

cover image

Why do we need Stochastic, Batch, and Mini Batch Gradient Descent when implementing Deep Neural Networks?

cover image

Understanding GMM: Idea, Maths, EM algorithm & python implementation

cover image

Introduction to Stacked Auto-encoders and Technical Walk-through of Model Creation using PyTorch

cover image

Bringing Neural Architecture into Recommendations

cover image

Comparing Linear Regression, Random Forest Regression, XGBoost, LSTMs, and ARIMA Time Series Forecasting

cover image

In this article, we show that the issue with polynomial regression is not over-fitting, but numerical precision. Even if done right, numerical precision still remains an insurmountable challenge. We focus here on step-wise polynomial regression, which is supposed to be more stable than the traditional model. In step-wise regression, we estimate one coefficient at a… From “Deep Dive into Polynomial Regression and Overfitting”.

cover image

A popular method for optimizing model parameters

cover image

In this piece, I attempt to explain the mathematical reasoning behind this ‘complex’ name.

cover image

An algorithm for community finding

cover image

Plotting heatmaps, contour plots, and 3D plots with Python

cover image

Are you not able to load your NumPy data into memory? Does your model have to wait for data to be loaded after each epoch? Is your Keras…

cover image

Learn matrix multiplication for machine learning by following along with Python examples

cover image

Finding relationships between different variables/features in a dataset during a data analysis task is one of the key and fundamental…

cover image

torchlayers aims to do what Keras did for TensorFlow, providing a higher-level model-building API and some handy defaults and add-ons useful for crafting PyTorch neural networks.

cover image

How does pivot work? What is the main pandas building block? And more …

cover image

A comprehensive but simple guide which focuses more on the idea behind the formula rather than the math itself — start building the block…

cover image

It’s not a silver bullet metric for classification problems

cover image

An intuitive explanation of t-SNE algorithm and why it’s so useful in practice.

cover image

The determinant is related to the volume of a parallelepiped spanned by the vectors in a matrix. Let's see how.
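
A small check of that relationship in three dimensions: the absolute value of the determinant of a matrix whose rows are the edge vectors equals the scalar triple product, which is the volume of the parallelepiped. The vectors below are arbitrary values picked for the example.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Three edge vectors spanning a parallelepiped (arbitrary example values).
a, b, c = (2, 0, 0), (1, 3, 0), (1, 1, 4)

volume_from_det = abs(det3([a, b, c]))
volume_from_triple_product = abs(dot(a, cross(b, c)))
print(volume_from_det, volume_from_triple_product)
```

If the three vectors were coplanar, both quantities would be zero, matching the fact that a singular matrix has determinant zero.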

cover image

Intuition and diagnostics

cover image

This guide will show in detail how an item-based recommendation system works and how to implement it in a real work environment.

cover image

An Explanation and Implementation of Matrix Factorization

cover image

I’m not going to bury the lede: Most machine learning benchmarks are bad.  And not just kinda-sorta nit-picky bad, but catastrophically and fundamentally flawed.  TL;DR: Please, for the love of sta…

cover image

5 lesser-known pandas tricks that help you be more productive

cover image

A complete guide on using the most cited clustering algorithm effectively

cover image

https://github.com/sepandhaghighi/pycm https://www.pycm.ir custom_rounder function added #279 complement function added sparse_matrix attribute added…

cover image

Which boosting algorithm will reign supreme in this head-to-head competition?

cover image

This article explains the ID3 algorithm in detail, with calculations. ID3 is one of the many algorithms used to build decision trees.

cover image

In this post, we will see several basic optimisation algorithms that you can use in various data science problems.

cover image

Expedite your data analysis process

cover image

Why and How to use with examples of Keras/XGBoost

cover image

Imagine having your Friends Working with your Local Jupyter Notebook in a Remote Machine

cover image

All you need to know about k-means, brown clustering, tf-idf, topic models and LDA.

cover image

Networks regulate everything from ant colonies and middle schools to epidemics and the internet. Here’s how they work.

cover image

Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of...

cover image

Simplest guide to PCA, ever.

cover image

At the beginning of the textbook I used for my graduate stat theory class, the authors (George Casella and Roger Berger) explained in the…

cover image

How to classify unlabeled data when all you have is just a sample of positive data

cover image

A step-by-step guide to implementing one of the most trending machine learning algorithms using NumPy

cover image

5 sets of tools every lazy full-stack data scientist should use

cover image

Using residual plots to validate your regression models

cover image

Using Support Vector Machines (SVMs) for Regression

cover image

Going above and beyond state-of-the-art with confidence!

cover image

249 votes, 21 comments. pytorch-optimizer -- collections of ready to use optimization algorithms for PyTorch, includes: AccSGD, AdaBound, AdaMod…

cover image

Learning depth without manual annotation

cover image

Least Squares and Computation (with R and C++)

cover image

How to Leverage Data Visualization with Wrapping Algorithm

cover image

Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. “Introduction to Multi-Armed Bandits” by Alex Slivkins provides an accessible, textbook-like treatment of the subject.

cover image

DataRobot MLOps is helping to increase AI value by automating the deployment, optimization, and governance of machine learning applications.

cover image

Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. An example of an imbalanced classification problem where a class label is required and both classes are equally important is the detection of oil spills or slicks in satellite images. The detection of a spill requires mobilizing an expensive response,…

cover image

Imbalanced classification is primarily challenging as a predictive modeling task because of the severely skewed class distribution. This is the cause for poor performance with traditional machine learning models and evaluation metrics that assume a balanced class distribution. Nevertheless, there are additional properties of a classification dataset that are not only challenging for predictive modeling but also increase or compound…

cover image

By popular demand, I’ve updated this article with the latest tutorials from the past 12 months. Check it out here

cover image

Learn about the model that is used in most reinforcement learning problems.

Delivering accurate insights is the core function of any data scientist. Navigating the development road toward this goal can sometimes be tricky, especially when cross-collaboration is required, and these lessons learned from building a search application will help you negotiate the demands between accuracy and speed.

cover image

This article will introduce you to Markov Chain Monte Carlo (MCMC) methods, namely Metropolis-Hastings and Bayesian inference, and demonstrate how you can harness them for your next project.

cover image

MDP in action: the next step toward solving real-life problems with RL and AI

cover image

cover image

This is the fourth post in an article series about MIT's Linear Algebra course. In this post I will review lecture four on factorizing a matrix A into a product of a lower-triangular matrix L and an upper-triangular matrix U, or in other words A=LU. The lecture also shows how to find the inverse of matrix product A·B,...
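
The factorization A=LU can be sketched with plain Gaussian elimination, recording each multiplier in L. This is a minimal version that assumes no row exchanges are needed (every pivot is nonzero). The example matrix and the helper names `lu_decompose` and `matmul` are chosen for illustration.

```python
def lu_decompose(a):
    """Factor a square matrix as A = L U without row exchanges."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]
    for col in range(n):
        if upper[col][col] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for row in range(col + 1, n):
            # The elimination multiplier is stored in L below the diagonal.
            factor = upper[row][col] / upper[col][col]
            lower[row][col] = factor
            for k in range(col, n):
                upper[row][k] -= factor * upper[col][k]
    return lower, upper

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu_decompose(A)
print(L)
print(U)
```

Multiplying L and U back together reproduces A, which is a quick sanity check for any factorization routine.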

Machine learning is one of the hottest topics in computer science today. And not without reason: it has helped us do things that couldn’t be done before, like image classification, image generation and natural language processing. But all of it boils down to a really simple concept: you give the computer data and the computer then finds patterns in that data. This is called “learning” or “training”, depending on your point of view. These learnt patterns can be extrapolated to make predictions. How? That’s what we are looking at today.

This article is about Market Basket Analysis & the Apriori algorithm that works behind it.

cover image

Estimating expected time of arrival (ETA) is crucial to what we do at Lyft. Estimates go directly to riders and drivers using our apps, as…

Which algorithm works best for unbalanced data? Are there any tradeoffs?

Learn the basics of verifying segmentation, analyzing the data, and creating segments in this tutorial. When reviewing survey data, you will typically be handed Likert questions (e.g., on a scale of 1 to 5), and by using a few techniques, you can verify the quality of the survey and start…

cover image

This list of lists contains books, notebooks, presentations, cheat sheets, and tutorials covering all aspects of data science, machine learning, deep learning, statistics, math, and more, with most documents featuring Python or R code and numerous illustrations or case studies. All this material is available for free, and consists of content mostly created in 2019… From “40+ Modern Tutorials Covering All Aspects of Machine Learning”.

cover image

2.1K votes, 110 comments. 1.3M subscribers in the Python community. The official Python community for Reddit! Stay up to date with the latest news…

Preparing for a job interview can be a full-time job, and Data Science interviews are no different. Here are 121 resources that can help you study and quiz your way to landing your dream data science job.

cover image

This post is about explaining the various techniques you can use to handle imbalanced datasets

Learn how to implement adversarial validation that builds a classifier to determine if your data is from the training or testing sets. If you can do this, then your data has issues, and your adversarial validation model can help you diagnose the problem.

An introduction on how to fine-tune Machine and Deep Learning models using techniques such as: Random Search, Automated Hyperparameter Tuning and Artificial Neural Networks Tuning.

cover image

Behind every recommender system lies a bevy of metrics.

cover image

Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several…

cover image

Evolutionary algorithms are an unsupervised learning alternative to neural networks that rely on fitness functions instead of trained nodes for evaluation.

cover image

What is Independent Component Analysis (Statistics)?

cover image

Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization - vumaasha/Atlas

cover image

Solving real-world problems with probabilities

cover image

Correlation coefficients enable you to find relationships between a wide variety of data. However, the sheer number of options can be overwhelming. This picture sums up the differences between five of the most popular correlation coefficients. Part two covers several less popular correlation coefficients. Further reading: Gamma & Coefficient & Yule’s Q Kendall’s Tau Pearson… From “Correlation Coefficients in One Picture”.

cover image

This post is about various evaluation metrics and how and when to use them.

cover image

Over the past few years, deep learning has taken many industries by storm. From voice recognition to image analysis and synthesis, neural networks have turned out to be very efficient at solv…

cover image

I'm planning on sorting ~100,000 images to use as data for a computer vision application. With this much data, shaving a little time off of each…

cover image

Density estimation is estimating the probability density function of the population from the sample
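
One widely used approach is kernel density estimation, sketched here with a Gaussian kernel. The sample values and the bandwidth of 0.5 are arbitrary assumptions, and `gaussian_kde` is a helper written for this example rather than a library call.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density underlying `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in sample)

    return density

sample = [1.0, 1.5, 2.0, 4.0, 4.2]
density = gaussian_kde(sample, bandwidth=0.5)

# A valid density integrates to about 1; approximate the integral on a grid.
step = 0.01
area = sum(density(-5.0 + step * i) for i in range(1501)) * step
print(round(area, 3), density(1.5) > density(3.0))
```

A smaller bandwidth gives a spikier estimate that follows individual points, while a larger one smooths them together.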

cover image

23 votes, 13 comments. A recent blog post How Exactly UMAP Works provides a different perspective on explaining the UMAP dimensionality reduction…

cover image

Is the Rectified Adam (RAdam) optimizer actually better than the standard Adam optimizer? According to my 24 experiments, the answer is no, typically not (but there are cases where you do want to use it instead of Adam).

We show which metric to use for visualizing and determining the optimal number of clusters, one that works much better than the usual practice, the elbow method.

Data Science Dojo blog features the most recent, and relevant articles about data science, analytics, generative AI, large language models, machine learning, and data visualization.

cover image

UMAP is a new dimensionality reduction technique that offers increased speed and better preservation of global structure.

cover image

MLCommons ML benchmarks help balance the benefits and risks of AI through quantitative tools that guide responsible AI development.

This guide explores research centered on a variety of advanced loss functions for machine learning models.

The article contains a brief introduction to various concepts related to Hierarchical clustering algorithm.

cover image

Each customer has an individualized style map, laying out her feelings about peasant blouses, A-line dresses, or pencil skirts.

cover image

Many of you might have heard of the concept “Wisdom of the Crowd”: when many people independently guess some quantity, e.g. the number of marbles in a glass jar, the average of their guesses is often pretty accurate – even though many of the guesses are totally off. The same principle is at work in … From “Understanding AdaBoost – or how to turn Weakness into Strength”.

cover image

There is an unreasonable amount of information that can be extracted from what people publicly say on the internet. Learn how to do it.

cover image

A curated list of gradient boosting research papers with implementations. - GitHub - benedekrozemberczki/awesome-gradient-boosting-papers: A curated list of gradient boosting research papers with ...

cover image

How Instacart uses Machine Learning to spot lost demand in its fulfillment chain

cover image

Using an ARIMA model, you can forecast a time series from its own past values. In this post, we build an optimal ARIMA model from scratch and extend it to Seasonal ARIMA (SARIMA) and SARIMAX models. You will also see how to build auto-ARIMA models in Python.

cover image

A brief rundown of packages and ideas to generate synthetic data for self-driven data science projects and deep diving into machine…

cover image

Matthew Conlen provides a short explainer of how kernel density estimation works. Nifty.

cover image

This article discusses and compares different approaches to comparing two text strings.

cover image

Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading.

The code used to generate the plots for this post can be found here.

cover image

Some Tricks and Code for Kaggle and Everyday work. This post is about useful feature engineering methods and tricks that I have learned and end up using often.

cover image

Articles about Machine Learning

In vector calculus, the Jacobian matrix (/dʒəˈkoʊbiən/,[1][2][3] /dʒɪ-, jɪ-/) of a vector-valued function of several variables is the matrix of all its first-order partial derivatives. When this matrix is square, that is, when the function takes the same number of variables as input as the number of vector components of its output, its determinant is referred to as the Jacobian determinant. Both the matrix and (if applicable) the determinant are often referred to simply as the Jacobian in literature.[4] They are named after Carl Gustav Jacob Jacobi.
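
To make the definition concrete, the sketch below approximates a Jacobian numerically with central finite differences and evaluates the Jacobian determinant for the polar-to-Cartesian map, which is known to equal r. The step size `eps` and the function names are assumptions for the example.

```python
import math

def jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x using central differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = f(forward), f(backward)
        for i in range(n_out):
            # Entry (i, j) is the partial derivative of output i w.r.t. input j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac

def polar_to_cartesian(v):
    r, theta = v
    return [r * math.cos(theta), r * math.sin(theta)]

J = jacobian(polar_to_cartesian, [2.0, math.pi / 6])
jacobian_determinant = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(round(jacobian_determinant, 6))
```

Here the function maps two inputs to two outputs, so the matrix is square and the determinant is defined. It comes out close to the radius, 2.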

cover image

This post is about some of the most common sampling techniques one can use while working with data.

You can do more data science than you think from the terminal.

cover image

Term frequency–inverse document frequency uncovers the specific words that top-ranking pages use to give target keywords context.
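
A minimal sketch of the weighting itself, using the plain variant tf × log(N / df). Real tools often use smoothed or normalized variants, so the exact numbers will differ. The three tiny "pages" are made-up data, and `tf_idf` is a helper written for this example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score each term in each tokenized document with tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

pages = [
    ["cheap", "flights", "paris"],
    ["cheap", "hotels", "paris"],
    ["cheap", "car", "rental"],
]
scores = tf_idf(pages)
print(scores[0])
```

A word that appears on every page ("cheap") scores zero, while a word unique to one page ("flights") scores highest, which is how the measure surfaces page-specific vocabulary.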

cover image

A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques - yzhao062/pyod

Find out how to use randomness to learn your data by using Noise Contrastive Estimation with this guide that works through the particulars of its implementation.

cover image

168 votes, 13 comments. 2.2M subscribers in the datascience community. A space for data science professionals to engage in discussions and debates on…

cover image

As data scientists or Machine learning experts, we are faced with tonnes of columns of data to extract insight from, among these features…

cover image

Category theory is a relatively new branch of mathematics that has transformed much of pure math research. The technical advance is that category theory provides a framework in which to organize formal systems and by which to translate between them, allowing one to transfer knowledge from one field to another. But this same organizational framework also has many compelling examples outside of pure math. In this course, we will give seven sketches on real-world applications of category theory.

cover image

How to turn a collection of small building blocks into a versatile tool for solving regression problems.

“There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves. We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains its own fixity. In such perfection, all things move toward death.” ~ Dune (1965)
I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2). Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.
In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?

cover image

With a ROC curve, you’re trying to find a good model that optimizes the trade-off between the False Positive Rate (FPR) and True Positive Rate (TPR). What counts here is how much area is under the curve (Area Under the Curve = AUC). The ideal curve in the left image fills in 100%, which means… Read More » ROC Curve Explained in One Picture
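
As a rough illustration of the FPR/TPR trade-off and the area under the curve described above, here is a minimal pure-Python sketch. The labels and scores are made-up toy data, and `roc_points` and `auc` are hypothetical helper names, not taken from the linked article.

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and collect (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (assumed for illustration): 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(auc(roc_points(labels, scores)))                     # area for the toy scores
print(auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])))  # perfectly ranked scores
```

A model that ranks every positive above every negative fills the whole unit square, which is the "100%" case mentioned in the excerpt.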

cover image

It can be more flexible to predict probabilities of an observation belonging to each class in a classification problem rather than predicting classes directly. This flexibility comes from the way that probabilities may be interpreted using different thresholds that allow the operator of the model to trade off concerns in the errors made by the model, such as the number of…
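
To make the threshold idea concrete, the sketch below applies three different cut-offs to the same predicted probabilities and counts the resulting errors. The data are invented for illustration and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(labels, probs, threshold):
    """Return (tp, fp, fn, tn) when predicting class 1 for prob >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Toy data (assumed): true classes and a model's predicted probability of class 1.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.95, 0.40, 0.65, 0.35, 0.20, 0.55, 0.80, 0.10]

for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(labels, probs, threshold)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

On this toy data, lowering the threshold removes false negatives at the price of more false positives, and raising it does the reverse, which is the trade-off the operator gets to choose.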

cover image

A Visual Guide to Evolution Strategies

A while back, there was a discussion comparing the performance of using the hashbrown crate (based on Google’s SwissTable implementation) in the Rust compiler. At the last RustFest, Amanieu was experimenting with integrating his crate into stdlib, which turned out to have some really promising results. As a result, there are plans to move the crate into stdlib. I recommend watching this talk when you have some free time!

There are many techniques to detect and optionally remove outliers from a dataset. In this blog post, we show an implementation in KNIME Analytics Platform of four of the most frequently used - traditional and novel - techniques for outlier detection.

cover image

Decision trees are the fundamental building block of gradient boosting machines and Random Forests™, probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. Unfortunately, current visualization packages are rudimentary and not immediately helpful to the novice. For example, we couldn't find a library that visualizes how decision nodes split up the feature space. So, we've created a general package (part of the animl library) for scikit-learn decision tree visualization and model interpretation.

cover image

Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customers' online orders is an indispensable part of today's eCommerce business, as it...

cover image

Recently I’ve started using PyMC3 for Bayesian modelling, and it’s an amazing piece of software! The API only exposes as much of the heavy machinery of MCMC as you need: by which I mean, just the pm.sample() method (a.k.a., as Thomas Wiecki puts it, the Magic Inference Button™). This really frees up your mind to think about your data and model, which is really the heart and soul of data science! That being said, however, I quickly realized that the water gets very deep very fast: I explored my data set, specified a hierarchical model that made sense to me, hit the Magic Inference Button™, and… uh, what now? I blinked at the angry red warnings the sampler spat out.

cover image

Using the FeatureSelector for efficient machine learning workflows

In this blog, I will reveal, step by step, how to plot an ROC curve using Python. After that, I will explain the characteristics of a basic ROC curve.

cover image

This blog post surveys the attack techniques that target AI (Artificial Intelligence) systems and how to protect against them.

A single-PDF version of Model Evaluation parts 1-4 is available on arXiv: https://arxiv.org/abs/1811.12808

cover image

Stay up-to-date on the latest data science and AI news in the worlds of artificial intelligence, machine learning, deep learning, implementation, and more.

cover image

Using mlxtend to perform market basket analysis on online retail data set.

cover image

Get to know the ML landscape through this practical, concise overview of modern machine learning algorithms. Plus, we'll discuss the tradeoffs of each.

cover image

These techniques cover most of what data scientists and related practitioners are using in their daily activities, whether they use solutions offered by a vendor, or whether they design proprietary tools. When you click on any of the 40 links below, you will find a selection of articles related to the entry in question. Most… Read More » 40 Techniques Used by Data Scientists

cover image

An easy-to-use library for recommender systems.

cover image

Since writing this post back in 2018, I have extended this to a 4-part series on causal inference: * ➡️️ Part 1: Intro to causal inference and do-calculus [https://www.inference.vc/untitled] * Part 2: Illustrating Interventions with a Toy Example [https://www.inference.vc/causal-inference-2-illustrating-interventions-in-a-toy-example/] * Part 3: Counterfactuals [https://www.inference.

cover image

A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.

cover image

We often think of optimization with momentum as a ball rolling down a hill. This isn't wrong, but there is much more to the story.
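
The "ball rolling down a hill" picture can be sketched in a few lines. This is a minimal heavy-ball update on the toy function f(x) = x², with learning rate, momentum coefficient, and starting point chosen arbitrarily for illustration. `minimize` is a hypothetical helper, not code from the linked article.

```python
def minimize(grad, x0, lr=0.1, beta=0.0, steps=200):
    """Heavy-ball gradient descent; beta=0 reduces to plain gradient descent."""
    x, v = x0, 0.0
    path = [x]
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # velocity accumulates past gradients
        x = x + v                    # position moves along the velocity
        path.append(x)
    return path


def grad(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


plain = minimize(grad, 5.0, beta=0.0)
heavy = minimize(grad, 5.0, beta=0.9)

# Plain descent approaches the minimum at 0 from one side; the momentum
# run overshoots it and oscillates before settling.
print(min(plain) >= 0.0, min(heavy) < 0.0)
print(abs(heavy[-1]) < 1e-2)
```

With these settings the momentum run behaves like a damped oscillator rather than a ball that simply stops at the bottom, which hints at the "more to the story" the excerpt mentions.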

A reference manual for creating covariance functions.

cover image

LightTag, a newly launched startup from a former NLP researcher at Citi, has built a "text annotation platform" designed to assist data scientists who

cover image

The receptive field is perhaps one of the most important concepts in Convolutional Neural Networks (CNNs) that deserves more attention from…

cover image

For many data scientists, data manipulation begins and ends with Pandas or the Tidyverse. In theory, there is nothing wrong with this…

cover image

My name is Gabi (my bio), and I’m the CEO and co-founder of Chicisimo. We launched three years ago; our goal was to offer automated outfit…

cover image

Traditional strategies for taming unstructured, textual data

cover image

Data can change over time. This can result in poor and degrading predictive performance in predictive models that assume a static relationship between input and output variables. This problem of the changing underlying relationships in the data is called concept drift in the field of machine learning. In this post, you will discover the problem of concept drift and ways…

cover image

Which machine learning algorithm should you use? It is a central question in applied machine learning. A recent paper by Randal Olson and others attempts to answer it and gives you a guide for algorithms and parameters to try on your problem first, before spot checking a broader suite of algorithms. In this post, you will discover a…

Who is going to win this war of predictions, and at what cost? Let’s explore.

We highlight recent developments in machine learning and Deep Learning related to multiscale methods, which analyze data at a variety of scales to capture a wider range of relevant features. We give a general overview of multiscale methods, examine recent successes, and compare with similar approaches.

Interested in learning the concepts behind Logistic Regression (LogR)? Looking for a concise introduction to LogR? This article is for you. Includes a Python implementation and links to an R script as well.

cover image

Traditionally, most multi-class classification problems (i.e. problems where you want to predict which class a given sample falls into, from a set of possible results) focus on a small number of possible predictions.

Time series forecasting is an easy-to-use, low-cost solution that can provide powerful insights. This post will walk through an introduction to the three fundamental steps of building a quality model.

cover image

Three years ago we launched Chicisimo; our goal was to offer automated outfit advice. Today, with over 4 million women on the app, we want to share how our data and machine learning approach helped us grow. It’s been chaotic but it is now under control.

cover image

All of the Linear Algebra Operations that You Need to Use in NumPy for Machine Learning. The Python numerical computation library called NumPy provides many linear algebra functions that may be useful as a machine learning practitioner. In this tutorial, you will discover the key functions for working with vectors and matrices that you may find useful as a machine…

cover image

This periodic table can serve as a guide to navigate the key players in the data science space. The resources in the table were chosen by looking at surveys taken from data science users, such as the 2016 Data Science Salary Survey by O'Reilly, the 201...

This post presents an overview of the main existing recommendation system algorithms, so that data scientists can choose the best one according to a business’s limitations and requirements.

Using Self-Organizing Maps to solve the Traveling Salesman Problem. The Traveling Salesman Problem is a well-known challenge in Computer Science: it consists of finding the shortest possible route that visits every city on a given map exactly once. Despite its simple statement, the problem is NP-hard (its decision version is NP-complete). This implies that the difficulty of solving it increases rapidly with the number of cities, and no efficient general solution is known.

cover image

This article comes from Togaware. A Survival Guide to Data Science with R. These draft chapters weave together a collection of tools for the data scientist: tools that are all part of the R Statistical Software Suite. Each chapter is a collection of one (or more) pages that cover particular aspects of the topic. The chapters can be… Read More » One-page R: a survival guide to data science with R

cover image

Numenta created the open source Numenta Anomaly Benchmark (NAB) to test their own anomaly detection algorithms. Learn more about how Numenta and Domino worked together to develop the NAB.

cover image

Imbalanced classes put "accuracy" out of business. This is a surprisingly common problem in machine learning, and this guide shows you how to handle it.
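
A small sketch of why accuracy misleads on imbalanced data: a classifier that always predicts the majority class scores well on accuracy while finding no positives at all. The 10-versus-990 class split is an assumed toy example, and the helper names are hypothetical.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def recall(labels, preds, positive=1):
    """Fraction of examples of the `positive` class that were predicted as such."""
    relevant = [p for y, p in zip(labels, preds) if y == positive]
    return sum(p == positive for p in relevant) / len(relevant)


# Toy imbalanced data set (assumed): 10 positives, 990 negatives.
labels = [1] * 10 + [0] * 990
# A "model" that ignores its input and always predicts the majority class.
preds = [0] * len(labels)

acc = accuracy(labels, preds)
rec_pos = recall(labels, preds, positive=1)
rec_neg = recall(labels, preds, positive=0)
balanced_acc = (rec_pos + rec_neg) / 2.0  # average of per-class recall

print(acc, rec_pos, balanced_acc)
```

Per-class metrics such as recall or balanced accuracy expose the failure that the headline accuracy figure hides.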

Machine learning algorithms aren’t difficult to grasp if you understand the basic concepts. Here, a SAS data scientist describes the foundations for some of today’s popular algorithms.

Data science, also known as data-driven decision making, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge from data in various forms and to make decisions based on this knowledge. A data scientist should be evaluated not only on his/her knowledge of machine learning; he/she should also have good expertise in statistics. I will try to start from the very basics of data science and then slowly move to an expert level. So let’s get started.

cover image

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including…

Ryan Marcus, assistant professor at the University of Pennsylvania. Using machine learning to build the next generation of data systems.

The author presents 10 statistical techniques which a data scientist needs to master. Build up your toolbox of data science tools by having a look at this great overview post.

cover image

The Kullback-Leibler divergence between two probability distributions is sometimes called a "distance," but it's not. Here's why.
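
One reason the KL divergence fails to be a distance is that it is not symmetric. The sketch below checks this numerically for two made-up discrete distributions. `kl_divergence` is a hypothetical helper, and it assumes `q` is nonzero wherever `p` is nonzero.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two toy distributions over the same two outcomes (assumed for illustration).
p = [0.9, 0.1]
q = [0.5, 0.5]

forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)

print(forward, reverse)    # the two directions give different values
print(kl_divergence(p, p))  # a distribution has zero divergence from itself
```

A true metric would give the same value in both directions and would also satisfy the triangle inequality, which the KL divergence does not.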

cover image

380 votes, 20 comments. 2.9M subscribers in the MachineLearning community. Beginners -> /r/mlquestions , AGI -> /r/singularity, career advices ->…

cover image

Artificial intelligence (AI) is hot. Over $4 billion in venture capital has been invested in AI firms just in the US. But supply chain planning software companies, with their cadre of operations research Ph.D.s who have been modeling complex problems for decades, may be better poised to solve many complex business problems than the hot new Silicon Valley firms.

cover image

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models in the context of model selection, and to predict how accurate predictions (associated with a specific model and data set) are expected to be. Confidence Interval. Confidence intervals are used to assess how reliable a statistical estimate… Read More » 11 Important Model Evaluation Techniques Everyone Should Know

cover image

Nina Zumel prepared an excellent article on the consequences of working with relative error distributed quantities (such as wealth, income, sales, and many more) called “Living in A Lognormal…

cover image

Update, Dec 12, 2016: There is a follow up post discussing the outcome of all of this after the election results were known.

cover image

A new “shrinking bull’s-eye” algorithm from researchers at MIT speeds up complex modeling from days to hours.