Source abbreviations:    AJE: Algorithms     BA: Bandit Algorithms    BJP: (me)    CI: Collective Intelligence     CO: Convex Estimation    DIDL: Dive into Deep Learning    DLG: Deep Learning (Goodfellow, et al)    DMMD: Data Mining of Massive Datasets    DSA: Data structures & Algorithms    DSCL: Data Science at the Command Line    EA: Elementary Algorithms    ESL: Elements of Statistical Learning    FDS: Foundations of Data Science    GT: Geometric Topology    ITA: Intro to Algorithms     JE: Algorithms     NP: Numeric Python     SKL: Scikit-learn     SM: ML cheatsheet     RL: Reinforcement Learning

Book chapter summaries - deep learning, machine learning, various math

   (multiple)    approximations    arithmetic    association rules    autoencoders    bandit algorithms    bash    bayes    cheatsheets    classification    clustering    combinationals    computation - complexity - performance - benchmarking    data structures    datasets    deep learning architectures    density estimation    design    dimensional reduction    dynamic programming    ensembles    evaluation    feature engineering    file I/O    gaussians    generative models    geometry    graphs    greedy algos    inference    information theory    interviewing    kernels    label spreading, label propagation    latent variables    learning    linear models    linear programming    make    markov chains    matrix math    max likelihood estimation (MLE)    methods    mixtures    monte carlo    multilabel    natural language processing    novelties-outliers    numerical analysis    numpy    pandas    parametric models    performance    planning    planning / capacity    probabilistic analysis    probability & statistics    pycaret    recommenders    recurrent NNs    recursion    regression    reinforcement learning    restricted boltzmann machines    robotics    searching & sorting    set theory    streams    strings    survival analysis    svd    svms    sympy    tbd    tensorflow    time series    tools    topology    training    use cases    vision    visualization    wavelets    
data science cheatsheet 2.0 (aaron wang)
distributions; hypothesis testing; concepts; model evaluation; linear regression; logistic regression; decision trees; naive bayes; svms; knns; clustering; dimensional reduction (PCA, LDA, FA); NLP; neural nets (basics, CNNs, RNNs); boosting; recommenders; reinforcement learning; anomaly detection

other topics (FDS)
ranking & social choice; compressed sensing & sparse vectors; use cases; an uncertainty principle; gradients; linear programming; integer optimization; semi-definite programming

approximate-inference (DLG)
inference as optimization
expectation maximization (EM)
MAP inference | sparse coding
variational inference
learned approx inference

approximations (algorithm reductions) (ADM)
algo reductions
basic hardness reductions
creative reductions
"proving" hardness
P vs NP hardness
NP-complete problems

approximations (algorithm reductions) (ITA)
the vertex-cover problem
the traveling salesman problem
the set-cover problem
randomization & linear programming
the subset-sum problem

complex-numbers (LAY)
examples; geometric representation; powers; R^2

computation (DLG)
underflow, overflow
poor conditioning
gradient-based optimization
jacobian & hessian matrices
constrained optimization
linear least squares

factoring primes (ADM)
is n a prime number? if not, what are its factors?
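
A minimal sketch of the question above, assuming plain trial division is acceptable (fine for small n, far too slow for cryptographic sizes). The function names are illustrative and not taken from ADM.

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    if n < 2:
        return []
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out each prime factor completely
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is itself prime
        factors.append(n)
    return factors


def is_prime(n):
    """n is prime exactly when its only prime factor is n itself."""
    return prime_factors(n) == [n]
```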

linear algebra (LAY)
linear equations
row reductions
vector equations
solution sets of linear systems
linear independence
linear transforms
linear models - business, science, engineering

linear equation solvers (ADM)
given an m x m matrix A and an m x 1 vector b, what vector x satisfies Ax = b?
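
One way to answer the question above is Gaussian elimination with partial pivoting. This is a sketch in pure Python, assuming a small dense system; the function name and the singularity tolerance of 1e-12 are illustrative choices, and real work would normally go through a numerical library instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b for a square matrix a (list of rows) and vector b."""
    m = len(a)
    # Work on an augmented copy [a | b] so the inputs are not modified.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(m):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from the rows below.
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, from the last row upward.
    x = [0.0] * m
    for i in reversed(range(m)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - s) / aug[i][i]
    return x
```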

number theory (ITA)
basics (divisors, primes/composites)
greatest common divisor (Euclid)
modular math (group theory?)
linear equations
the chinese remainder problem
RSA public-key crypto
prime testing
factorization (integer)

random numbers (ADM)
(also part of "numericals" chapter of ADM.)

association rules
association rules | market basket analysis (ESL)
frequent itemsets (DMMD)
market-basket modeling; association rules; a-priori algorithm; large datasets & main memory; limited-pass algorithms; counting items in streams

autoencoders (DLG)
undercomplete AEs; regularized AEs; representational power, layer size & depth; stochastic encoders & decoders; denoising AEs; learning manifolds with AEs; predictive sparse decomposition; applications

autoencoders with Tensorflow (HoML)
common linux/bash commands (Data Science - Command Line)
environment (alias, bash, cols, for, sudo, ...)
files & directories (body, cd, cat, chmod, ...)
pattern matching (awk, sed, grep)
deployment (aws, git)
CSV data
JSON data
online data (curl, scp, scrape, ssh)
integer/date sequences
file extraction/compression (tar, tree, uniq, ...)

bayes inference (CSI)
two examples
uninformed prior distributions
flaws in frequentist inference
bayes vs frequentist comparison

bayes nets (directed graphs) (SM)
bayes statistics (NP)
intro & model definition
sampling posterior distributions
linear regression

bayesian statistics (SM)
posterior distribution
MAP estimates
bayes model selection
hierarchical bayes
empirical bayes
decision theory

deep learning cheatsheet (2018) (SCDL)
CNNs, RNNs, tips & tricks

sampling methods (PSC)
inverse transform sampling; the bootstrap; rejection sampling; importance sampling

cal housing market analysis (HoML)
classification basics (HoML)
MNIST, aka hello world
confusion matrix
metrics (precision, recall)
ROC curve
multiclass classification
multilabel classification
multioutput classification

discriminants (LDA, QDA) (SKL)
Linear DA
Quadratic DA

linear classification (ESL)
regression - indicator matrix
linear discriminant analysis (LDA)
logistic regression

logistic regression (SKL)
solvers - liblinear, newton-cg, lbfgs, sag, saga

metrics (SKL)
accuracy, top-K accuracy, balanced accuracy
cohen's kappa, confusion matrix, classification report
hamming loss, precision, recall, f-measure
precision-recall curve, avg precision
precision-recall curve (multilabel)
jaccard similarity
hinge loss
log loss
matthews correlation coefficient
confusion matrix (multilabel)
ROC curve
detection-error tradeoff (DET)
zero-one loss
brier score

multiclass & multioutput algos (SKL)
multiclass (aka label binarization)
output code
classifier chains
multiclass-multioutput (aka multitask)

multilayer perceptron (MLP) (SKL)
naive bayes (SKL)
NB classification (gaussian, multinomial, complement, bernoulli)
categorical NB

nearest neighbors (SKL)
basic algos (ball tree, KD tree, ...)
KNNs & radius-based algos
nearest centroids
neighborhood components analysis (NCA)

nearest neighbors (ESL)
prototype methods (kmeans, learning vector quant, gaussian mixtures)
knn classifiers
adaptive NN methods
computational performance

biclustering methods (SKL)
intro, spectral co/biclustering

clustering (DMMD)
intro (data, strategies, dimensionality)
CURE (clustering using representatives)
non-euclidean spaces
clustering for streams & parallelism

clustering (FDS)
k-means (lloyd's algo, ward's algo)
approximation stability
kernel methods
recursive clustering w/ sparse cuts
dense submatrices & communities
community finding & graph partitions
spectral clustering & social nets

clustering (ESL)
clustering methods (SKL)
Kmeans & Kmeans minibatch
Affinity propagation
Mean shifts
Spectral clustering
Agglomerative clustering
Hierarchical clustering

clustering metrics (SKL)
rand index; mutual info score; homogeneity / completeness / v-measure; Fowlkes-Mallows score; silhouette coefficient; Calinski-Harabasz index; Davies-Bouldin index

job scheduling (ADM)
given a directed acyclic graph (vertices = jobs, edges = task dependencies), what schedule completes the job in minimum time/effort?
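
A sketch of one simple version of this problem, assuming each job has a known duration and there are enough workers to run every ready job in parallel. Under those assumptions the earliest finish times follow from a topological order (Kahn's algorithm). The function and argument names are hypothetical.

```python
from collections import deque


def earliest_finish_times(durations, depends_on):
    """durations: {job: time}; depends_on: {job: [prerequisite jobs]}."""
    indegree = {job: 0 for job in durations}
    dependents = {job: [] for job in durations}
    for job, prereqs in depends_on.items():
        for p in prereqs:
            dependents[p].append(job)
            indegree[job] += 1
    # Jobs with no prerequisites can start at time 0.
    ready = deque(job for job, d in indegree.items() if d == 0)
    finish = {}
    while ready:
        job = ready.popleft()
        # A job starts once its slowest prerequisite has finished.
        start = max((finish[p] for p in depends_on.get(job, [])), default=0)
        finish[job] = start + durations[job]
        for nxt in dependents[job]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(finish) != len(durations):
        raise ValueError("dependency graph contains a cycle")
    return finish
```

The largest value in the returned dict is the length of the critical path, which is the minimum total completion time under the assumptions above.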

partitions (ADM)
given integer n, generate partitions that add up to n.
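
A short recursive sketch, assuming "partitions" means unordered sums of positive integers (so 3+1 and 1+3 count once). The `largest` parameter is an illustrative device that keeps each partition in non-increasing order.

```python
def partitions(n, largest=None):
    """Yield each partition of n as a non-increasing list of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield []            # the empty sum is the only partition of 0
        return
    # Choose the first (largest) part, then partition the remainder
    # using parts no bigger than it.
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest
```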

permutations (ADM)
given n, generate the permutations (all possible orderings) of n items.
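
A minimal recursive sketch of generating every ordering, assuming the items are distinct. The function name is illustrative; Python's standard `itertools.permutations` covers the same job.

```python
def generate_permutations(items):
    """Yield every ordering of items (a list of distinct values) exactly once."""
    if len(items) <= 1:
        yield list(items)
        return
    for i, head in enumerate(items):
        rest = items[:i] + items[i + 1:]    # everything except the chosen head
        for tail in generate_permutations(rest):
            yield [head] + tail
```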

satisfiability (ADM)
given a set of logical constraints, is there a configuration that satisfies the set?
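
A brute-force sketch, assuming the constraints are given in conjunctive normal form with signed-integer literals (a common convention, not something stated in the source). It tries all 2^n assignments, so it only illustrates the problem and is not a practical solver.

```python
from itertools import product


def satisfying_assignment(clauses, num_vars):
    """Return a satisfying assignment as {var: bool}, or None if none exists.

    clauses is CNF: each clause is a list of nonzero ints, where k means
    "variable k is true" and -k means "variable k is false" (k in 1..num_vars).
    """
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        # Every clause needs at least one literal that evaluates to true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None
```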

deep learning architectures
CNN cheatsheet (SCDL)
adversarial apps (paperswithcode)
convolutional NNs (DLG)
deep feedforward NNs (DLG)
deep generative models (DLG)
deep learning (DLG)
gans (DIDL)
intro (ESL)
intro to neural nets (CSI)
intro; fitting; autoencoders; deep learning; learning (dropout, input distortion)

linear NNs (DIDL)
neural network zoo (asimov institute)
perceptrons (DIDL)
representation learning (DLG)
greedy layer-wise unsupervised pretraining
transfer learning | domain adaptation
semi-supervised disentangling of causal factors
distributed representation
exponential gains from depth
providing clues to find underlying causes

structured probabilistic models (DLG)
challenges; using graphs; sampling from graphs; advantages; dependencies; inference & approx inference

density estimation
density estimates (PSC)
density estimates
kernel density estimator (KDE)

density estimation methods (SKL)
intro, histograms, kernel density estimates (KDE)

dynamic programming
dynamic programming (ADM)
dynamic programming (ITA)
dynamic programming (JE)
intro; faster fibonacci numbers; smart recursion; greed is stupid; longest increasing subsequence; edit distance; subset sum; binary search trees; dynamic programming on trees

file I/O
data I/O (DSCL)
local data to docker
internet downloads (curl, ...)
decompressions (zip, ...)
excel to CSV
relational DBs
web APIs
streaming APIs

file I/O (NP)
CSV; HDF5; h5py; Pytables; serialization

file I/O - datatypes (PDA)
text files; JSON; XML/HTML scraping; binary data; web APIs; databases

generative models
generative models - discrete data (SM)
generative classifiers; bayesian concept learning; beta-binomial model; dirichlet-multinomial model; naive bayes classifiers

bin packing (ADM)
given n items and m bins - store all the items using the smallest number of bins.
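
Exact bin packing is NP-hard, so this sketch swaps in the first-fit decreasing heuristic, which is fast but not guaranteed optimal. It assumes each item has a numeric size and every bin has the same capacity; neither detail is spelled out in the line above, and the function name is illustrative.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into bins of the given capacity; returns a list of bins."""
    bins = []
    for size in sorted(sizes, reverse=True):    # place big items first
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        for b in bins:
            if sum(b) + size <= capacity:       # first bin with enough room
                b.append(size)
                break
        else:
            bins.append([size])                 # no bin fits: open a new one
    return bins
```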

convex hulls (ADM)
geometric primitives (ADM)
geometry (ITA)
intersections (ADM)
line arrangements (ADM)
medial axis xforms (ADM)
minkowski sum (ADM)
motion planning (ADM)
nearest neighbors (ADM)
point location (ADM)
polygon partitions (ADM)
polygon simplification (ADM)
range search (ADM)
shape similarity (ADM)
spatial structures (DSA)
multi-dimensional structures; planar straight-line graphs; search trees; quad/octal trees; binary space partitioning trees; r-trees; spatio-temporal data; kinetic structures; online dicts; cuttings; approximate geometric queries

triangulation (ADM)
vector spaces (LAY)
basic algorithms (JE)
definitions; representations; data structures; whatever-first search; depth-first; breadth-first; best-first; disconnected graphs; directed graphs
reductions (flood fill)

chinese-postman (ADM)
given a graph, find the shortest tour that traverses every edge at least once.

cliques (ADM)
how to find the largest clique (cluster) in a graph?
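
Finding a maximum clique is NP-hard in general, so the sketch below is a brute-force check of vertex subsets from largest to smallest. It assumes a small, simple undirected graph given as a vertex list and an edge list; the function name is hypothetical.

```python
from itertools import combinations


def largest_clique(vertices, edges):
    """Return one largest clique by checking subsets from biggest to smallest."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            # A clique needs an edge between every pair of its vertices.
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                return set(subset)
    return set()
```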

connected components (ADM)
find the pieces of a graph, where vertices x & y are members of different components if no path exists from x to y.
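
A minimal sketch using an iterative depth-first search, assuming an undirected graph given as a vertex list and an edge list. The names are illustrative.

```python
def connected_components(vertices, edges):
    """Return the components of an undirected graph as a list of vertex sets."""
    neighbors = {v: set() for v in vertices}
    for x, y in edges:
        neighbors[x].add(y)
        neighbors[y].add(x)
    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Everything reachable from an unseen vertex forms one new component.
        component = set()
        stack = [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbors[v] - component)
        seen |= component
        components.append(component)
    return components
```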

edge coloring (ADM)
what's the smallest set of colors needed to color the edges of a graph, such that no two same-color edges share a common vertex?

edge vertex connectivity (ADM)
what's the smallest subset of vertices (edges) whose deletion will disconnect a graph?

feedback edge vertex set (ADM)
flows & cuts applications (JE)
edge-disjoint paths
vertex capacities & vertex-disjoint paths
bipartite matching
tuple selection
disjoint-path covers
baseball elimination
project selection

graph algos (ITA)
representations; breadth-first search; depth-first search; topological sorting; strongly-connected components

graph algos (SOTA) (paperswithcode)
graph datastructs (ADM)
adjacency matrices; adjacency lists

graph drawing (ADM)
graph generation (ADM)
graph isomorphism (ADM)
given two graphs G & H, find a function from G's vertices to H's vertices such that G & H are identical.
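
A brute-force sketch of the search described above, assuming small, simple undirected graphs (no self-loops or duplicate edges). It tries every vertex mapping, so the running time grows factorially; the function name is hypothetical.

```python
from itertools import permutations


def find_isomorphism(g_vertices, g_edges, h_vertices, h_edges):
    """Return a dict mapping G's vertices to H's vertices, or None."""
    if len(g_vertices) != len(h_vertices) or len(g_edges) != len(h_edges):
        return None
    h_edge_set = {frozenset(e) for e in h_edges}
    for image in permutations(h_vertices):
        mapping = dict(zip(g_vertices, image))
        # With equal edge counts, mapping every G edge onto an H edge
        # means the edge sets correspond exactly.
        if all(frozenset((mapping[x], mapping[y])) in h_edge_set for x, y in g_edges):
            return mapping
    return None
```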

graph link analysis (DMMD)
PageRank; link spam; hubs & authorities

graph partition (ADM)
given a weighted graph G and integers k & m, partition the vertices of G into m equally-sized subsets such that the total edge cost spanning the subsets is at most k.

graph traversal (ADM)
graphs connected components (ADM)
graphs hard (ADM)
graphs polynomial time (ADM)
graphs weighted (ADM)
graphviz (tool) (graphviz)
hamiltonian cycles (ADM)
matching (ADM)
maxflow (ITA)
min spanning trees (JE)
min spanning trees (ITA)
minimum spanning tree (ADM)
network flow (ADM)
planarity detection (ADM)
random graphs (FDS)
social graphs (DMMD)
sparse matrices graphs (NP)
transitive closure (ADM)
traveling salesman (ADM)
tree drawing (ADM)
undirected graphs (ESL)
vertex coloring (ADM)
vertex cover (ADM)
after-model-selection-estimation (CSI)
accuracy after model selection
selection bias
combined bayes-frequentist estimation

inference & max likelihood (ESL)
inference frequentist (CSI)
parametric inference (PSC)
information theory
info theory tutorial (stone, USheffield)
finding a route
bits are not binary digits
entropy - continuous variables
max-entropy distributions
channel capacity
shannon's source coding theorem
noise reduces channel capacity
mutual info
shannon's noisy channel coding theorem
gaussian channels
fourier analysis
key equations

label spreading, label propagation
latent variables
linear factor models (DLG)
probabilistic PCA + factor analysis
independent component analysis
sparse coding
manifold representation of PCA

linear models
generalized linear models (SM)
(incomplete notes in orig PDF)

intro to make (DSCL)
overview|intro; running tasks; building; dependencies; summary

matrix math
basics (DIDL)
linear & matrix ops
eigen decompositions
single-variable calculus
multi-variable calculus
random variables

determinants (LAY)
eigenvectors & eigenvalues (LAY)
intro; eigenvectors & difference equations
determinants & characteristic equations
eigenvectors & linear transforms
complex eigenvalues
discrete dynamical systems
differential equations
iterative estimates

inner-product-length-orthogonality (LAY)
linear algebra overview (DLG)
scalars, vectors, matrices, tensors
vector|matrix multiplication
identity matrix
inverse matrix
linear dependence
diagonal matrix
symmetric matrix
orthogonal matrix
eigen decomposition
singular value decomposition (svd)
moore-penrose pseudoinverse matrix
trace operator
example - principal components analysis (PCA)

matrix cookbook
complex matrices
solutions & decompositions
multivariate distributions
special matrices
functions & operators
1-D results

matrix determinants (ADM)
matrix math (LAY)
matrix multiply (ADM)
matrix ops (ITA)
numerical basics (ADM)
linear equations
bandwidth reduction
matrix multiplication
determinants & permanents
optimization (constrained, unconstrained)
linear programming
random number gen
factors & prime testing
arbitrary-precision math
the knapsack problem
discrete fourier transforms (DFTs)

symmetric matrices (LAY)
max likelihood estimation (MLE)
methodologies (paperswithcode)
representation learning; transfer learning; image classification; reinforcement learning; 2D classification; domain adaptation; data augmentation; ...

latent linear models (SM)
factor analysis
principal components analysis (PCA)
choosing number of dimensions
PCA for categories
PCA for paired & multiview data
independent component analysis (ICA)

monte carlo
monte carlo methods (DLG)
sampling; importance sampling; markov chain monte carlo (MCMC); gibbs sampling; mixing challenges

natural language processing
Gensim lessons
NLP SOTA (paperswithcode)
595 tasks (july2022)

natural language processing (NLP) (DIDL)
spaCy tutorial
topic models (FDS)
topic models
non-negative matrix factorization (NMF)
hard & soft clustering
latent dirichlet allocation (LDA)
dominant admixtures
term-topic matrices
hidden markov models
graph models & belief propagation
bayes|belief nets
markov random fields
factor graphs
tree algorithms
message passing
single-cycle graphs
single-loop belief updates
max weight matching
warning propagation
variable correlation

advanced techniques (PDA)
ndarray internals
array manipulation
structured & record arrays
advanced array I/O
performance tips

basics (PDA)
numpy basics (PDSH)
arrays; boolean arrays; broadcasting; indexing; sorting; structured data; aggregations; ufuncs; data types

vectors, matrices, ndarrays (NP)
pandas basics (PDA)
series; data frames; index objects; essential functions; descriptive stats

pandas basics (PDSH)
aggregation/grouping, concat, append, hierarchical indexes, merge, join, missing values, objects, ops, performance, pivot tables, time series ops, vectorized string ops

planning algorithms (LaValle)
motion planning
decision theory
differential-constraint planning

planning / capacity
probabilistic analysis
Probabilistic Analysis and Randomized Algorithms (ITA)
Indicator random variables, Randomized algorithms, Probabilistic analysis and further uses of indicator random variables

PyCaret intro (BJP)
PyCaret is a high-level, low-code Python library for comparing, training, evaluating, tuning, and deploying machine learning models in a few lines of code. At its core, it is a large wrapper over data science libraries such as scikit-learn, Yellowbrick, SHAP, Optuna, and spaCy. You could use those libraries directly for the same tasks, but if you don't want to write a lot of code, PyCaret can save you a lot of time.

backtracking (AJE)
backtracking (JE)
recursion (JE)
simplify & delegate
tower of hanoi
design pattern
recursion trees
linear-time selection
fast multiplication

restricted boltzmann machines
survival analysis
support vector machines (ESL)
support vector machines (SVMs) (SKL)
classification (SVC, NuSVC, LinearSVC)
multiclass SVM
scoring & metrics
weighted classes/samples
regression (SVR, NuSVR, LinearSVR)
precomputed kernels - the Gram matrix

svms (HoML)
intro (NP)
symbols; expressions; numeric evaluation; calculus (derivatives, integrals, series expansions, limits, sums & products); equation solvers; linear algebra

time series
Prophet (Facebook)
calendar math (ADM)
time series (PSC)
time series applications (SOTA) (paperswithcode)
time series ops (PDA)
date & time datatypes; ranges, frequencies & shifting; periods; frequency conversion; moving windows

hyperbolic topology (GT)
groups; spaces; manifolds; thick-thin decomposition; sphere at infinity

surfaces (GT)
intro; teichmuller spaces; surface diffeomorphisms

three-manifolds (GT)
topology; seifert manifolds; construction; the "eight geometries"; mostow rigidity problem; hyperbolic 3Ms; hyperbolic dehn filling

computer vision SOTA (paperswithcode)
1300 tasks (july2022)

developers tools (scikit-image)
edges & lines (scikit-image)
contour finding
convex hulls (binary images)
canny filters
marching cubes
ridge operators
active contour model
drawing std shapes
random shapes
hough transforms (straight line)
approximating & subdividing polygons
hough transforms (circular, elliptical)
morphological thinning
edge operations (multiple)

exposures & colors (scikit-image)
RGB-grayscale conversions
RGB-HSV conversions
histogram matching
(ex) immunohistochemical (IHC) staining
adapting grayscale filters to RGB images
regional maxima filtering (bright features)
local histogram equalization (LHE)
gamma & log-contrast adjustments
histogram equalization
tinting grayscale images

filtering & restoration (scikit-image)
image datasets (scikit-image)
longform examples (scikit-image)
numpy basic ops (scikit-image)
object detection (scikit-image)
object segmentation (scikit-image)
transforms & registration (scikit-image)