Scikit-Learn Guides - Jupyter Notebooks

(These are HTML pages, converted using nbconvert; as such, they do not support Jekyll markup.)
(Edits in progress. Not final.)
Getting Started Estimator basics
Transformers & preprocessors
Model evaluation
Automatic parameter searches
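
As a quick reference, here is a minimal sketch of the getting-started workflow covered above (estimator basics, a preprocessing pipeline, evaluation, and an automatic parameter search). The iris dataset and the parameter grid are illustrative choices, not taken from the notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Transformer + estimator chained into a single object with fit/predict/score.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
print("test accuracy:", pipe.score(X_test, y_test))

# Automatic parameter search over the classifier's regularization strength.
search = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
```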

Supervised Learning

Linear Models Ordinary Least Squares (OLS)
Ridge regression
Lasso regression (incl multitask Lasso)
Akaike & Bayes info criteria
Elastic Net regression
Least Angle (LARS) regression
OrthogonalMatchingPursuit (OMP)
BayesianRidge regression
Generalized Linear Regression (GLR)
GLR with Tweedie
Stochastic Gradient Descent (SGD) regressor & classifier
Passive Aggressive algos
RANSAC, Huber, Theil-Sen robustness algos
Polynomial regression
Logistic Regression (LR) Logistic function (Wikipedia)
Binary, One-vs-Rest, Multinomial options
Solvers:
liblinear
lbfgs, sag, newton-cg: l2 penalty support
sag: uses Stochastic Average Gradient descent
saga: l1 penalty support
Discriminant Analysis (LDA, QDA) Linear DA
Quadratic DA
Shrinkage
Estimators
Kernel Ridge Regression (KRR) example: KRR vs SVR
example: execution time
Support Vector Machines (SVMs) Classification
Classification (multiclass)
Scoring
Weights
Regression
Complexity & kernel options
Gram matrix
Stochastic Gradient Descent (SGD) Classification (std, multiclass, weighted, averaged)
Regression (std, sparse data)
Tips
Nearest Neighbors (NNs) Options
KNN vs Radius-based
Ball tree vs KD tree vs Brute Force
NearestCentroid
NeighborhoodComponentsAnalysis (NCA)
Gaussian Processes Gaussian Process Regression (GPR)
GPR vs Kernel Ridge
Cross Decomposition / Partial Least Squares (PLS) Canonical PLS
SVD PLS
PLS regression
Canonical Correlation Analysis (CCA)
Naive Bayes (NB) classifiers Gaussian NB
Multinomial NB
Complement NB
Bernoulli NB
Categorical NB
Decision Trees (DTs) DT classifier
Graphviz
DT regressor
Multiple outputs
Complexity
ID3, C5.0, CART
Impurity functions (Gini, Entropy, Misclassification, MSE, MAE)
Minimal cost-complexity pruning
Bagging Methods
Random Forest
Extra Trees
Feature importance
Random Tree Embedding
Boosting AdaBoost
Gradient Boosted DTs
Shrinkage vs Learning Rate
Subsampling
Histogram-based Gradient Boosting
Stacked Generalization
Voting Hard & soft voting classifiers
Voting regressor
Multiclass & Multioutput Algorithms Label Binarizer
One-vs-Rest classifier
Multilabel classifier
One-vs-One classifier
Output Code classifier
Multioutput classifier
Classifier Chains
Multi Output regressor
Regressor Chains
Feature Selection (FS) Variance-based
Univariate
Recursive
Model-based
Impurity-based
Sequential
FS & pipelines
Semi-Supervised Algorithms SelfTrainingClassifier
LabelSpreading
LabelPropagation
Calibration Curves Using cross validation
Performance scores
Regressors
Multiclass support
Multilayer Perceptrons (MLPs) MLP classifier
Multilabel & Multiclass classification
MLP regressor
Regularization
Tips
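
For orientation, a minimal sketch comparing a handful of the supervised estimators listed above on one synthetic task; the dataset parameters and model settings are arbitrary examples, not taken from the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem shared by all models below.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVM via SGD": SGDClassifier(loss="hinge"),
    "RBF-kernel SVM": SVC(kernel="rbf"),
    "k-nearest neighbors": KNeighborsClassifier(),
    "Gaussian naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(max_depth=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```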

Unsupervised Learning

Gaussian Mixtures Expectation Maximization (EM)
Variational Bayes GM
Manifolds Isomap
Locally Linear Embedding (LLE)
Modified LLE
Hessian LLE
Local Tangent Space Alignment (LTSA)
Multi Dimensional Scaling (MDS)
Random Tree Embedding
Spectral Embedding
t-distributed Stochastic Neighbor Embedding (t-SNE)
Neighborhood Components Analysis (NCA)
Clustering Techniques K-Means
Affinity Propagation
Mean Shift
Spectral
Agglomerative
Dendrograms
DBSCAN
OPTICS
Birch
Clustering Metrics rand_score
mutual_info_score
Homogeneity, completeness & v-measure
Fowlkes-Mallows score
Silhouette coefficient
Calinski-Harabasz index
Davies-Bouldin index
Contingency matrix
Pair confusion matrix
Biclustering Spectral co-clustering
Spectral bi-clustering
Metrics
Component Analysis / Matrix Factorization Principal Component Analysis (PCA)
Incremental PCA
PCA with random SVD
PCA & sparse data
Kernel PCA
Truncated SVD (aka Latent Semantic Analysis, LSA)
Dictionary Learning
Factor Analysis (FA)
Independent Component Analysis (ICA)
Non-Negative Matrix Factorization (NNMF)
Latent Dirichlet Allocation (LDA)
Covariance Empirical (observed) covariance
Shrunk covariance
Ledoit-Wolf (LW) shrinkage
Oracle approximating shrinkage (OAS)
Precision matrix
Min covariance determinant (MCD) estimators
Mahalanobis distances
Novelty & Outlier Detection Intro section
One-class SVM vs Elliptic Envelope vs Isolation Forest vs Local Outlier Factor
Novelties
Outliers
Density Analysis Histograms
Kernel density estimation (KDE)
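
A minimal unsupervised sketch touching a few of the topics above (K-Means plus the silhouette score, PCA, and kernel density estimation); the blob data and bandwidth are illustrative values only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KernelDensity

# Synthetic clusters in 6 dimensions.
X, _ = make_blobs(n_samples=300, centers=4, n_features=6, random_state=0)

# K-Means clustering, scored with the silhouette coefficient.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("silhouette:", silhouette_score(X, km.labels_))

# PCA projection to 2 components for inspection/plotting.
X_2d = PCA(n_components=2).fit_transform(X)

# Kernel density estimate over the projected points.
kde = KernelDensity(bandwidth=1.0).fit(X_2d)
print("mean log-density:", np.mean(kde.score_samples(X_2d)))
```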

Cross Validation & Hyperparameters

Cross Validation (CV) Intro
cross_val_score
cross_validate
cross_val_predict
KFold, stratified KFold
Leave One Out (LOO)
Leave P Out (LPO)
CV on grouped data
Time series splits
Permutation testing
Visualizations
Hyperparameter Settings Grid search
Randomized search
Successive Halving (SH)
Alternatives to brute-force search
Info criteria (AIC, BIC) regularization
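
A minimal sketch of the cross-validation and search tools listed above: cross_val_score with an explicit StratifiedKFold splitter, then grid and randomized searches over a small SVC parameter space. The grid values and the scipy loguniform distribution are illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                     StratifiedKFold, cross_val_score)
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Plain cross-validation with an explicit splitter.
print("CV scores:", cross_val_score(SVC(), X, y, cv=cv))

# Exhaustive grid search.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=cv)
grid.fit(X, y)
print("grid best:", grid.best_params_)

# Randomized search sampling C from a log-uniform distribution.
rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-2, 1e2)},
                          n_iter=10, cv=cv, random_state=0)
rand.fit(X, y)
print("random best:", rand.best_params_)
```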

Metrics, Evaluation & Scoring

Classifier Metrics Accuracy
Top K accuracy
Balanced accuracy
Cohen's kappa
Confusion matrix
Classification report
Hamming loss
Precision, recall, F-measure
Precision-recall curve
Average precision
Jaccard similarity
Hinge loss
Log loss
Matthews correlation coefficient
Receiver operating characteristic (ROC)
Detection error tradeoff (DET)
Zero-one loss
Brier score
Multi-label Rankers Coverage error
Label ranking avg precision (LRAP)
Label ranking loss
Discounted cumulative gain (DCG)
Regression Metrics Explained variance
Max error
Mean absolute error (MAE)
Mean squared error (MSE)
Mean squared log error (MSLE)
Mean absolute percentage error (MAPE)
R2 (coefficient of determination)
Tweedie deviance error
"Dummy" Metrics Dummy classifier
Dummy regressor
Metrics Overview make_scorer
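
A minimal scoring sketch for a few of the metrics above (confusion matrix, classification report, make_scorer, and a dummy baseline); the synthetic dataset and classifier choice are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, make_scorer)
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

# make_scorer wraps a metric function so CV and searches can use it.
f1 = make_scorer(f1_score)
print("F1 via CV:", cross_val_score(clf, X, y, scoring=f1, cv=5).mean())

# A "dummy" baseline to compare real models against.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))
```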

Metrics - Visualization

Learning Curves validation_curve
learning_curve
Partial Dependence Plots (PDPs) PDP - 2D example
PDP - 3D example
Individual conditional expectation (ICE) plot example
Permutation Feature Importance (PFI) plots Tree-based models: impurity vs permutation
ROC curves Example using svc_disp
Examples Confusion matrix display
ROC curve display
Precision recall display
Composite Transformers Pipelines
Regression target transformers
Feature unions
Column transformers
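
A minimal sketch of the composite-transformer pieces above: a ColumnTransformer routing numeric and categorical columns into separate preprocessors, wrapped in a Pipeline. The toy DataFrame is invented for illustration, and the Display classes mentioned in the closing comment assume a recent scikit-learn release.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up tabular data with mixed column types.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [40_000, 52_000, 88_000, 95_000, 61_000, 45_000],
    "city": ["NY", "SF", "NY", "LA", "SF", "LA"],
    "bought": [0, 1, 1, 1, 0, 0],
})
X, y = df.drop(columns="bought"), df["bought"]

# Numeric columns are scaled, categorical columns are one-hot encoded.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
print(model.predict(X))

# The plotting displays follow a similar one-liner pattern in recent versions,
# e.g. RocCurveDisplay.from_estimator(model, X, y).
```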

Feature Engineering

Feature Extraction (Text) Bag of Words (BoW)
Count Vectorizer
TfIdf Transformer
TfIdf Vectorizer
Decoding text files
The Hashing Trick
Hashing Vectorizer
Custom vectorizers
Feature Extraction (Image Patches) extract_patches_2d
reconstruct_from_patches_2d
Connectivity graphs
Data Preprocessing Standard scaler
MinMax scaler
MaxAbs scaler
Robust scaler
Kernel centerer
Quantile transform
Power transform
Normalizer
Ordinal encoder
One Hot encoder
K Bins discretizer (aka binning)
Polynomial feature generation
Data Imputation Simple (univariate)
Iterative (multivariate)
Nearest Neighbors
Missing Indicator
Dimensionality Reduction: Random Projections (RP) random_projection
Johnson-Lindenstrauss lemma
Gaussian RP
Sparse data RP
Kernel Approximations Nystroem approximation
RBF sampler
Additive Chi-squared sampler
Skewed Chi-squared sampler
Polynomial sampler
Pairwise Operations pairwise_distances
pairwise_kernels
Cosine similarity
Kernels: linear, polynomial, sigmoid, RBF, laplacian, chi-squared
Binarization & Encoding Label binarizer
Multi-label binarizer
Label encoding
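
A minimal feature-engineering sketch: TF-IDF text features, simple imputation, scaling, and one-hot encoding. The documents and numeric values are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Bag-of-words with TF-IDF weighting on toy documents.
docs = ["the cat sat on the mat", "the dog ate my homework", "cat and dog"]
X_text = TfidfVectorizer().fit_transform(docs)
print(X_text.shape)                        # sparse (n_docs, n_terms) matrix

# Impute missing values, then standardize.
X_num = np.array([[1.0, np.nan], [2.0, 10.0], [np.nan, 12.0]])
X_num = SimpleImputer(strategy="mean").fit_transform(X_num)
X_num = StandardScaler().fit_transform(X_num)
print(X_num)

# One-hot encode a categorical column (returns a sparse matrix by default).
X_cat = OneHotEncoder().fit_transform([["red"], ["blue"], ["red"]])
print(X_cat.toarray())
```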

Datasets

Simple Datasets Boston house prices (regression)
Iris (classification)
Diabetes (regression)
Digits (classification)
Linnerud (regression)
Wine (classification)
Breast cancer (classification)

Real World Datasets fetch_olivetti_faces
fetch_20newsgroups
fetch_lfw_people (Labeled faces in the wild)
fetch_covtype (Forest covertype)
fetch_rcv1 (Reuters Newswire corpus)
fetch_kddcup99 (KDD CUP - intrusion detection)
fetch_california_housing
Artificial Data Generators (classifications)
make_blobs
make_classification
make_gaussian_quantiles
make_circles
make_moons
(multilabel classifications)
make_multilabel_classification
make_hastie_10_2
make_biclusters
make_checkerboard
(regression)
make_regression
make_sparse_uncorrelated
make_friedman1, make_friedman2, make_friedman3
(manifolds)
make_s_curve
make_swiss_roll
(decompositions)
make_low_rank_matrix
make_sparse_coded_signal
make_spd_matrix (symmetric positive definite)
Other Example Datasets load_sample_images
fetch_openml
Other API tools - pandas, scipy, numpy, scikit-image, imageio
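
A minimal sketch of the dataset utilities above: a bundled loader, two synthetic generators, and fetch_openml (left commented out because it downloads data over the network).

```python
from sklearn.datasets import load_iris, make_blobs, make_regression

# Bundled toy dataset.
iris = load_iris()
print(iris.data.shape, iris.target_names)

# Artificial data generators for clustering and regression experiments.
X, y = make_blobs(n_samples=200, centers=3, random_state=0)
Xr, yr = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)
print(X.shape, Xr.shape)

# Remote fetcher (requires network access):
# from sklearn.datasets import fetch_openml
# titanic = fetch_openml("titanic", version=1, as_frame=True)
```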

Performance factors

Performance / Scaling Out-of-core operations example
Performance / Latency Details Bulk vs Atomic mode
Validation overhead
Number of features
Input datatypes
Feature extraction
Linear algebra - BLAS, LAPACK usage
Memory limits
Model reshaping
Performance / Parallel Ops Tools Joblib
OpenMP
NumPy/SciPy
sklearn.set_config
Persistence (File I/O) Details Pickle
Joblib dump, load
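
A minimal sketch of the persistence and parallelism points above: joblib dump/load for a fitted model and n_jobs for parallel fitting. The file name is arbitrary, and since joblib relies on pickle, only load files from sources you trust.

```python
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_jobs=-1 asks joblib to build trees on all available cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)

dump(clf, "forest.joblib")         # persist the fitted estimator to disk
restored = load("forest.joblib")   # pickle-based: only load trusted files
print(restored.score(X, y))
```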