The Elements of Statistical Learning (ESL) is one of the most widely accepted introductory texts on machine learning. Each link below points to a PDF of the relevant section.

Variable Types

Least Squares & Nearest Neighbors

Decision Theory

Statistical Models

Regression Models

Estimator Classes

Model Selection & Bias-Variance

Least Squares

Subsets

Shrinkage

Derived Input Directions

Comparisons

Multiple Outcomes

Lasso & Related

Computational Factors

Intro

Indicator Matrix

Discriminant Analysis

Logistic Regression

Separating Hyperplanes

Intro

Piecewise Polynomials & Splines

Filtering & Feature Extraction

Smoothing Splines

Auto-Selection of Smoothing Parameters

Non-parametric Logistic Regression

Multi-dimensional Splines

Regularization & Reproducing Kernel Hilbert Spaces

Wavelet Smoothing

1D Smoothers

Kernel Width

Local Regression

Structured Local Regression

Local Likelihood

Kernel Density Estimation

Radial Basis Functions & Kernels

Mixture Models

Computational Factors

Intro

Bias, Variance, Model Complexity

Bias-Variance Decomposition

Training Error Rates & Optimism

Effective # of Parameters

Bayesian Approach & BIC

Minimum Description Length

Vapnik-Chervonenkis Dimension

Cross Validation

Bootstrap Methods

Intro

Bootstrap & Max Likelihood

Bayesian Methods

Bootstrap-Bayesian Relationship

EM Algorithm

MCMC for Posterior Sampling

Bagging

Model Averaging & Stacking

Bumping

Generalized Additive Models

Tree-Based Methods

PRIM

MARS

Hierarchical Expert Mixtures

Missing Data

Computational Factors

Boosting Methods

Forward Stagewise Additive Modeling

AdaBoost

Why Exponential Loss?

Loss Functions

"Off the Shelf" Procedures

Example: Spam Data

Boosting Trees

Right-Sized Trees

Regularization

Interpretation

Examples

Intro

Projection Pursuit Regression

Neural Nets

Fitting

Training Issues

Examples

Discussion

Bayesian NNs

Computational Factors

Intro

Support Vector Classifier

Support Vector Machines & Kernels

Generalizing Linear Discriminant Analysis

Flexible Discriminant Analysis

Penalized Discriminant Analysis

Mixture Discriminant Analysis

Intro

Prototypes (K-Means, LVQ, Gaussian Mixtures)

k-Nearest-Neighbor Classifiers

Adaptive Nearest-Neighbor Methods

Computational Factors

Intro

Association Rules

Cluster Analysis

Self-Organizing Maps

Principal Components

Non-Negative Matrix Factorization

Independent Component Analysis

Multidimensional Scaling (MDS)

Non-Linear Dimension Reduction

Google PageRank

When P >> N

Diagonal LDA

Linear Classifiers - Quadratic Regularization

Linear Classifiers - L1 Regularization

Classification when Features aren't Available

High-Dimensional Regression

Feature Assessment
