ESL (The Elements of Statistical Learning) is one of the most widely accepted introductory texts on machine learning. Let's just leave it at that. Each chapter link points to a PDF of the relevant section of the book.

Least Squares & Nearest Neighbors

Decision Theory

Statistical Models

Regression Models

Estimator Classes

Model Selection & Bias-Variance

Subsets

Shrinkage

Derived Input Directions

Comparisons

Multiple Outcomes

Lasso & Related

Computational Factors

Indicator Matrix

Discriminant Analysis

Logistic Regression

Separating Hyperplanes

Piecewise Polynomials & Splines

Filtering & Feature Extraction

Smoothing Splines

Auto-Selection of Smoothing Parameters

Non-parametric Logistic Regression

Multi-dimensional Splines

Regularization & Reproducing Kernel Hilbert Spaces

Wavelet Smoothing

Kernel Width

Local Regression

Structured Local Regression

Local Likelihood

Kernel Density Estimation

Radial Basis Functions & Kernels

Mixture Models

Computational Factors

Bias, Variance, Model Complexity

Bias-Variance Decomposition

Training Error Rates & Optimism

Effective Number of Parameters

Bayesian Approach & BIC

Minimum Description Length

Vapnik-Chervonenkis Dimension

Cross Validation

Bootstrap Methods

Bootstrap & Max Likelihood

Bayesian Methods

Bootstrap:Bayesian Relation

EM Algorithm

MCMC for Posterior Sampling

Bagging

Model Averaging & Stacking

Bumping

Tree-Based Methods

PRIM

MARS

Hierarchical Expert Mixtures

Missing Data

Computational Factors

Forward Stagewise Additive Modeling

AdaBoost

Why Exponential Loss?

Loss Functions

"Off the Shelf" Procedures

Example: Spam Data

Boosting Trees

Right-Sized Trees

Regularization

Interpretation

Examples

Projection Pursuit Regression

Neural Nets

Fitting

Training Issues

Examples

Discussion

Bayesian NNs

Computational Factors

Support Vector Classifier

Support Vector Machines & Kernels

Generalizing Linear Discriminant Analysis

Flexible Discriminant Analysis

Penalized Discriminant Analysis

Mixture Discriminant Analysis

Prototypes (K-Means, LVQ, Gaussian Mixtures)

k-Nearest-Neighbor Classifiers

Adaptive Nearest-Neighbor Methods

Computational Factors

Association Rules

Cluster Analysis

Self-Organizing Maps

Principal Components

Non-Negative Matrix Factorization

Independent Component Analysis

Multidimensional Scaling (MDS)

Non-Linear Dimension Reduction

Google PageRank

Diagonal LDA

Linear Classifiers - Quadratic Regularization

Linear Classifiers - L1 Regularization

Classification when Features aren't Available

High-Dimensional Regression

Feature Assessment