ESL (The Elements of Statistical Learning) is one of the most widely accepted introductory texts on machine learning. Each link below points to a PDF of the relevant section.

Variable Types
Least Squares & Nearest Neighbors
Decision Theory
Statistical Models
Regression Models
Estimator Classes
Model Selection & Bias-Variance
Least Squares
Subsets
Shrinkage
Derived Input Directions
Comparisons
Multiple Outcomes
Lasso & Related
Computational Factors
Intro
Indicator Matrix
Discriminant Analysis
Logistic Regression
Separating Hyperplanes
Intro
Piecewise Polynomials & Splines
Filtering & Feature Extraction
Smoothing Splines
Auto-Selection of Smoothing Parameters
Non-parametric Logistic Regression
Multi-dimensional Splines
Regularization & Reproducing Kernel Hilbert Spaces
Wavelet Smoothing
1D Smoothers
Kernel Width
Local Regression
Structured Local Regression
Local Likelihood
Kernel Density Estimation
Radial Basis Functions & Kernels
Mixture Models
Computational Factors
Intro
Bias, Variance, Model Complexity
Bias-Variance Decomposition
Training Error Rates & Optimism
Effective # of Parameters
Bayesian Approach & BIC
Minimum Description Length
Vapnik-Chervonenkis Dimension
Cross Validation
Bootstrap Methods
Intro
Bootstrap & Max Likelihood
Bayesian Methods
Bootstrap:Bayesian Relation
EM Algorithm
MCMC for Posterior Sampling
Bagging
Model Averaging & Stacking
Bumping
Generalized Additive Models
Tree-Based Methods
PRIM
MARS
Hierarchical Mixtures of Experts
Missing Data
Computational Factors
Boosting Methods
Forward Stagewise Additive Modeling
AdaBoost
Why Exponential Loss?
Loss Functions
"Off the Shelf" Procedures
Example: Spam Data
Boosting Trees
Right-Sized Trees
Regularization
Interpretation
Examples
Intro
Projection Pursuit Regression
Neural Nets
Fitting
Training Issues
Examples
Discussion
Bayesian NNs
Computational Factors
Intro
Support Vector Classifier
Support Vector Machines & Kernels
Generalizing Linear Discriminant Analysis
Flexible Discriminant Analysis
Penalized Discriminant Analysis
Mixture Discriminant Analysis
Intro
Prototypes (K-Means, LVQ, Gaussian Mixtures)
k-Nearest-Neighbor Classifiers
Adaptive Nearest-Neighbor Methods
Computational Factors
Intro
Association Rules
Cluster Analysis
Self-Organizing Maps
Principal Components
Non-Negative Matrix Factorization
Independent Component Analysis
Multidimensional Scaling (MDS)
Non-Linear Dimension Reduction
Google PageRank
Intro
Definition
Out-of-Bag, Variable Importance, Proximity, Overfitting
Analysis
Intro
Boosting & Regularization
Learning & Ensembles
Intro
Markov Graphs
Continuous-Variable Graphs
Discrete-Variable Graphs
When P >> N
Diagonal LDA
Linear Classifiers - Quadratic Regularization
Linear Classifiers - L1 Regularization
Classification when Features aren't Available
High-Dimensional Regression
Feature Assessment