Deep Learning - Goodfellow (book notes)

Scalars, Vectors, Matrices, Tensors
Vector & Matrix Multiplication
Identity & Inverse Matrices
Linear Dependence
Span
Norms
Other Special Vectors & Matrices
Eigendecomposition
Singular Value Decomposition
Moore-Penrose Pseudoinverse
Trace Operator
Determinants
Principal Component Analysis
Why?
Random Variables
Distributions
Marginal Probability
Conditional Probability
Chain Rule
(Conditional) Independence
Expectation, Variance, Covariance
Common Distributions
Useful Properties
Bayes' Rule
Continuous Variables - Details
Information Theory
Structured Models
Overflow, Underflow
(Poor) Conditioning
Gradients
Constrained Optimization
Ex: Linear Least Squares
Learning Algorithms
Capacity, Overfit, Underfit
Hyperparameters
Validation Sets
Estimators, Bias, Variance
Max Likelihood Estimation (MLE)
Bayes Statistics
Supervised Learning
Unsupervised Learning
Stochastic Gradient Descent (SGD)
Building an Algorithm
Challenges
Ex: XOR
Gradient-based Learning
Hidden Units
Architecture Design
Back-Propagation & Related Algos
Historical Notes
Parameter Norm Penalties
Penalties as Constrained Optimization
Under-Constrained Problems
Dataset Augmentation
Noise Robustness
Semi-Supervised Learning
Multi-task Learning
Early Stopping
Parameter Tying/Sharing
Sparse Representations
Ensemble Methods (Bagging, etc.)
Dropout
Adversarial Training
Tangent Distance, Tangent Prop, Manifold Tangent Classifier
Learning vs Pure Optimization
Challenges
Basic Algos
Parameter Setup
Adaptive Learning Rates
Approximate 2nd-Order Methods
Meta-Algorithms
Convolution
Motivation
Pooling
Infinitely Strong Prior
Variants
Structured Outputs
Data Types
Efficient Algorithms
Random / Unsupervised Features
Historical Notes
Unfolding Computational Graphs
Recurrent NNs (RNNs)
Bidirectional RNNs
Encoder-Decoder Architectures
Deep RNNs
Recursive NNs
Long-Term Dependency Challenges
Echo State Nets
Multiple Time Scale Strategies
Gated RNNs
Long-Term Dependency Optimization
Explicit Memory
Metrics
Baseline Models
Gather More Data?
Hyperparameters
Debugging
Example: Digit Recognition
Large-scale Deep Learning
Vision
Speech Recognition
Natural Language Processing
More
Undercomplete AEs
Regularized AEs
Representational Power, Layer Size, Depth
Stochastic Encoders, Decoders
Learning Manifolds with AEs
Contractive AEs
Predictive Sparse Decomposition (PSD)
Applications
Transfer Learning, Domain Adaptation
Semi-Supervised Disentangling of Causal Factors
Distributed Representation
Exponential Gains from Depth
Clues & Causes
Challenges
Using Graphs
Sampling from Graphs
Advantages
Learning about Dependencies
(Approximate) Inference
Deep Learning & SPMs
Sampling
Importance Sampling
Markov Chain Monte Carlo (MCMC)
Gibbs Sampling
Challenges
Log-Likelihood Gradient
Stochastic Max Likelihood & Contrastive Divergence
Pseudolikelihood
Score/Ratio Matching
Denoising Score Matching
Noise-Contrastive Estimation
Estimating the Partition Function
Inference as Optimization
Expectation Maximization (EM)
MAP Inference, Sparse Coding
Variational Inference
Learned Approximate Inference
Boltzmann Machines (BMs)
Restricted BMs
Deep Belief Nets (DBNs)
Deep BMs
BMs & Real-Valued Data
Convolutional BMs
BMs - Structured/Sequential Outputs
Other BMs
Back-Propagation thru Random Ops
Directed Generative Nets
Drawing Samples from AEs
Generative Stochastic Nets
Other Generation Schemes