Discriminant Analysis¶

• Classic multipurpose classifiers
• Linear (LDA) and quadratic (QDA) decision surfaces available.
• No parameter tuning required.
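
A minimal sketch of the out-of-the-box usage, fitting both classifiers with default settings (the iris dataset is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

X, y = load_iris(return_X_y=True)

# Both classifiers work with defaults -- nothing to tune.
lda = LinearDiscriminantAnalysis().fit(X, y)     # linear decision surface
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # quadratic decision surface

print(lda.score(X, y), qda.score(X, y))  # training accuracy
```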

Dimensional Reduction using Linear DA¶

• Works by projecting the input data onto a linear subspace made of the directions that maximize the separation between classes.
• Uses the transform method.
• Desired #dimensions can be set using n_components (this has no impact on the fit and predict methods).
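
A short sketch on iris, where the output dimensionality is capped at min(n_classes - 1, n_features) = 2:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

# n_components must be <= min(n_classes - 1, n_features) = 2 here.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit(X, y).transform(X)
print(X_reduced.shape)  # (150, 2)
```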

Math foundations¶

• LDA & QDA model a conditional distribution $P(x|y=k)$ for each class $k$. Predictions are made via Bayes' rule: select the class $k$ that maximizes the posterior probability $P(y=k|x)$.
• $P(x|y)$ is defined as a multivariate Gaussian distribution: $P(x | y=k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}}\exp\left(-\frac{1}{2} (x-\mu_k)^t \Sigma_k^{-1} (x-\mu_k)\right)$, where $d$ is the number of features.
• If the QDA model assumes that the covariance matrices are diagonal, then the inputs are assumed to be conditionally independent in each class - so the resulting classifier model is equivalent to a Gaussian Naive Bayes classifier.
• LDA is a special case of QDA where the Gaussians of all classes share the same covariance matrix ($\Sigma_k = \Sigma$ for all $k$). This reduces the log posterior to: $\log P(y=k | x) = -\frac{1}{2} (x-\mu_k)^t \Sigma^{-1} (x-\mu_k) + \log P(y = k) + Cst.$
• $(x-\mu_k)^t \Sigma^{-1} (x-\mu_k)$ is the squared Mahalanobis distance between a sample $x$ and the mean $\mu_k$. LDA assigns $x$ to the class that is closest in (Mahalanobis) distance, after accounting for the class priors.
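
A sketch verifying the nearest-Mahalanobis-distance view numerically, assuming the 'lsqr' solver (chosen here because it exposes the pooled covariance via the covariance_ attribute):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(solver="lsqr").fit(X, y)

# Squared Mahalanobis distance of every sample to every class mean,
# using the shared covariance matrix Sigma.
prec = np.linalg.inv(lda.covariance_)
d2 = np.array([[(x - mu) @ prec @ (x - mu) for mu in lda.means_] for x in X])

# Per the log-posterior above, argmax_k log P(y=k|x) is
# equivalent to argmin_k (d2_k - 2 * log P(y=k)).
pred = np.argmin(d2 - 2 * np.log(lda.priors_), axis=1)
print(np.array_equal(pred, lda.predict(X)))  # expect True
```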

Shrinkage¶

• A form of regularization that improves covariance matrix estimation (and hence generalization) when #samples << #features.
• Enabled by setting the shrinkage param to 'auto', which determines the optimal shrinkage coefficient analytically via the Ledoit-Wolf lemma. (Note: currently, shrinkage only works when the solver param is set to 'lsqr' or 'eigen'.)
• Can also be set manually to any value between 0 and 1:
• 0 = no shrinkage = use the empirical covariance matrix.
• 1 = full shrinkage = use the diagonal matrix of variances.
• If the data is normally distributed, the Oracle Approximating Shrinkage (OAS) estimator yields a smaller MSE than the Ledoit-Wolf formula.
• A custom covariance estimator can be set using the covariance_estimator param (like shrinkage, this only works with the 'lsqr' and 'eigen' solvers). The estimator must have a fit method and a covariance_ attribute.
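
A sketch of the three ways to regularize the covariance estimate, in the #samples << #features regime (synthetic data and parameter values chosen arbitrarily):

```python
from sklearn.covariance import OAS
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Few samples relative to features: the regime where shrinkage pays off.
X, y = make_classification(n_samples=40, n_features=100, random_state=0)

# 'auto' picks the shrinkage coefficient via the Ledoit-Wolf lemma.
lda_auto = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

# Manual coefficient between 0 (empirical) and 1 (diagonal variances).
lda_fixed = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=0.5).fit(X, y)

# Custom estimator via covariance_estimator (mutually exclusive with shrinkage).
lda_oas = LinearDiscriminantAnalysis(solver="lsqr",
                                     covariance_estimator=OAS()).fit(X, y)
```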

Solvers¶

• LDA & QDA modeling requires computing a log-posterior, which depends on the class priors ($P(y=k)$), the class means ($\mu_k$), and the covariance matrices.
• svd is the default solver for LDA, and the only solver for QDA. Since it does not rely on computing a covariance matrix, svd is preferred for problems with large #features.

• svd cannot be used with shrinkage.
• Two SVDs are computed during LDA: one of the centered input matrix $X$, and one of the class-wise mean vectors.
• lsqr only works for classification problems. It computes the covariance matrix, supports shrinkage, and supports custom covariance estimators.

• The eigen solver works by optimizing the ratio of between-class scatter to within-class scatter.

• eigen can be used for both classification & transform (dimensionality reduction).
• eigen supports shrinkage.
• eigen needs to compute the covariance matrix, so it may not be ideal for problems with large #features.
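
A sketch contrasting the three LDA solvers (iris again, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# 'svd' (default): avoids computing the covariance matrix, no shrinkage.
svd = LinearDiscriminantAnalysis(solver="svd").fit(X, y)

# 'lsqr': classification only (no transform), supports shrinkage.
lsqr = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

# 'eigen': both classification and transform, supports shrinkage.
eigen = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto",
                                   n_components=2).fit(X, y)
X_2d = eigen.transform(X)  # dimensionality reduction works here too
```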