### Removing Low-Variance Features

• The simplest baseline feature-selection method: VarianceThreshold removes all features whose variance fails to meet a threshold value. Zero-variance features (the same value in every sample) are removed by default.
• Example:
• Boolean-featured dataset (Bernoulli random variables): variance $\mathrm{Var}[X] = p(1 - p)$
• Goal: remove all features with either 0 or 1 in >80% of the samples
• The method should remove the 1st column because the probability that it contains a zero is $5/6 > 0.8$.
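A minimal sketch of this setup with scikit-learn's VarianceThreshold (the 6-sample boolean dataset below is illustrative, built to match the description above):

```python
from sklearn.feature_selection import VarianceThreshold

# 6 samples, 3 boolean features; the first column is 0 in 5/6 samples
X = [[0, 0, 1],
     [0, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 1, 0],
     [0, 1, 1]]

# Bernoulli variance is p(1 - p); a feature with the same value in
# >80% of samples has variance below 0.8 * (1 - 0.8) = 0.16
sel = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_new = sel.fit_transform(X)
# first column dropped: Var = (1/6)(5/6) ~= 0.139 < 0.16
```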

### Example: Univariate Selection

• Iris dataset with noise added to non-informative features
• For each feature, plot p-values from feature selection & corresponding weights from an SVM.
• The plot should show feature selection picking out the informative features, which also receive the largest SVM weights.
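A condensed sketch of the selection step, assuming SelectKBest with the ANOVA F-test (the number of noise features and the random seed are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(42)
# append 20 uninformative noise features after the 4 real ones
X_noisy = np.hstack([X, rng.uniform(size=(X.shape[0], 20))])

# keep the 4 highest-scoring features under the ANOVA F-test
selector = SelectKBest(f_classif, k=4).fit(X_noisy, y)
# the selected indices should all point at the original iris features
print(selector.get_support(indices=True))
```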

### Example: Using F-test vs Mutual Information statistics

• Plot dependency of y against 3 features (x_1, x_2, x_3) and normalized univariate F-test statistics.
• F-test only captures linear dependency - so it should rate x_1 as the most discriminative.
• Mutual info can capture any dependency - so it should rate x_2 as the most discriminative.
• Both methods should mark x_3 as irrelevant.
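The comparison can be reproduced roughly as follows; the data-generating formula is an assumption chosen so that y depends linearly on x_1, non-linearly on x_2, and not at all on x_3:

```python
import numpy as np
from sklearn.feature_selection import f_regression, mutual_info_regression

rng = np.random.RandomState(0)
X = rng.rand(1000, 3)
# linear in x_1, sinusoidal (non-linear) in x_2, independent of x_3
y = X[:, 0] + np.sin(6 * np.pi * X[:, 1]) + 0.1 * rng.randn(1000)

f_stat, _ = f_regression(X, y)
mi = mutual_info_regression(X, y, random_state=0)

# normalize each statistic so its maximum is 1
f_stat /= f_stat.max()
mi /= mi.max()
# F-test ranks x_1 first; mutual information ranks x_2 first
```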

### Selecting from a Model

• SelectFromModel removes features whose importance falls below a numerical threshold value.
• You can also use string arguments such as "mean", "median", and scaled versions like "0.1*mean".
• You can also use max_features to set a limit on the number of selected features.
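A short sketch of both options; the RandomForestClassifier as the importance source is an illustrative choice:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# string threshold: keep features whose importance exceeds the mean
sel = SelectFromModel(RandomForestClassifier(random_state=0),
                      threshold="mean")
X_new = sel.fit_transform(X, y)

# max_features: cap the count instead; disable the threshold with -inf
capped = SelectFromModel(RandomForestClassifier(random_state=0),
                         threshold=-np.inf, max_features=2)
X_capped = capped.fit_transform(X, y)
```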

### Selecting from a Linear Model - L1 norm based selection

• Linear models penalized with L1 norm have sparse solutions - many zero coefficients.
• Use SelectFromModel to reduce model dimensionality before next steps.
• Lasso (regression), Logistic Regression (classification), & LinearSVC (classification) are sparse estimators that can benefit from this approach.
• Lasso sparsity is controlled via alpha (higher alpha = fewer features).
• Logistic regression & SVM sparsity is controlled by C (smaller C = fewer features).
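A sketch of L1-based selection with LinearSVC on iris (the value C=0.01 is an illustrative choice for a strong penalty):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
# smaller C -> stronger L1 penalty -> more zero coefficients
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False,
                 max_iter=5000).fit(X, y)
# prefit=True reuses the already-fitted estimator's coefficients
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)
```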

### Sequential Selection

• Available in two variations: Forward SS (FSS) and Backward SS (BSS).
• Forward SS iteratively finds the best new feature to add to the set of selected features. The procedure stops when the desired number of features is reached (controlled by n_features_to_select).
• Backward SS starts with all features and iteratively removes the least useful one.
• The direction parameter controls which variant runs.
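A minimal forward-selection sketch with SequentialFeatureSelector; the KNN estimator and n_features_to_select=2 are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# forward: start empty, greedily add the feature that most improves
# the cross-validated score; direction="backward" would start full
sfs = SequentialFeatureSelector(knn, n_features_to_select=2,
                                direction="forward")
sfs.fit(X, y)
selected = sfs.get_support()
```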

### Example: Selecting from a Model vs Sequential Selection

• Use a cross-validated Lasso (LassoCV) to rank features by absolute coefficient.
• We only want 2 features, so set the threshold just above the coefficient of the 3rd most important feature.
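The SelectFromModel side of the comparison can be sketched like this, assuming the diabetes dataset (the +0.01 threshold offset is an illustrative margin):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)
lasso = LassoCV().fit(X, y)
importance = np.abs(lasso.coef_)

# threshold just above the 3rd-largest |coefficient|,
# so exactly the top 2 features survive
threshold = np.sort(importance)[-3] + 0.01
sfm = SelectFromModel(lasso, threshold=threshold, prefit=True)
X_new = sfm.transform(X)
```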