### Hyperparameter Tuning

• Hyperparameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the estimator class. Examples include C, kernel and gamma for the Support Vector Classifier, alpha for Lasso, etc.

• Use estimator.get_params() to get the names & current values of all parameters in an estimator.
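A minimal sketch of inspecting an estimator's parameters (SVC is used purely for illustration):

```python
from sklearn.svm import SVC

# get_params() returns a dict mapping parameter names to current values
print(SVC().get_params())
# {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf', ...}
```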

• scikit-learn has two approaches to parameter search: GridSearchCV exhaustively considers all parameter combinations, while RandomizedSearchCV samples candidates from a parameter space with specified distributions.

• Both have successive halving counterparts HalvingGridSearchCV and HalvingRandomSearchCV, which can be much faster.

• Typically a small subset of those parameters has a large impact on predictive performance, while others can be left at their default values.

• GridSearchCV exhaustively generates candidates from a grid of parameter values specified with the param_grid argument:
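A minimal sketch of an exhaustive grid search (the grid values and dataset are illustrative, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# All 3 x 2 x 2 = 12 combinations are evaluated with 5-fold CV
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.01, 0.1],
    'kernel': ['rbf', 'linear'],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```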

### Randomized Param Optimization

• While grid search is the most popular method for parameter optimization, RandomizedSearchCV instead samples each setting from a distribution over its possible values. This has two benefits:

• A budget can be chosen independently of the #parameters and their possible values.

• Adding parameters that do not influence the performance does not decrease efficiency.

• Candidates are specified with a dictionary of parameter distributions, plus a computation budget (the #sampled candidates, i.e. sampling iterations) given by n_iter.

• Supply either a distribution or list of choices (which will be sampled uniformly): {'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1), 'kernel': ['rbf'], 'class_weight':['balanced', None]}
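A runnable sketch using that dictionary (the budget and dataset are illustrative):

```python
import scipy.stats
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_distributions = {
    'C': scipy.stats.expon(scale=100),    # continuous distribution
    'gamma': scipy.stats.expon(scale=.1),
    'kernel': ['rbf'],                    # lists are sampled uniformly
    'class_weight': ['balanced', None],
}
# n_iter fixes the budget: 20 sampled candidates, whatever the space's size
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20,
                            random_state=0)
search.fit(X, y)
print(search.best_params_)
```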

• For continuous parameters such as C, specify a continuous distribution to take full advantage of the randomization.

• A continuous log-uniform random variable is available through loguniform. For example, loguniform(1, 100) can be used instead of [1, 10, 100] or np.logspace(0, 2, num=1000). This is an alias to SciPy’s stats.reciprocal.

• Mirroring the example above in grid search, we can specify a continuous random variable that is log-uniformly distributed between 1e0 and 1e3:
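A sketch using scipy.stats.loguniform (older scikit-learn versions exposed the same distribution as sklearn.utils.fixes.loguniform):

```python
from scipy.stats import loguniform

# Log-uniform between 1e0 and 1e3: every decade is equally likely
param_distributions = {'C': loguniform(1e0, 1e3), 'kernel': ['rbf']}
```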

### Successive Halving Search

• Successive halving (SH) is like a tournament among candidate parameter combinations. It is an iterative selection process where all parameter combinations are evaluated with a small amount of resources (typically training samples) at the first iteration. Only some of these candidates are selected for the next iteration, which will be allocated more resources. The resources can also be an arbitrary numeric parameter such as n_estimators in a random forest.

• Below: only a subset of candidates ‘survive’ until the last iteration. They have consistently ranked among the top-scoring candidates across all iterations. Each iteration is allocated an increasing amount of resources per candidate.

### Example: Successive Halving Iterations

• 1st iteration: a small amount of resources (#samples) is used. All candidates are evaluated.

• 2nd iteration: the best half of the candidates is evaluated. The number of allocated resources is doubled: candidates are evaluated on twice as many samples.

• last iteration: only 2 candidates are left. The best candidate is the one with the best score at the last iteration.
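A minimal sketch of a halving search (the estimator and grid are illustrative; the halving classes are experimental and must be enabled explicitly):

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

param_grid = {'max_depth': [3, 5, 10], 'min_samples_split': [2, 5, 10]}
# resource='n_samples' (the default): each iteration keeps roughly the
# best 1/factor of candidates and doubles (factor=2) the samples they get
search = HalvingGridSearchCV(RandomForestClassifier(random_state=0),
                             param_grid, factor=2, resource='n_samples',
                             random_state=0)
search.fit(X, y)
print(search.n_candidates_)  # candidates evaluated at each iteration
print(search.n_resources_)   # resources allocated at each iteration
print(search.best_params_)
```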

### Tips

• Parameter search uses the estimator's score function by default: accuracy for classification and r2 score for regression.

• Alternatives can be specified using the scoring parameter of most search tools.

• You can use multiple metrics with Grid & Randomized searches. They are specified as a list of strings, or a dict mapping metric names to scorer functions.

• When specifying multiple metrics, set refit to the metric for which best_params_ will be found and used to build best_estimator_ on the entire dataset, or set refit=False - leaving the default refit=True raises an error when multiple metrics are in use.
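A minimal sketch of a multi-metric search (the metric names are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# refit='accuracy' selects best_params_ / best_estimator_ by accuracy;
# results for every metric remain available in cv_results_
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]},
                      scoring=['accuracy', 'f1_macro'], refit='accuracy')
search.fit(X, y)
print(search.best_params_)
print(search.cv_results_['mean_test_f1_macro'])
```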

• Parallelism: param search tools evaluate each parameter combination on each data fold independently. Computations can run in parallel via n_jobs=-1.

• Robustness: some parameter settings can cause fit failures, which by default make the entire search fail. Use error_score=0 or error_score=np.nan to instead issue a warning & set the score for that fold to zero or NaN.
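For instance (a sketch; error_score and n_jobs are accepted by all the search tools):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A candidate whose fit raises gets a NaN score and a warning,
# instead of aborting the whole search; folds run in parallel
search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]},
                      error_score=np.nan, n_jobs=-1)
```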

### Composite Estimators & Parameter Spaces

• Grid & Randomized parameter search can use composite or nested estimators (Pipelines, Column Transformers, Voting Classifiers, Calibrated Classifiers) via a dedicated <estimator>__<parameter> syntax.
• If the meta-estimator is built as a collection of estimators (e.g., a Pipeline), <estimator> refers to the step's name. There can be multiple nesting levels.
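A minimal sketch of the nested syntax with a Pipeline (the step names are whatever the Pipeline defines):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])
# <step name>__<parameter name> addresses a nested estimator's parameter
param_grid = {
    'reduce_dim__n_components': [2, 3],
    'clf__C': [0.1, 1, 10],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```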

### Info Criteria (AIC, BIC) based Regularization

• LassoLarsIC uses either the AIC or BIC information criterion to find an optimal regularization strength from a single regularization path (instead of building several paths via cross-validation).
• TODO
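In the meantime, a minimal sketch of LassoLarsIC (the dataset is illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLarsIC

X, y = load_diabetes(return_X_y=True)

# criterion='bic' (or 'aic') selects alpha along a single LARS path,
# avoiding the repeated fits that cross-validation would require
model = LassoLarsIC(criterion='bic').fit(X, y)
print(model.alpha_)  # regularization strength chosen by the criterion
```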