Preprocessing

Standard Scaling

from sklearn import preprocessing
import numpy as np

# Toy training matrix: 3 samples (rows) by 3 features (columns)
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

# Learn the per-feature mean and standard deviation from the training data
scaler = preprocessing.StandardScaler().fit(X_train)
print(scaler, "\n", scaler.mean_, "\n", scaler.scale_)

# Each column of the result has zero mean and unit variance
X_scaled = scaler.transform(X_train)
print(X_scaled)

Min-Max Scaling and Max Abs Scaling

X_train = np.array([[ 1., -1., 2.], [ 2., 0., 0.], [ 0., 1., -1.]])

# Divide each feature by its maximum absolute value, so training data lies in [-1, 1]
max_abs_scaler = preprocessing.MaxAbsScaler()
X_train_maxabs = max_abs_scaler.fit_transform(X_train)
print(X_train_maxabs)

# New data is scaled with the factors learned from X_train, so it may fall outside [-1, 1]
X_test = np.array([[-3., -1., 4.]])
X_test_maxabs = max_abs_scaler.transform(X_test)
print(X_test_maxabs)

print(max_abs_scaler.scale_)
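
The code above covers only MaxAbsScaler. For the Min-Max half of this section, a minimal sketch using the same toy data, assuming MinMaxScaler with its default feature_range of (0, 1):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

# Rescale each feature so its training minimum maps to 0 and its maximum to 1
min_max_scaler = MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)
print(X_train_minmax)

# Test data reuses the training min/max, so values can land outside [0, 1]
X_test = np.array([[-3., -1., 4.]])
X_test_minmax = min_max_scaler.transform(X_test)
print(X_test_minmax)
```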

Scaling sparse data
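
Centering sparse data would destroy its sparsity, so a scaler that only divides is the usual choice. A minimal sketch, assuming a SciPy CSR matrix as input and MaxAbsScaler as the scaler (the matrix values are made up for illustration):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Illustrative sparse matrix: most entries are zero
X_sparse = sparse.csr_matrix(np.array([[1., 0., 2.],
                                       [0., 0., -4.],
                                       [0., 5., 0.]]))

# MaxAbsScaler does not shift the data, so zeros stay zero
sparse_scaler = MaxAbsScaler()
X_sparse_scaled = sparse_scaler.fit_transform(X_sparse)

print(sparse.issparse(X_sparse_scaled))  # the output is still a sparse matrix
print(X_sparse_scaled.toarray())
```

StandardScaler can also be applied to sparse input when it is constructed with with_mean=False.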

Scaling with outliers with Robust Scaler
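
A minimal sketch of RobustScaler with its default settings, which center each feature on its median and scale by the interquartile range. The single-feature data below is made up so that one value is an obvious outlier:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an outlier (100.)
X_outliers = np.array([[1.], [2.], [3.], [4.], [100.]])

# Median and IQR are barely affected by the outlier, unlike mean and variance
robust_scaler = RobustScaler()
X_robust = robust_scaler.fit_transform(X_outliers)

print(robust_scaler.center_)  # per-feature median
print(robust_scaler.scale_)   # per-feature interquartile range
print(X_robust.ravel())
```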

Scaling kernel matrices with KernelCenterer
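
A minimal sketch of KernelCenterer, assuming a linear kernel computed directly with NumPy (the data values are illustrative). Centering the kernel matrix is equivalent to centering the data in the feature space the kernel implies:

```python
import numpy as np
from sklearn.preprocessing import KernelCenterer

X_k = np.array([[1., -2., 2.],
                [-2., 1., 3.],
                [4., 1., -2.]])

# Linear kernel: K[i, j] is the dot product of samples i and j
K = X_k @ X_k.T

# Center the kernel matrix without ever computing the feature map explicitly
K_centered = KernelCenterer().fit_transform(K)
print(K_centered)
```

For the linear kernel this matches the kernel of the mean-centered data, which gives a simple way to check the result.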

Quantile Transforms

Quantile Mapping to a Uniform [0..1] Distribution
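
A minimal sketch of QuantileTransformer mapping a skewed feature to a uniform distribution on [0, 1]. The log-normal sample, the seed, and n_quantiles=100 are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Skewed synthetic feature (illustrative)
rng = np.random.RandomState(0)
X_skewed = rng.lognormal(size=(1000, 1))

# Map each value to its estimated quantile, giving an approximately uniform output
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform", random_state=0)
X_uniform = qt.fit_transform(X_skewed)

print(X_uniform.min(), np.median(X_uniform), X_uniform.max())
```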

Power Mapping to a Gaussian Distribution

Example: Map data to Normal Distributions (Box-Cox, Yeo-Johnson)
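
A minimal sketch of PowerTransformer with both methods named in the heading. The log-normal sample is an assumption chosen because it is strictly positive, which Box-Cox requires; Yeo-Johnson also accepts zero and negative values:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Strictly positive, skewed synthetic feature (illustrative)
rng = np.random.RandomState(0)
X_pos = rng.lognormal(size=(1000, 1))

# Both transformers estimate a per-feature lambda by maximum likelihood
# and, by default (standardize=True), rescale to zero mean and unit variance
box_cox = PowerTransformer(method="box-cox")
yeo_johnson = PowerTransformer(method="yeo-johnson")

X_bc = box_cox.fit_transform(X_pos)
X_yj = yeo_johnson.fit_transform(X_pos)

print(box_cox.lambdas_, yeo_johnson.lambdas_)
```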

Normalization
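
Normalization here means rescaling each sample (row) to unit norm, in contrast to the scalers above, which work per feature (column). A minimal sketch with Normalizer and the L2 norm, reusing the toy matrix from earlier:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X_rows = np.array([[1., -1., 2.],
                   [2., 0., 0.],
                   [0., 1., -1.]])

# Divide every row by its own L2 norm; fit() learns nothing, so the transformer is stateless
X_l2 = Normalizer(norm="l2").fit_transform(X_rows)
print(X_l2)
print(np.linalg.norm(X_l2, axis=1))  # every row now has norm 1
```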

Categories to Integers
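
A minimal sketch with OrdinalEncoder, which assigns each category an integer code per feature based on the sorted category values. The category strings are made up for illustration:

```python
from sklearn.preprocessing import OrdinalEncoder

X_cat = [["male", "from US", "uses Safari"],
         ["female", "from Europe", "uses Firefox"]]

# Categories are sorted within each column, then numbered from 0
ordinal_enc = OrdinalEncoder().fit(X_cat)
print(ordinal_enc.categories_)

codes = ordinal_enc.transform([["female", "from US", "uses Safari"]])
print(codes)  # [[0. 1. 1.]]
```

The integer codes imply an ordering, which many estimators will treat as meaningful; one-of-K encoding avoids that.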

Categories to one-of-K ("One Hot")
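
A minimal sketch with OneHotEncoder, which turns each categorical feature with K distinct values into K binary columns. The category strings are made up, and handle_unknown="ignore" is an optional choice so that unseen categories encode as all zeros instead of raising an error:

```python
from sklearn.preprocessing import OneHotEncoder

X_cat = [["male", "from US", "uses Safari"],
         ["female", "from Europe", "uses Firefox"]]

onehot_enc = OneHotEncoder(handle_unknown="ignore").fit(X_cat)

# The default output is a sparse matrix; convert to dense only for display
onehot = onehot_enc.transform([["female", "from US", "uses Safari"],
                               ["male", "from Europe", "uses Safari"]]).toarray()
print(onehot)
```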

Quantization, aka Binning

KBinsDiscretizer partitions each continuous feature into $k$ bins and replaces every value with an encoding of the bin it falls into.
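
A minimal sketch of that partitioning, assuming $k = 3$, equal-width bins (strategy="uniform"), and ordinal output so each value becomes its bin index. The data is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X_cont = np.array([[0.], [1.], [2.], [5.], [9.], [10.]])

# Three equal-width bins spanning the observed range [0, 10]
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = binner.fit_transform(X_cont)

print(binner.bin_edges_[0])  # edges at 0, 10/3, 20/3, 10
print(X_binned.ravel())      # bin index of each sample
```

Other choices for strategy are "quantile" and "kmeans", and encode="onehot" returns one column per bin instead of a single index.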

Example: Binning Continuous Features with KBinsDiscretizer

Example: Feature discretization

KBinsDiscretizer strategy comparisons

Feature Binarization
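
A minimal sketch with Binarizer, which maps values strictly greater than the threshold to 1 and everything else to 0. The threshold of 0.0 is the default, and the matrix is the toy data used earlier:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X_num = np.array([[1., -1., 2.],
                  [2., 0., 0.],
                  [0., 1., -1.]])

# Values > 0 become 1, values <= 0 become 0
binarizer = Binarizer(threshold=0.0)
X_binary = binarizer.fit_transform(X_num)
print(X_binary)
```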

Generating polynomial features
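
A minimal sketch with PolynomialFeatures at degree 2. For two input features $(x_1, x_2)$ the output columns are $1, x_1, x_2, x_1^2, x_1 x_2, x_2^2$; the input matrix is illustrative:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_small = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]

# Expand to all monomials of total degree <= 2, including the bias column
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_small)
print(X_poly)
```

Passing interaction_only=True keeps only products of distinct features and drops the pure powers.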

Custom Transformers
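
One way to build a custom transformer is to wrap an existing function with FunctionTransformer. A minimal sketch, assuming a log transform via np.log1p with np.expm1 as its inverse (both choices are illustrative):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X_counts = np.array([[0., 1.],
                     [2., 3.]])

# Wrap plain functions so they can be used like any other transformer, e.g. in a Pipeline
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1, validate=True)

X_log = log_transformer.fit_transform(X_counts)
X_back = log_transformer.inverse_transform(X_log)

print(X_log)
print(X_back)  # recovers the original values
```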