one-hot, OHE, frequency counts, ordered ordinals, target means, weight of evidence, rare groupings, binary
setup, tips, caching, regression target transforms
combining, reference values, polynomial expansions, feature trees, periodic data, splines, cyclicals, polynomials
intro, removal, mean vs median, categorical data, missing data -> arbitrary values, finding extreme values, marking imputed values, multivariate (chained) imputation, estimated imputation - K nearest neighbors
univariate, multivariate, nearest-neighbor, marking imputed values
logarithmic, reciprocal, square root, power, box-cox, yeo-johnson
iris, digits, California housing, labeled faces, 20 newsgroups, (more)
dates, times, elapsed times, time zones, automation
equal width, equal frequency, user-defined intervals, k-means, binarization, decision trees, example
one-hot encoding, word counts, tf-idf, linear-to-polynomial, missing data, pipelines
bag of words, sparsity, vectorizers, stop words, tf-idf, decoding, applications, limits, the hashing trick, out-of-core ops
CSV, HDF5, h5py, pytables, hdfstore, JSON, serialization, pickle issues
boxplots, mean & stdev, IQR, removal, capping, capping with quantiles
mean removal, variance scaling, sparse scaling, outlier scaling, distribution maps, normalization, category coding, binning, binarization, polynomial features
standardization, min-max scaling, robust scaling, mean normalization, max abs scaling, unit-length scaling
entity set setup, cumulative primitives, combining numeric features, datetimes, aggregations, retail dataset
auto extraction, relevant features, specific features, post-selection extractions, occupancy dataset