python/feature engineering

one-hot, OHE, frequency counts, ordered ordinals, target means, weight of evidence, rare groupings, binary

setup, tips, caching, regression target transforms

combining, reference values, polynomial expansions, feature trees, periodic data, splines, cyclicals, polynomials

intro, removal, mean vs median, categorical data, missing data -> arbitrary values, finding extreme values, marking imputed values, multivariate (chained) imputation, estimated imputation - K nearest neighbors

univariate, multivariate, nearest-neighbor, marking imputed values

logarithmic, reciprocal, square root, power, Box-Cox, Yeo-Johnson

iris, digits, California housing, labeled faces, 20 newsgroups, (more)

dates, times, elapsed times, time zones, automation

equal width, equal frequency, user-defined intervals, k-means, binarization, decision trees, example

one-hot encoding, word counts, tf-idf, linear-to-polynomial, missing data, pipelines

bag of words, sparsity, vectorizers, stop words, tf-idf, decoding, applications, limits, the hashing trick, out-of-core ops

CSV, HDF5, h5py, pytables, hdfstore, JSON, serialization, pickle issues

boxplots, mean & stdev, IQR, removal, capping, capping with quantiles

mean removal, variance scaling, sparse scaling, outlier scaling, distribution maps, normalization, category coding, binning, binarization, polynomial features

standardization, min-max scaling, robust scaling, mean normalization, max abs scaling, unit-length scaling

entity set setup, cumulative primitives, combining numeric features, datetimes, aggregations, retail dataset

auto extraction, relevant features, specific features, post-selection extractions, occupancy dataset