one-hot encoding, word counts, tf-idf, linear-to-polynomial, missing data, pipelines
bag of words, sparsity, vectorizers, stop words, tf-idf, decoding, applications, limits, the hashing trick, out-of-core ops
CSV, HDF5, h5py, PyTables, HDFStore, JSON, serialization, pickle issues
mean removal, variance scaling, sparse scaling, outlier scaling, distribution maps, normalization, category coding, binning, binarization, polynomial features