Data Mining: a Programmer’s Guide (Zacharski)
Intro; finding similar items; Manhattan distance; Euclidean distance; Minkowski distance; Pearson correlation coefficient; cosine similarity; k-nearest neighbors in Python; Book-Crossing dataset
Explicit & implicit ratings; user-based filters; item-based filters; adjusted cosine similarity; slope one algorithm; Python code; MovieLens dataset
Pandora-like systems; selecting appropriate attributes; example; data normalization; modified standard score; Python code; sports example; acquiring attribute data
Training sets & test data; 10-fold cross-validation; adding data vs. algorithm tweaks; kNN; Python code
Naive Bayes & Probability Density Functions
Lazy & eager learning; probability refresher; conditional probability; Bayes' theorem; Python code; Congress Voting dataset; Gaussian distribution; Python code
Naive Bayes & unstructured text
Positive & negative texts; classifier training; stop words; newsgroup classifier; Python code; sentiment analysis
Intro; hierarchical clustering; single/complete/average linkages; dog breed clusters; breakfast cereal clusters; k-means; k-means++; Enron email dataset