Obviously Awesome

Mining of Massive Datasets (3rd ed) book links

This post is in progress. It will be fleshed out as time permits.

  • Mining of Massive Datasets, 3rd ed

    • Contents

      • Definitions; statistical limits; useful concepts
    • MapReduce

      • Distributed file systems; MapReduce; algorithms; extensions; communications-cost model; complexity theory
    • Finding Similar Items

      • Set similarity; document shingling; summaries & similarity preservation; locality-sensitive hashing; distance measures; locality-sensitive function theory; LSH & other metrics; LSH applications; methods for high degrees of similarity
    • Mining Streams

      • Data model; sampling; filtering; counting distinct elements; estimating moments; counting “ones” in a window; decaying windows
    • Link Analysis

      • PageRank; PageRank computation; topic-sensitive PageRank; link spam; hubs & authorities
    • Frequent Itemsets

      • Market-basket model; A-Priori algorithm; large datasets & main memory; limited-pass algorithms; counting frequent items in a stream
    • Clustering

      • Intro; hierarchical; K-means; CURE algorithm; non-Euclidean clustering; streams & parallelism
    • Web Advertising

      • Issues; online algorithms; matching; Adwords problem; Adwords implementation
    • Recommenders

      • Model; content-based; collaborative; dimensionality reduction
    • Social Network Graph Mining

      • Social nets as graphs; clustering; community discovery; graph partitions; overlapping communities; Simrank; counting triangles; neighborhood properties
    • Dimensionality Reduction

      • Eigenvalues & eigenvectors of symmetric matrices; principal component analysis; singular value decomposition; CUR decomposition
    • Large-Scale Machine Learning

      • Model; perceptrons; support-vector machines; nearest neighbors; decision trees; comparison of methods
    • Neural Nets

      • Intro; dense feedforward nets; backprop; gradient descent; convolutional nets; recurrent nets; regularization