What Is It?

Limitations

Uses

Collaborative Filtering

Collecting Preferences

Finding Similar Users

Recommending Items

Matching Products

Building a del.icio.us Link Recommender

Item-Based Filtering

Using the MovieLens Dataset

User-Based or Item-Based Filtering?

Supervised versus Unsupervised Learning

Word Vectors

Hierarchical Clustering

Drawing the Dendrogram

Column Clustering

K-Means Clustering

Clusters of Preferences

Viewing Data in Two Dimensions

Other Things to Cluster

What’s in a Search Engine?

A Simple Crawler

Building the Index

Querying

Content-Based Ranking

Using Inbound Links

Learning from Clicks

Group Travel

Representing Solutions

The Cost Function

Random Searching

Hill Climbing

Simulated Annealing

Genetic Algorithms

Real Flight Searches

Optimizing for Preferences

Network Visualization

Other Possibilities

Filtering Spam

Documents and Words

Training the Classifier

Calculating Probabilities

A Naïve Classifier

The Fisher Method

Persisting the Trained Classifiers

Filtering Blog Feeds

Improving Feature Detection

Using Akismet

Alternative Methods

Predicting Signups

Introducing Decision Trees

Training

Choosing the Best Split (Gini Impurity, Entropy)

Recursive Tree Building

Displaying the Tree

Classifying New Observations

Pruning the Tree

Missing Data

Numerical Outcomes

Modeling Home Prices (Zillow API)

Modeling “Hotness” (Hot or Not)

When to Use Decision Trees

Building a Sample Dataset

k-Nearest Neighbors

Weighted Neighbors

Cross-Validation

Heterogeneous Variables

Optimizing the Scale

Uneven Distributions

Using Real Data—the eBay API

When to Use k-Nearest Neighbors

Matchmaker Dataset

Difficulties with the Data

Basic Linear Classification

Categorical Features

Scaling the Data

Understanding Kernel Methods

Support-Vector Machines

Using LIBSVM

Matching on Facebook

A Corpus of News

Previous Approaches

Non-Negative Matrix Factorization

Displaying the Results

Using Stock Market Data

What Is Genetic Programming?

Programs As Trees

Creating the Initial Population

Testing a Solution

Mutating Programs

Crossover

Building the Environment

A Simple Game

Further Possibilities

Bayesian Classifier

Decision Tree Classifier

Neural Networks

Support-Vector Machines

k-Nearest Neighbors

Clustering

Multidimensional Scaling

Non-Negative Matrix Factorization

Optimization

Universal Feed Parser

Python Imaging Library (PIL)

Beautiful Soup

pysqlite

NumPy

matplotlib

pydelicious

Euclidean Distance

Pearson Correlation Coefficient

Weighted Mean

Tanimoto Coefficient

Conditional Probability

Gini Impurity

Entropy

Variance

Gaussian Function

Dot-Products

