### Clustering

• Grouping algorithms for unlabeled data.
• Scikit-learn clustering estimators expose a labels_ attribute after fitting the data.
• Each method accepts a matrix of shape (#samples, #features) as input - such matrices can be obtained from the feature_extraction classes.
• Affinity Propagation, Spectral Clustering & DBSCAN also accept precomputed similarity matrices of shape (#samples, #samples) as input. These can be obtained from the sklearn.metrics.pairwise functions. (See the sketch after this list.)
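
A minimal sketch of both input styles, assuming a toy make_blobs dataset; note that DBSCAN's metric="precomputed" expects a distance matrix rather than a similarity matrix:

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import euclidean_distances

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # (#samples, #features)

# Feature-matrix input: labels_ is populated after fit()
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])

# Precomputed (#samples, #samples) input: DBSCAN takes a distance matrix here
D = euclidean_distances(X)
db = DBSCAN(metric="precomputed", eps=1.0, min_samples=5).fit(D)
print(db.labels_[:10])
```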

### Kmeans, aka Lloyd's Algorithm

• Kmeans clusters data into $K$ (user-specified) groups of equal variance.
• It minimizes inertia (aka the within-cluster sum-of-squares), $\sum_{i=1}^{N}\min_{\mu_j \in C}(\|x_i - \mu_j\|^2)$, to divide $N$ samples into $K$ clusters, each described by the mean $\mu_j$ of the cluster's samples.
• Inertia is a measure of a cluster's "internal coherence".
• It assumes clusters are convex & isotropic, which isn't always true. It responds poorly to elongated clusters or manifolds with irregular shapes.
• Inertia is not a normalized metric, and it becomes inflated in high-dimensional spaces (the "curse of dimensionality"). Consider applying dimensionality reduction such as PCA prior to clustering (see the sketch after this list).
• Kmeans uses Cython & OpenMP to process 256-sample "chunks" in parallel to reduce memory usage.
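
A minimal sketch of PCA-then-Kmeans, assuming the digits dataset and illustrative component/cluster counts:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)               # (1797, 64) feature matrix
X_reduced = PCA(n_components=2).fit_transform(X)  # reduce dimensionality first

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_reduced)
print(km.inertia_)  # the within-cluster sum-of-squares being minimized
```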

### Example: Kmeans assumptions vs performance

• Plots 1-3: the data violates some Kmeans assumptions (wrong cluster count, anisotropic blobs, unequal variance), resulting in erroneous clusters. One failure mode is sketched below.
• Plot 4: Kmeans returns intuitive results, despite uneven blob sizes.
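
A small sketch of the anisotropic failure mode, assuming an illustrative linear transform of make_blobs data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, random_state=170)
X_aniso = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])  # stretch blobs -> anisotropic
# Kmeans assumes isotropic clusters, so these labels tend to cut across the blobs
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_aniso)
```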

### Example: Kmeans on digits data - Voronoi diagram

• Given enough time, Kmeans always converges, but possibly to a local minimum. This is highly dependent on initial centroid values.
• init="k-means++" initializes the centroids to more distant from each other, leading to better results than by random initialization.
• sample_weight enables sample weighting. A sample with weight=2 is equivalent to adding a duplicate of that sample to the dataset.
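
A hedged sketch comparing the two init strategies on the digits data; the cluster count and seed are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
for init in ("k-means++", "random"):
    km = KMeans(n_clusters=10, init=init, n_init=10, random_state=0).fit(X)
    print(f"{init}: inertia={km.inertia_:.0f}")  # lower is better
```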

### Kmeans (minibatch)

• Uses randomly sampled subsets of the dataset during fit to reduce computation time.
• In general minibatch Kmeans performs only slightly worse than the standard implementation.
• Centroids are updated on a per-sample basis: each sample's assigned centroid is moved to the streaming average of that sample and all previous samples assigned to that centroid (see the sketch below).
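
A minimal MiniBatchKMeans sketch; the batch size and blob parameters are illustrative:

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=10,
                      random_state=0).fit(X)  # fits on random mini-batches
print(mbk.inertia_)
```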

### Example: Kmeans minibatch vs std performance

• Also plots the points that are labeled differently between the two implementations (a sketch of that comparison follows).
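
A hedged sketch of that disagreement computation: because cluster indices are arbitrary, pairwise_distances_argmin is used to match each Kmeans centroid to its nearest minibatch centroid before comparing labels. The data and cluster count are illustrative:

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

X, _ = make_blobs(n_samples=3000, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
mbk = MiniBatchKMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

km_labels = pairwise_distances_argmin(X, km.cluster_centers_)
order = pairwise_distances_argmin(km.cluster_centers_, mbk.cluster_centers_)
mbk_labels = pairwise_distances_argmin(X, mbk.cluster_centers_[order])

different = km_labels != mbk_labels  # boolean mask of disagreements
print(different.sum(), "points labeled differently")
```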

### Affinity Propagation

• Creates clusters by sending messages between sample pairs.
• The dataset is then described using "exemplars" (samples identified as being the most representative of other samples).
• The messages encode the ability of one sample to be an exemplar of another.
• This iterates until convergence (when final exemplars are chosen for the clusters).
• AP chooses the #clusters from the data (it is not a given number). The key parameters are preference (which controls how many exemplars are used) and the damping factor (which helps avoid message oscillations); both appear in the sketch after this list.
• Main drawbacks: time complexity of $O(N^2T)$ & memory complexity of $O(N^2)$ ($N$ = #samples, $T$ = #iterations before convergence). More appropriate for smaller datasets.
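
A minimal AffinityPropagation sketch, with illustrative preference and damping values:

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
ap = AffinityPropagation(preference=-50, damping=0.9, random_state=0).fit(X)
print(len(ap.cluster_centers_indices_), "clusters found")  # exemplar indices
print(ap.labels_[:10])
```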

### Example: Affinity Propagation - Stock Market Visualization

• Use a sparse inverse covariance estimate to find which quotes are conditionally correlated with the others; the graph's edge strengths reflect these partial correlations.
• Use affinity propagation to group similar quotes.
• Use manifold learning to find a 2D embedding for visualization. (A hedged sketch of the whole pipeline follows.)
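
A hedged sketch of this pipeline on synthetic stand-in data (no real stock quotes are fetched here); GraphicalLassoCV, AffinityPropagation with a precomputed affinity, and LocallyLinearEmbedding are the assumed building blocks:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.covariance import GraphicalLassoCV
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))  # stand-in for a (days, quotes) return series

# 1) Sparse inverse covariance: which quotes are conditionally correlated
edge_model = GraphicalLassoCV().fit(X)

# 2) Group quotes, using the learned covariance as a precomputed affinity
ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(edge_model.covariance_)

# 3) 2D embedding of the quotes (the variables, hence X.T) for plotting
embedding = LocallyLinearEmbedding(n_components=2, n_neighbors=6,
                                   eigen_solver="dense").fit_transform(X.T)
print(labels, embedding.shape)
```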