### Clustering Metrics

• Evaluating clustering performance is not as simple as counting errors or computing precision/recall, as in supervised classification tasks.

### Rand Index, Adjusted Rand Index

• Measures the similarity between the ground-truth labels (labels_true) and a clustering algorithm's assigned labels (labels_pred).

• Definition:

• Unadjusted RI: $\text{RI} = \frac{a + b}{C_2^{n_{samples}}}$
• $a$ = #sample pairs grouped together in both labelings; $b$ = #sample pairs placed in different groups in both labelings.
• $C_2^{n_{samples}}$ is the total number of possible sample pairs.
• Adjusted RI: $\text{ARI} = \frac{\text{RI} - E[\text{RI}]}{\max(\text{RI}) - E[\text{RI}]}$

• interpretability: the unadjusted RI is proportional to the number of sample pairs whose labels agree between labels_pred and labels_true (grouped together in both, or separated in both).
• random/uniform label assignments: adjusted RI will be near-zero for any value of n_clusters and n_samples.
• bounded ranges: lower values indicate more dissimilar labelings; 1.0 is a perfect score. The unadjusted RI lies in [0, 1], the adjusted RI in [-1, +1].
• no cluster structure assumptions: can be used to compare algorithms.
• Drawbacks:

• Requires knowledge of ground truth classes which is rarely available.
• Unadjusted RI values are often close to 1.0 even when the clusterings differ significantly; the adjusted RI corrects for this.
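As a sketch of how the two scores behave, using scikit-learn's `rand_score` and `adjusted_rand_score` (the label arrays are made-up toy data, not from a real dataset):

```python
from sklearn.metrics import adjusted_rand_score, rand_score

# Toy labels: class 0 is split across two clusters, and cluster 1
# mixes samples from both classes (illustrative data only).
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# a = 2 pairs together in both, b = 8 pairs apart in both, C(6, 2) = 15 pairs
ri = rand_score(labels_true, labels_pred)            # (2 + 8) / 15 ≈ 0.667
ari = adjusted_rand_score(labels_true, labels_pred)  # chance-corrected, lower

print(ri, ari)
```

Note how the unadjusted RI looks deceptively decent for this rather poor clustering, while the adjusted score is much closer to zero.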

### Mutual Information Score (Std, Adjusted, Normalized)

• Measures the agreement between the ground-truth assignment labels_true and the algorithm's assignment labels_pred, ignoring permutations of the label values.
• Random/uniform labels have AMI scores close to zero.
• Drawbacks:
• MI-based metrics require ground truth knowledge which is rarely available.
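A small sketch of the three MI variants in scikit-learn, reusing the same toy labels as above; it also demonstrates that permuting the predicted label values leaves the score unchanged:

```python
from sklearn.metrics import (
    adjusted_mutual_info_score,
    mutual_info_score,
    normalized_mutual_info_score,
)

# Illustrative toy labels (made-up data).
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

mi = mutual_info_score(labels_true, labels_pred)              # raw MI, not bounded by 1
nmi = normalized_mutual_info_score(labels_true, labels_pred)  # normalized to [0, 1]
ami = adjusted_mutual_info_score(labels_true, labels_pred)    # chance-corrected

# Permutation invariance: relabeling clusters 0/1/2 -> 2/0/1 changes nothing.
permuted = [2, 2, 0, 0, 1, 1]
assert abs(ami - adjusted_mutual_info_score(labels_true, permuted)) < 1e-12
```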

### Homogeneity, Completeness, V-Measure

• Based on the concept of conditional entropy analysis.
• Desirable cluster attributes:
• homogeneity: each cluster contains members of just one class.
• completeness: all members of a class are assigned to a single cluster.
• v-measure: harmonic mean of homogeneity & completeness.
• drawbacks:
• Completely random labeling does not yield consistent values; in particular, random labelings will not score zero when the number of clusters is large.
• This problem can safely be ignored when #samples > 1000 and #clusters < 10; otherwise, consider using the Adjusted Rand Index.
• Requires knowledge of ground truth class data which is rarely available.
• definition:
• Homogeneity: $h = 1 - \frac{H(C|K)}{H(C)}$
• Completeness: $c = 1 - \frac{H(K|C)}{H(K)}$
• Conditional class entropy, given a cluster assignment: $H(C|K) = - \sum_{c=1}^{|C|} \sum_{k=1}^{|K|} \frac{n_{c,k}}{n} \cdot \log\left(\frac{n_{c,k}}{n_k}\right)$
• Class entropy: $H(C) = - \sum_{c=1}^{|C|} \frac{n_c}{n} \cdot \log\left(\frac{n_c}{n}\right)$
• where $n$ = #samples, $n_c$ = #samples in class $c$, $n_k$ = #samples in cluster $k$, $n_{c,k}$ = #samples from class $c$ assigned to cluster $k$.
• V-measure: $v = 2 \cdot \frac{h \cdot c}{h + c}$
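All three scores can be computed in one call with scikit-learn's `homogeneity_completeness_v_measure` (toy labels again; the values follow from the entropy formulas above):

```python
from sklearn.metrics import homogeneity_completeness_v_measure

# Toy labels: cluster 1 mixes both classes, which hurts homogeneity;
# each class is split across two clusters, which hurts completeness.
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
# v is the harmonic mean: 2 * h * c / (h + c)
```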

### Fowlkes-Mallows Score

• Defined as the geometric mean of pairwise precision & recall: $\text{FMI} = \frac{\text{TP}}{\sqrt{(\text{TP} + \text{FP}) (\text{TP} + \text{FN})}}$
• Scores range from 0 to 1; high values indicate good similarity between the two label assignments.
• Random/uniform labelings have FMI scores close to zero.
• Upper bounded at 1.0.
• No cluster structure assumptions.
• drawbacks:
• Requires ground truth label data which is rarely available.
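A minimal sketch with the same toy labels as in the earlier sections; here TP = 2 pair agreements, with 3 same-cluster pairs (TP + FP) and 6 same-class pairs (TP + FN), so FMI = 2/√(3·6):

```python
from math import sqrt

from sklearn.metrics import fowlkes_mallows_score

# Illustrative toy labels (made-up data).
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

fmi = fowlkes_mallows_score(labels_true, labels_pred)
# Matches the pair-counting formula: TP = 2, TP + FP = 3, TP + FN = 6.
assert abs(fmi - 2 / sqrt(3 * 6)) < 1e-9
```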

### Silhouette Coefficient

• Use when ground truth labels are not known.
• Scores are bounded between -1 (incorrect clustering) and +1 (highly dense clustering); scores near 0 indicate overlapping clusters.
• High scores indicate well-defined clusters. Intuitive.
• Definition:
• $s = \frac{b - a}{\max(a, b)}$
• $a$ = mean distance between a sample and all other points in the same cluster.
• $b$ = mean distance between a sample and all other points in the next-nearest cluster.
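A sketch of computing the coefficient on synthetic blobs; the dataset and KMeans settings here are illustrative assumptions, not from the original example:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Well-separated synthetic clusters (illustrative stand-in data).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Mean silhouette coefficient over all samples; no ground truth needed.
score = silhouette_score(X, km.labels_)
```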

### Example: Using Silhouette Analysis to find the optimal KMeans cluster count

• n_clusters values of 3, 5, and 6 are bad picks: they yield clusters with below-average silhouette scores and wide fluctuations in the sizes of the silhouette plots.
• n_clusters values of 2 and 4 both look viable; the silhouette analysis alone is ambivalent between them.
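The selection procedure from the example can be sketched as a loop over candidate n_clusters values, keeping the mean silhouette score for each (the synthetic data and parameter choices below are assumptions for illustration, not the example's exact setup):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 true clusters (illustrative stand-in for the
# example's dataset).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=10)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Candidates with the highest mean score are the strongest picks; a full
# silhouette analysis would also inspect the per-cluster silhouette plots.
best_k = max(scores, key=scores.get)
```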