### Nearest Neighbors¶

• unsupervised NN = basis for manifold learning, spectral clustering & kernel density estimation.
• supervised NN = used for classification (discrete labels) and regression (continuous labels)
• basics: find the training samples closest to a new point and predict its label from them (see the sketch after this list). The number of samples can be a fixed count (kNN) or determined by the local density of points (radius-based NN).
• Any distance metric can be used - Euclidean is typical.
• NN is described as "non-generalizing" - it simply remembers all of its training data rather than building a model.
• Can use dense (NumPy) or sparse (SciPy.sparse) input matrices.
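
A minimal sketch of the unsupervised interface (the toy array and n_neighbors=2 are illustrative, not from these notes):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Dense NumPy input; a SciPy sparse matrix would also be accepted.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

nn = NearestNeighbors(n_neighbors=2, metric="euclidean")
nn.fit(X)

# Distances and indices of the 2 training points closest to the query [0, 0].
distances, indices = nn.kneighbors([[0, 0]])
print(indices, distances)
```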

### Example: Return a Graph (sparse) w/ connections between points¶

• Data is structured so that points that are nearby in index order are also nearby in parameter space, yielding an approximately block-diagonal matrix of k-nearest-neighbor connections (see the sketch below).
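
A rough sketch of producing such a sparse connectivity graph (the 1-D points and n_neighbors value are placeholders, chosen so that index order matches spatial order):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Points whose index order matches their spatial order.
X = np.linspace(0, 1, 8).reshape(-1, 1)

# Sparse binary adjacency matrix; mode="distance" would store distances instead.
A = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=True)
print(A.toarray())  # nonzero entries cluster near the diagonal
```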

### Example: KDTree implementation¶

• The Ball Tree variant looks very similar.
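
A small sketch of direct KDTree use (data and leaf_size are arbitrary); swapping in BallTree gives the same interface:

```python
import numpy as np
from sklearn.neighbors import KDTree  # from sklearn.neighbors import BallTree works the same way

rng = np.random.RandomState(0)
X = rng.random_sample((30, 3))

tree = KDTree(X, leaf_size=30)
# Distances and indices of the 3 nearest neighbors of the first point.
dist, ind = tree.query(X[:1], k=3)
print(ind, dist)
```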

### KNN and Radius-based NN Classification¶

• Computed from a simple majority vote of NNs of each point.
• KNN is most commonly used; optimal $k$ is very data-dependent.
• Radius-based NN is more useful when the data is not uniformly sampled. It depends on a radius $r$: points in sparser neighborhoods use fewer NNs for classification. Radius-based NN suffers from the curse of dimensionality on high-dimensional datasets.

• Basic classification uses weights="uniform" to compute a simple majority vote of the NNs. weights="distance" weights each neighbor by the inverse of its distance from the query point. (A user-defined function of the distances can also be supplied as weights.)
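
A hedged sketch of both classifiers on synthetic data (the dataset, k, and radius values are made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# k fixed in advance; neighbors weighted by inverse distance.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)

# Neighborhood defined by a radius; simple majority vote among points inside it.
rnn = RadiusNeighborsClassifier(radius=3.0, weights="uniform").fit(X, y)

print(knn.predict(X[:3]), rnn.predict(X[:3]))
```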

### KNN and Radius-based NN Regression¶

• Same principles and options as in NN-based classification.
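
A minimal regression sketch (the sine-curve data and n_neighbors=5 are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()

# Prediction is the inverse-distance-weighted mean of the 5 nearest targets.
reg = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)
print(reg.predict([[2.5]]))
```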

### Example: upper/lower half face matching¶

• Goal: predict lower half of each face image, given the upper half.
• 1st column = true faces
• 2nd-5th columns = attempted completions using extra trees, KNN, linear regression, ridge regression.
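
A stripped-down sketch of the setup using only KNN (the train/test split and the use of fetch_olivetti_faces follow the general idea, not the exact example code):

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.neighbors import KNeighborsRegressor

faces = fetch_olivetti_faces()            # 400 images, 64x64 pixels flattened to 4096 features
X = faces.data[:, : 4096 // 2]            # upper half of each image
y = faces.data[:, 4096 // 2 :]            # lower half = multi-output regression target

knn = KNeighborsRegressor()
knn.fit(X[:300], y[:300])                 # fit on the first 300 faces
completed_lower = knn.predict(X[300:])    # predicted lower halves for the rest
print(completed_lower.shape)
```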

### Algorithm Comparison¶

• Brute force (algorithm='brute'): computational complexity scales with $N$ samples in $D$ dimensions as $O[D N^2]$. Quickly becomes infeasible for anything beyond small datasets.

• KD tree ("k-dimensional tree", algorithm='kd_tree'):

• An example of reducing distance calculations by encoding aggregate distance information. (If A is distant from B, and B is close to C, then A & C are assumed to be distant from each other.)
• Computational complexity scales as $O[D N \log(N)]$.
• Binary tree structure - recursively divides a parameter space into nested orthotropic regions. Very fast construction.
• Once built, NN calculation for a single point is $O[\log(N)]$.
• Very fast for $D$<20 datasets, but bogs down for high-D datasets.
• Ball tree (algorithm='ball_tree'):
• Designed to address KD tree weaknesses in high-D problems.
• Partitions data into nested hyperspheres rather than Cartesian (axis-aligned) regions.
• Slower construction, but efficient NN calculations.
• Nodes are defined by a centroid $C$ and radius $r$.
• The number of candidate points for a NN search is reduced using the triangle inequality: $|x+y| \leq |x| + |y|$.
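
All three algorithms sit behind the same estimator API; a quick sketch (data size and k are arbitrary):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).rand(1000, 5)

# algorithm="auto" lets scikit-learn pick; the explicit choices are compared here.
for algo in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=5, algorithm=algo).fit(X)
    dist, ind = nn.kneighbors(X[:1])
    print(algo, ind)
```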

### Nearest Centroid Classifier¶

• Represents each class with the centroid of its members.
• No parameters to tune, so it can be a decent baseline classifier.
• Suffers on non-convex classes and on data whose classes have significantly different variances (equal variance in all dimensions is assumed).
• LDA & QDA do not make this assumption.
• Nearest Centroid's shrink_threshold parameter divides the value of each feature for each centroid by the within-class variance of that feature, then reduces the result by shrink_threshold (a feature value that crosses zero is set to zero).
• This is useful for overly noisy datasets.
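
A small sketch comparing the plain and shrunken variants (dataset and shrink_threshold=0.2 are placeholder choices):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestCentroid

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

plain = NearestCentroid().fit(X, y)                       # nothing to tune
shrunk = NearestCentroid(shrink_threshold=0.2).fit(X, y)  # shrunken centroids for noisy features

print(plain.score(X, y), shrunk.score(X, y))
```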

### KNN and Radius-based NN Transformers¶

• Many scikit-learn methods rely on NNs. Most can compute the NNs on their own, and most can also accept a precomputed sparse neighbors graph supplied by the user.
• Example: SpectralClustering can use a sparse binary adjacency graph via mode="connectivity".
• Example: DBSCAN can use a sparse distance graph via mode="distance".
• Use NN transformers (KNeighborsTransformer, RadiusNeighborsTransformer) in a pipeline to supply such precomputed graphs.
• Why use a precomputed graph?
• They can be reused while varying the parameters.
• They can provide fine-tuned control, for example parallel processing via n_jobs (not available on all estimators).
• They can be specialized for custom jobs (approximate NNs, special data types, ...)
• When the number of neighbors is specified, the definition is ambiguous - it may or may not count each training point as its own neighbor.
• KNeighborsTransformer counts each training point as its own neighbor. For compatibility with estimators that use the other definition, one extra neighbor is computed when mode="distance". The safe choice is to always include one extra neighbor.
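
A sketch of supplying a precomputed sparse distance graph to DBSCAN through a pipeline (the radius, eps, and toy data are illustrative; the radius should be at least as large as eps so no needed edges are missing):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import RadiusNeighborsTransformer
from sklearn.pipeline import make_pipeline

X = np.random.RandomState(0).rand(100, 2)

# The transformer emits a sparse distance graph; DBSCAN consumes it as "precomputed".
est = make_pipeline(
    RadiusNeighborsTransformer(radius=0.5, mode="distance"),
    DBSCAN(eps=0.3, metric="precomputed"),
)
labels = est.fit_predict(X)
print(labels[:10])
```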

### Example: Approximate NNs pipelined to TSNE¶

• **TODO:** resolve the AttributeError bug (ticket submitted to the scikit-learn GitHub account).

• How to chain KNeighborsTransformer and TSNE.
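
A sketch of the chaining idea, with the exact KNeighborsTransformer standing in for an approximate-NN transformer (n_neighbors, perplexity, and the toy data are illustrative; with metric="precomputed", TSNE needs init="random"):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsTransformer
from sklearn.pipeline import make_pipeline

X = np.random.RandomState(0).rand(200, 10)

# n_neighbors is set to roughly 3 * perplexity + 1 so TSNE has enough neighbors.
pipe = make_pipeline(
    KNeighborsTransformer(n_neighbors=31, mode="distance"),
    TSNE(metric="precomputed", perplexity=10, init="random", random_state=0),
)
embedding = pipe.fit_transform(X)
print(embedding.shape)  # (200, 2)
```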

### Example: Nearest Neighbors Caching¶

• How to precompute a KNN structure before using it in a Classifier.
• Use a pipeline to cache the NN graph between multiple KNeighborsClassifier runs.
• The first call is slower because it computes the graph.
• Subsequent calls are much faster.
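
A hedged sketch of the caching pattern (dataset, neighbor counts, and the grid are placeholder values; the transformer's n_neighbors must cover the largest value searched by the classifier):

```python
from tempfile import TemporaryDirectory

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, random_state=0)

with TemporaryDirectory(prefix="sklearn_graph_cache_") as tmpdir:
    # The graph step runs once, is cached via `memory`, and is reused while
    # the grid search varies only the classifier's n_neighbors.
    pipe = Pipeline(
        steps=[
            ("graph", KNeighborsTransformer(n_neighbors=10, mode="distance")),
            ("clf", KNeighborsClassifier(metric="precomputed")),
        ],
        memory=tmpdir,
    )
    grid = GridSearchCV(pipe, param_grid={"clf__n_neighbors": [1, 3, 5, 7, 9]}, cv=3)
    grid.fit(X, y)
    print(grid.best_params_)
```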