**Distances** are functions `d(a, b)` such that `d(a, b) < d(a, c)` if a and b are "more similar" than a and c. Two identical objects have a distance of zero. One of the most common examples is Euclidean distance.

**Kernels** are measures of similarity: `s(a, b) > s(a, c)` if a and b are more similar than a and c. Kernels must be positive semi-definite.

There are multiple ways to convert between distance metrics and similarity measures such as kernels. Let $D$ = distance and $S$ = kernel:

- `S = np.exp(-D * gamma)`; one way to choose `gamma` is `1 / num_features`
- `S = 1 / (D / np.max(D))`
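Both conversions can be sketched on a small distance matrix (the sample data and the `gamma` choice here are illustrative):

```
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[2, 3], [3, 5], [5, 8]])
D = pairwise_distances(X)  # Euclidean distances between the rows of X

# Conversion 1: exponentiated negative distance, with gamma = 1 / num_features
gamma = 1.0 / X.shape[1]
S_exp = np.exp(-D * gamma)  # identical rows map to a similarity of 1

# Conversion 2: inverse of the max-scaled distance; the diagonal divides by
# zero, so suppress the warning (those entries become inf)
with np.errstate(divide='ignore'):
    S_inv = 1.0 / (D / np.max(D))

print(S_exp)
print(S_inv)
```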

`pairwise_distances` measures the distances between the row vectors of X and Y. If Y is omitted, the pairwise distances between the row vectors of X are calculated. `pairwise_kernels` calculates the kernel between X and Y using different kernel functions.

In [3]:

```
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Manhattan distances between the rows of X and the rows of Y
print(pairwise_distances(X, Y, metric='manhattan'), "\n")
# Manhattan distances between the rows of X
print(pairwise_distances(X, metric='manhattan'), "\n")
# linear kernel between the rows of X and the rows of Y
print(pairwise_kernels(X, Y, metric='linear'), "\n")
```

```
[[ 4.  2.]
 [ 7.  5.]
 [12. 10.]] 

[[0. 3. 8.]
 [3. 0. 5.]
 [8. 5. 0.]] 

[[ 2.  7.]
 [ 3. 11.]
 [ 5. 18.]] 
```

**Cosine similarity** computes the *L2-normalized dot product* of vectors. Euclidean (L2) normalization projects vectors onto the unit sphere, so their dot product is the *cosine of the angle between the points defined by the vectors*:

$k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$

It is popular for computing the similarity of documents represented as tf-idf vectors.
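As a minimal sketch, cosine similarity of tf-idf vectors built from a tiny made-up corpus (the documents below are hypothetical):

```
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",   # hypothetical mini-corpus
        "the cat sat",
        "dogs chase birds"]
tfidf = TfidfVectorizer().fit_transform(docs)  # sparse tf-idf matrix

S = cosine_similarity(tfidf)  # S[i, j] = cosine similarity of docs i and j
print(S)
```

Each document has similarity 1 with itself, and the two cat sentences score higher with each other than with the unrelated third document.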

The **linear kernel** is a special case of the *polynomial kernel* with `degree=1` and `coef0=0`:

$k(x, y) = x^\top y$
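The equivalence can be checked directly; note that `gamma=1` must be passed explicitly, since `polynomial_kernel` otherwise defaults to `1 / n_features`:

```
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
K_lin = linear_kernel(X)
# degree=1, coef0=0, gamma=1 reduces the polynomial kernel to a plain dot product
K_poly = polynomial_kernel(X, degree=1, gamma=1, coef0=0)
print(np.allclose(K_lin, K_poly))  # → True
```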

The **polynomial kernel** computes a d-degree polynomial kernel representing the similarity between two vectors. It includes similarity both within a single dimension and across dimensions, which accounts for feature interaction:

$k(x, y) = (\gamma x^\top y + c_0)^d$
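A quick sketch comparing `polynomial_kernel` against the formula above (the `gamma`, `coef0`, and `degree` values are arbitrary):

```
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[2, 3], [3, 5]])
gamma, c0, d = 0.5, 1.0, 2
K = polynomial_kernel(X, gamma=gamma, coef0=c0, degree=d)
K_manual = (gamma * X @ X.T + c0) ** d  # (gamma * x^T y + c0)^d
print(np.allclose(K, K_manual))  # → True
```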

The **sigmoid kernel**, also known as the hyperbolic tangent kernel, is often used as an activation function in neural nets:

$k(x, y) = \tanh( \gamma x^\top y + c_0)$
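Similarly, `sigmoid_kernel` matches `np.tanh` applied to the scaled dot products (the parameter values here are arbitrary):

```
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[2, 3], [3, 5]])
gamma, c0 = 0.1, 1.0
K = sigmoid_kernel(X, gamma=gamma, coef0=c0)
K_manual = np.tanh(gamma * X @ X.T + c0)  # tanh(gamma * x^T y + c0)
print(np.allclose(K, K_manual))  # → True
```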

The **RBF kernel**:

$k(x, y) = \exp( -\gamma \| x-y \|^2)$

The **Laplacian kernel** uses the L1 (Manhattan) distance instead:

$k(x, y) = \exp( -\gamma \| x-y \|_1)$
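The two kernels differ only in the norm applied to `x - y`; a minimal sketch (the `gamma` here is arbitrary):

```
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, laplacian_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
gamma = 0.5
K_rbf = rbf_kernel(X, gamma=gamma)        # exp(-gamma * ||x - y||^2)
K_lap = laplacian_kernel(X, gamma=gamma)  # exp(-gamma * ||x - y||_1)

# identical rows always have similarity 1
print(np.diag(K_rbf), np.diag(K_lap))
```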

The **chi-squared kernel** is very popular for training nonlinear SVMs in computer vision:

$k(x, y) = \exp \left (-\gamma \sum_i \frac{(x[i] - y[i]) ^ 2}{x[i] + y[i]} \right )$
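A minimal sketch with `chi2_kernel`; the kernel assumes non-negative features (e.g. normalized histograms), so the sample rows below each sum to 1:

```
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

# hypothetical histogram-like rows (non-negative, summing to 1)
X = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
K = chi2_kernel(X, gamma=1.0)
print(K)  # identical rows map to similarity 1 on the diagonal
```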