### Pairwise Ops¶

• Distances are functions d(a, b) such that d(a, b) < d(a, c) if a & b are "more similar" than a & c. Two identical objects have zero distance. One of the most common examples is Euclidean distance.

• Kernels are measures of similarity. s(a,b)>s(a,c) if a & b are more similar than a & c. Kernels must be positive semi-definite.

• There are multiple ways to convert between distance metrics & similarity measures such as kernels. Let $D$ = distance & $S$ = kernel.

• S = np.exp(-D * gamma); one heuristic for choosing gamma is 1 / num_features.
• S = 1. / (D / np.max(D))
• pairwise_distances computes distances between the row vectors of X & Y. If Y is omitted, the pairwise distances between the row vectors of X are calculated.

• pairwise_kernels computes the kernel between X and Y using one of several kernel functions.
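As a minimal NumPy sketch (not the scikit-learn API itself), the pairwise-distance computation and the exponential distance-to-kernel conversion above look like this:

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])

# Pairwise Euclidean distances between the row vectors of X
# (what pairwise_distances computes by default).
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

# Convert distances to similarities with the exponential map;
# gamma = 1 / num_features is one common heuristic.
gamma = 1.0 / X.shape[1]
S = np.exp(-D * gamma)
```

Identical rows have distance 0, so the diagonal of S is exactly 1.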

### Cosine Similarity¶

• Cosine Similarity is the L2-normalized dot product of two vectors. Euclidean L2 normalization projects the vectors onto the unit sphere, so their dot product is the cosine of the angle between them.

• $k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$

• Popular for computing document similarity with tf-idf vectors.
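A minimal NumPy sketch of the definition (scikit-learn's `cosine_similarity` is the matrix version):

```python
import numpy as np

def cosine_sim(x, y):
    # L2-normalize both vectors, then take the dot product:
    # the result is the cosine of the angle between them.
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
sim = cosine_sim(a, b)  # cosine of the 45-degree angle between a and b
```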

### Linear Kernel¶

• A linear kernel is a special case of a polynomial kernel with degree=1 and coef0=0.

• $k(x, y) = x^\top y$
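A quick NumPy check of the special-case claim above (a sketch of the formula, not a library call): the linear kernel coincides with the polynomial kernel at gamma = 1, coef0 = 0, degree = 1.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

linear = x @ y                        # k(x, y) = x^T y
poly_d1 = (1.0 * (x @ y) + 0.0) ** 1  # polynomial kernel, gamma=1, coef0=0, degree=1
```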

### Polynomial Kernel¶

• Computes a d-degree polynomial kernel that represents the similarity between two vectors. It captures similarity both within the same dimension & across dimensions, which accounts for feature interactions.

• $k(x, y) = (\gamma x^\top y +c_0)^d$
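A minimal NumPy sketch of the formula; expanding $(\gamma x^\top y + c_0)^d$ produces cross-dimension product terms, which is where the feature interactions come from. The default parameter values here are illustrative assumptions:

```python
import numpy as np

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # k(x, y) = (gamma * x^T y + coef0)^degree
    return (gamma * (x @ y) + coef0) ** degree

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
```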

### Sigmoid Kernel¶

• Also known as the hyperbolic tangent kernel; the tanh function is often used as an activation function in neural nets.

• $k(x, y) = \tanh( \gamma x^\top y + c_0)$
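A minimal NumPy sketch of the formula (default gamma and coef0 are illustrative assumptions):

```python
import numpy as np

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # k(x, y) = tanh(gamma * x^T y + coef0)
    return np.tanh(gamma * (x @ y) + coef0)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
```

Orthogonal vectors with coef0 = 0 give tanh(0) = 0, i.e. no similarity.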

### RBF Kernel¶

• $k(x, y) = \exp( -\gamma \| x-y \|^2)$
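A minimal NumPy sketch of the formula (the choice gamma = 0.5 is an illustrative assumption):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||_2^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
```

The kernel is 1 for identical vectors and decays toward 0 as the vectors move apart.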

### Laplacian Kernel¶

• $k(x, y) = \exp( -\gamma \| x-y \|_1)$
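A minimal NumPy sketch of the formula, which differs from RBF only in using the L1 (Manhattan) distance instead of the squared L2 distance (gamma = 0.5 is an illustrative assumption):

```python
import numpy as np

def laplacian_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||_1)
    return np.exp(-gamma * np.abs(x - y).sum())

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
```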

### Chi-Squared Kernel¶

• Very popular for training nonlinear SVMs for computer vision.

• $k(x, y) = \exp \left (-\gamma \sum_i \frac{(x[i] - y[i]) ^ 2}{x[i] + y[i]} \right )$
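A minimal NumPy sketch of the formula; the example vectors are assumed to be normalized histograms, the typical computer-vision input:

```python
import numpy as np

def chi2_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-gamma * sum_i (x[i] - y[i])^2 / (x[i] + y[i]))
    # Inputs are assumed strictly positive (e.g. normalized histograms),
    # so the denominator never vanishes.
    return np.exp(-gamma * np.sum((x - y) ** 2 / (x + y)))

x = np.array([0.5, 0.5])
y = np.array([0.25, 0.75])
```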