Distances are functions $d(a, b)$ such that $d(a, b) < d(a, c)$ if a & b are "more similar" than a & c. Two identical objects have a distance of zero. One of the most common examples is the Euclidean distance.
Kernels are measures of similarity: $s(a, b) > s(a, c)$ if a & b are more similar than a & c. Kernels must be positive semi-definite.
There are multiple ways to convert between a distance metric & a similarity measure such as a kernel. Let $D$ be the distance & $S$ the kernel:

1. `S = np.exp(-D * gamma)`; one way to choose `gamma` is `1 / num_features`
2. `S = 1. / (D / np.max(D))`
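A minimal sketch of both conversions, assuming D is a Euclidean distance matrix from pairwise_distances on arbitrary data:

import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[2, 3], [3, 5], [5, 8]])
D = pairwise_distances(X)  # Euclidean distances among the rows of X

gamma = 1.0 / X.shape[1]  # heuristic: 1 / num_features
S_exp = np.exp(-D * gamma)  # similarities in (0, 1], with 1 on the diagonal
with np.errstate(divide='ignore'):  # the diagonal of D is zero
    S_div = 1. / (D / np.max(D))  # inf on the diagonal
print(S_exp)
print(S_div)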
`pairwise_distances` computes distances between the row vectors of X & Y. If Y is omitted, the pairwise distances between the row vectors of X are calculated. `pairwise_kernels` computes the kernel between X and Y using one of several kernel functions.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Manhattan distances between the rows of X & the rows of Y
print(pairwise_distances(X, Y, metric='manhattan'), "\n")

# Manhattan distances among the rows of X (Y omitted)
print(pairwise_distances(X, metric='manhattan'), "\n")

# Linear kernel between the rows of X & the rows of Y
print(pairwise_kernels(X, Y, metric='linear'), "\n")
[[ 4.  2.]
 [ 7.  5.]
 [12. 10.]]

[[0. 3. 8.]
 [3. 0. 5.]
 [8. 5. 0.]]

[[ 2.  7.]
 [ 3. 11.]
 [ 5. 18.]]
Cosine Similarity
finds the L2-normalized dot product of vectors. Euclidean L2 normalization projects vectors onto the unit sphere; their dot product is then the cosine of the angle between the points defined by the vectors.
$k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$
Popular for computing document similarity with tf-idf vectors.
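A minimal sketch with scikit-learn's cosine_similarity, checked against the formula (the arrays reuse the earlier example):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

print(cosine_similarity(X, Y))
# Same result as L2-normalizing each row, then taking dot products
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
print(Xn @ Yn.T)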
Linear Kernel

A linear kernel is a special case of the polynomial kernel, with `degree=1` and `coef0=0`.
$k(x, y) = x^\top y$
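This is easy to verify with scikit-learn (a minimal sketch; `gamma=1` is passed explicitly because polynomial_kernel defaults gamma to `1 / num_features`):

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

K_lin = linear_kernel(X, Y)
K_poly = polynomial_kernel(X, Y, degree=1, gamma=1, coef0=0)
print(np.allclose(K_lin, K_poly))  # True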
Polynomial Kernel

Computes a degree-$d$ polynomial kernel that represents the similarity between two vectors. It captures similarity not only within each dimension but also across dimensions, which accounts for feature interactions.
$k(x, y) = (\gamma x^\top y +c_0)^d$
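A minimal sketch checking polynomial_kernel against the formula directly (the $\gamma$, $c_0$, & $d$ values are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma, coef0, degree = 0.5, 1, 2
K = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)
print(np.allclose(K, (gamma * X @ Y.T + coef0) ** degree))  # True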
Sigmoid Kernel

Also known as the hyperbolic tangent kernel, it is often used as an activation function in neural nets.
$k(x, y) = \tanh( \gamma x^\top y + c_0)$
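A minimal sketch checking sigmoid_kernel against np.tanh (arbitrary $\gamma$ & $c_0$):

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma, coef0 = 0.1, 0.0
K = sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0)
print(np.allclose(K, np.tanh(gamma * X @ Y.T + coef0)))  # True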
Laplacian Kernel

A variant of the RBF kernel that uses the Manhattan (L1) distance between the vectors:

$k(x, y) = \exp( -\gamma \| x-y \|_1)$
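A minimal sketch checking laplacian_kernel against the formula via Manhattan distances (arbitrary $\gamma$):

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import laplacian_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma = 0.5
K = laplacian_kernel(X, Y, gamma=gamma)
D1 = pairwise_distances(X, Y, metric='manhattan')  # L1 distances
print(np.allclose(K, np.exp(-gamma * D1)))  # True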
Chi-squared Kernel

Very popular for training nonlinear SVMs in computer vision applications.
$k(x, y) = \exp \left (-\gamma \sum_i \frac{(x[i] - y[i]) ^ 2}{x[i] + y[i]} \right )$
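A minimal sketch checking chi2_kernel against the formula (inputs must be non-negative, e.g. histograms; here every sum $x[i] + y[i]$ is positive):

import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

X = np.array([[2, 3], [3, 5], [5, 8]], dtype=float)
Y = np.array([[1, 0], [2, 1]], dtype=float)

gamma = 1.0
K = chi2_kernel(X, Y, gamma=gamma)
num = (X[:, None, :] - Y[None, :, :]) ** 2  # (x[i] - y[i])^2 for every pair
den = X[:, None, :] + Y[None, :, :]  # x[i] + y[i], strictly positive here
print(np.allclose(K, np.exp(-gamma * (num / den).sum(axis=-1))))  # True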