Distances are functions $d(a, b)$ such that $d(a, b) < d(a, c)$ if a & b are "more similar" than a & c. Two identical objects have a distance of zero. One of the most common examples is the Euclidean distance.
Kernels are measures of similarity: $s(a, b) > s(a, c)$ if a & b are more similar than a & c. Kernels must be positive semi-definite.
There are multiple ways to convert between a distance metric & a similarity measure such as a kernel. Let $D$ be the distance & $S$ the kernel:

1. `S = np.exp(-D * gamma)`; one way to choose `gamma` is `1 / num_features`
2. `S = 1. / (D / np.max(D))`
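A minimal sketch of both conversions, assuming D is a Euclidean distance matrix from pairwise_distances on arbitrary data:

import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[2, 3], [3, 5], [5, 8]])
D = pairwise_distances(X)  # Euclidean distances among the rows of X

gamma = 1.0 / X.shape[1]  # heuristic: 1 / num_features
S_exp = np.exp(-D * gamma)  # similarities in (0, 1], with 1 on the diagonal
with np.errstate(divide='ignore'):  # the diagonal of D is zero
    S_div = 1. / (D / np.max(D))  # inf on the diagonal
print(S_exp)
print(S_div)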
`pairwise_distances` computes distances between the row vectors of X & Y. If Y is omitted, the pairwise distances between the row vectors of X are calculated. `pairwise_kernels` computes the kernel between X and Y using one of several kernel functions.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Manhattan distances between the rows of X & the rows of Y
print(pairwise_distances(X, Y, metric='manhattan'), "\n")

# Manhattan distances among the rows of X (Y omitted)
print(pairwise_distances(X, metric='manhattan'), "\n")

# Linear kernel between the rows of X & the rows of Y
print(pairwise_kernels(X, Y, metric='linear'), "\n")
[[ 4.  2.]
 [ 7.  5.]
 [12. 10.]]

[[0. 3. 8.]
 [3. 0. 5.]
 [8. 5. 0.]]

[[ 2.  7.]
 [ 3. 11.]
 [ 5. 18.]]
Cosine Similarity
finds the L2-normalized dot product of vectors. Euclidean L2 normalization projects vectors onto the unit sphere; their dot product is then the cosine of the angle between the points defined by the vectors.
$k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$
Popular for computing document similarity with tf-idf vectors.
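A minimal sketch with scikit-learn's cosine_similarity, checked against the formula (the arrays reuse the earlier example):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

print(cosine_similarity(X, Y))
# Same result as L2-normalizing each row, then taking dot products
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
print(Xn @ Yn.T)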
Linear Kernel

A linear kernel is a special case of the polynomial kernel, with `degree=1` and `coef0=0`.
$k(x, y) = x^\top y$
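This is easy to verify with scikit-learn (a minimal sketch; `gamma=1` is passed explicitly because polynomial_kernel defaults gamma to `1 / num_features`):

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

K_lin = linear_kernel(X, Y)
K_poly = polynomial_kernel(X, Y, degree=1, gamma=1, coef0=0)
print(np.allclose(K_lin, K_poly))  # True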
Polynomial Kernel

Computes a degree-$d$ polynomial kernel that represents the similarity between two vectors. It captures similarity not only within each dimension but also across dimensions, which accounts for feature interactions.
$k(x, y) = (\gamma x^\top y +c_0)^d$
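A minimal sketch checking polynomial_kernel against the formula directly (the $\gamma$, $c_0$, & $d$ values are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma, coef0, degree = 0.5, 1, 2
K = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)
print(np.allclose(K, (gamma * X @ Y.T + coef0) ** degree))  # True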
Sigmoid Kernel

Also known as the hyperbolic tangent kernel, it is often used as an activation function in neural nets.
$k(x, y) = \tanh( \gamma x^\top y + c_0)$
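A minimal sketch checking sigmoid_kernel against np.tanh (arbitrary $\gamma$ & $c_0$):

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma, coef0 = 0.1, 0.0
K = sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0)
print(np.allclose(K, np.tanh(gamma * X @ Y.T + coef0)))  # True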
Laplacian Kernel

A variant of the RBF kernel that uses the Manhattan (L1) distance between the vectors:

$k(x, y) = \exp( -\gamma \| x-y \|_1)$
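A minimal sketch checking laplacian_kernel against the formula via Manhattan distances (arbitrary $\gamma$):

import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import laplacian_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma = 0.5
K = laplacian_kernel(X, Y, gamma=gamma)
D1 = pairwise_distances(X, Y, metric='manhattan')  # L1 distances
print(np.allclose(K, np.exp(-gamma * D1)))  # True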
Chi-squared Kernel

Very popular for training nonlinear SVMs in computer vision applications.
$k(x, y) = \exp \left (-\gamma \sum_i \frac{(x[i] - y[i]) ^ 2}{x[i] + y[i]} \right )$
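A minimal sketch checking chi2_kernel against the formula (inputs must be non-negative, e.g. histograms; here every sum $x[i] + y[i]$ is positive):

import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

X = np.array([[2, 3], [3, 5], [5, 8]], dtype=float)
Y = np.array([[1, 0], [2, 1]], dtype=float)

gamma = 1.0
K = chi2_kernel(X, Y, gamma=gamma)
num = (X[:, None, :] - Y[None, :, :]) ** 2  # (x[i] - y[i])^2 for every pair
den = X[:, None, :] + Y[None, :, :]  # x[i] + y[i], strictly positive here
print(np.allclose(K, np.exp(-gamma * (num / den).sum(axis=-1))))  # True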