These functions construct approximate feature maps that correspond to certain kernels, as they are used for example in support vector machines. They perform non-linear transformations of the input, which can then serve as the basis for linear classification or other algorithms.
The advantage of using approximate feature maps (compared to the kernel trick, which uses feature maps implicitly) is that explicit mappings can be better suited for online learning. They can significantly reduce the cost of learning with very large datasets.
Standard kernelized SVMs do not scale well to large datasets, but you can use much more efficient linear SVMs with approximate kernel maps.
Combining kernel map approximations with SGDClassifier makes non-linear learning on large datasets possible.
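As a minimal sketch of that combination (the synthetic dataset and the gamma and n_components values below are illustrative, not tuned), an approximate kernel map can be chained with SGDClassifier in a pipeline:

# Illustrative sketch: an explicit (approximate) kernel map followed by a
# linear SGD classifier.  Dataset and hyperparameters are arbitrary choices.
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
clf = make_pipeline(
    Nystroem(gamma=0.1, n_components=100, random_state=0),  # explicit feature map
    SGDClassifier(max_iter=1000, random_state=0),           # linear learner on mapped features
)
clf.fit(X, y)
print(clf.score(X, y))

Because the mapping is explicit, batches of data can also be transformed and passed to SGDClassifier.partial_fit, which is what makes the online-learning setting mentioned above practical.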
Nystroem is a general-purpose method for low-rank kernel approximations. It achieves this by subsampling the data on which the kernel is evaluated. Nystroem uses the rbf kernel by default, but it can use any kernel function or a precomputed kernel matrix. The number of samples used - which is also the dimensionality of the computed features - is given by the parameter n_components.
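As a minimal sketch of the callable-kernel case (the kernel function, the 200-sample subset, and n_components=25 below are arbitrary illustrative choices), any function that returns a kernel value for a pair of samples can be plugged in:

# Illustrative sketch: Nystroem with a user-defined kernel callable.
# The kernel, the 200-sample subset and n_components=25 are arbitrary choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem

def laplacian_like_kernel(x, y):
    # toy kernel evaluated on a single pair of samples (1-D arrays)
    return np.exp(-0.01 * np.abs(x - y).sum())

X, _ = load_digits(return_X_y=True)
mapper = Nystroem(kernel=laplacian_like_kernel, n_components=25, random_state=0)
X_map = mapper.fit_transform(X[:200])
print(X_map.shape)  # (200, 25) -> n_components is the feature dimensionality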
RBFSampler constructs an approximate map for the radial basis function kernel, also known as "Random Kitchen Sinks" RR2007. fit performs the Monte Carlo sampling; the relevant parameters are n_components, the target dimensionality of the feature transform, and gamma, the parameter of the RBF kernel. A higher n_components gives a better approximation of the kernel and therefore yields results more similar to those produced by a kernel SVM. Note that "fitting" the feature function does not actually depend on the data, only on its dimensionality. transform then maps the data; because of the inherent randomness of the sampling, results may vary between different calls to fit.
For a given n_components, RBFSampler is often less accurate than Nystroem, but it is cheaper to compute (see the sketch after the example below).
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 0, 1, 1]
X_features = RBFSampler(gamma=1, random_state=1).fit_transform(X)
clf = SGDClassifier(max_iter=50).fit(X_features, y)
clf.score(X_features, y)  # 1.0
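To put the accuracy trade-off above in concrete terms, here is a minimal sketch (gamma=0.1, the random data, and the n_components values are arbitrary illustrative choices) that measures how closely the dot products of the mapped features reproduce the exact RBF kernel matrix:

# Illustrative sketch: kernel-approximation error of RBFSampler vs Nystroem
# for a few n_components values; gamma=0.1 and the sizes are arbitrary.
import numpy as np
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 20))
K_exact = rbf_kernel(X, gamma=0.1)  # exact kernel matrix, shape (300, 300)

for n in (20, 100, 250):
    for sampler in (RBFSampler(gamma=0.1, n_components=n, random_state=0),
                    Nystroem(gamma=0.1, n_components=n, random_state=0)):
        X_map = sampler.fit_transform(X)
        err = np.abs(K_exact - X_map @ X_map.T).mean()  # mean absolute error
        print(type(sampler).__name__, n, round(err, 4))

The error typically shrinks as n_components grows, and Nystroem usually achieves the lower error at a given n_components, in line with the comparison above.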
The following example shows how to use RBFSampler and Nystroem to approximate an RBF kernel for SVM-based classification on the digits dataset. It compares results from a linear SVM in the original space, a linear SVM using the approximate mappings, and a kernelized SVM. This dataset is not large enough to really show the benefits of kernel approximation, as the exact SVM is still reasonably fast. More sampling leads to better classification results, but at the cost of longer runtimes. Solving the linear SVM and the approximate kernel SVM could be sped up by using stochastic gradient descent via SGDClassifier.
import matplotlib.pyplot as plt
import numpy as np
from time import time
from sklearn import datasets, svm
from sklearn.pipeline import Pipeline
from sklearn.kernel_approximation import (RBFSampler, Nystroem)
from sklearn.decomposition import PCA
digits = datasets.load_digits(n_class=9)
n_samples = len(digits.data)
data = digits.data / 16.
data -= data.mean(axis=0)  # center the features
data_train, targets_train = (data[:n_samples // 2],
                             digits.target[:n_samples // 2])
data_test, targets_test = (data[n_samples // 2:],
                           digits.target[n_samples // 2:])
kernel_svm = svm.SVC(gamma=.2)
linear_svm = svm.LinearSVC()
feature_map_fourier = RBFSampler(gamma=.2, random_state=1)
feature_map_nystroem = Nystroem(gamma=.2, random_state=1)
fourier_approx_svm = Pipeline([("feature_map", feature_map_fourier),
                               ("svm", svm.LinearSVC())])
nystroem_approx_svm = Pipeline([("feature_map", feature_map_nystroem),
                                ("svm", svm.LinearSVC())])
kernel_svm_time = time()
kernel_svm.fit(data_train, targets_train)
kernel_svm_score = kernel_svm.score(data_test, targets_test)
kernel_svm_time = time() - kernel_svm_time
linear_svm_time = time()
linear_svm.fit(data_train, targets_train)
linear_svm_score = linear_svm.score(data_test, targets_test)
linear_svm_time = time() - linear_svm_time
sample_sizes = 30 * np.arange(1, 10)
fourier_scores = []
nystroem_scores = []
fourier_times = []
nystroem_times = []
for D in sample_sizes:
    fourier_approx_svm.set_params(feature_map__n_components=D)
    nystroem_approx_svm.set_params(feature_map__n_components=D)
    start = time()
    nystroem_approx_svm.fit(data_train, targets_train)
    nystroem_times.append(time() - start)
    start = time()
    fourier_approx_svm.fit(data_train, targets_train)
    fourier_times.append(time() - start)
    fourier_score = fourier_approx_svm.score(data_test, targets_test)
    nystroem_score = nystroem_approx_svm.score(data_test, targets_test)
    nystroem_scores.append(nystroem_score)
    fourier_scores.append(fourier_score)
plt.figure(figsize=(16, 8))
accuracy = plt.subplot(121)
# second subplot for the training times
timescale = plt.subplot(122)
accuracy.plot(sample_sizes, nystroem_scores, label="Nystroem approx. kernel")
timescale.plot(sample_sizes, nystroem_times, '--',
               label='Nystroem approx. kernel')
accuracy.plot(sample_sizes, fourier_scores, label="Fourier approx. kernel")
timescale.plot(sample_sizes, fourier_times, '--',
               label='Fourier approx. kernel')
# horizontal lines for exact rbf and linear kernels:
accuracy.plot([sample_sizes[0], sample_sizes[-1]],
              [linear_svm_score, linear_svm_score], label="linear svm")
timescale.plot([sample_sizes[0], sample_sizes[-1]],
               [linear_svm_time, linear_svm_time], '--', label='linear svm')
accuracy.plot([sample_sizes[0], sample_sizes[-1]],
              [kernel_svm_score, kernel_svm_score], label="rbf svm")
timescale.plot([sample_sizes[0], sample_sizes[-1]],
               [kernel_svm_time, kernel_svm_time], '--', label='rbf svm')
# vertical line for dataset dimensionality = 64
accuracy.plot([64, 64], [0.7, 1], label="n_features")
# legends and labels
accuracy.set_title("Classification accuracy")
timescale.set_title("Training times")
accuracy.set_xlim(sample_sizes[0], sample_sizes[-1])
accuracy.set_xticks(())
accuracy.set_ylim(np.min(fourier_scores), 1)
timescale.set_xlabel("Sampling steps = transformed feature dimension")
accuracy.set_ylabel("Classification accuracy")
timescale.set_ylabel("Training time in seconds")
accuracy.legend(loc='best')
timescale.legend(loc='best')
plt.tight_layout()
The second plot visualizes decision surfaces of the RBF kernel SVM and linear SVM with approximate kernel maps.
The decision surfaces of the classifiers are projected onto the first two principal components of the data. This visualization should be viewed with skepticism - it is just a slice through the decision surface in 64 dimensions.
pca = PCA(n_components=8).fit(data_train)
X = pca.transform(data_train)
# Generate grid along first two principal components
multiples = np.arange(-2, 2, 0.1)
# steps along first & second components - then combine
first = multiples[:, np.newaxis] * pca.components_[0, :]
second = multiples[:, np.newaxis] * pca.components_[1, :]
grid = first[np.newaxis, :, :] + second[:, np.newaxis, :]
flat_grid = grid.reshape(-1, data.shape[1])
# title for the plots
titles = ['SVC with rbf kernel',
          'SVC (linear kernel)\n with Fourier rbf feature map\n'
          'n_components=100',
          'SVC (linear kernel)\n with Nystroem rbf feature map\n'
          'n_components=100']
plt.figure(figsize=(18, 7.5))
plt.rcParams.update({'font.size': 14})
# predict and plot
for i, clf in enumerate((kernel_svm,
                         nystroem_approx_svm,
                         fourier_approx_svm)):
    # Plot the decision boundary: assign a color to each point in the mesh
    plt.subplot(1, 3, i + 1)
    Z = clf.predict(flat_grid)
    Z = Z.reshape(grid.shape[:-1])
    plt.contourf(multiples, multiples, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=targets_train, cmap=plt.cm.Paired,
                edgecolors=(0, 0, 0))
    plt.title(titles[i])

plt.tight_layout()