### Kernel Approximations¶

• These functions return approximate feature maps that correspond to certain kernels, for use in SVMs for example. They perform non-linear transformations of the input, which can serve as the basis for linear classifiers or other algorithms.

• The advantage of using approximate feature maps (compared to the kernel trick, which uses feature maps implicitly) is that explicit mappings can be better suited for online learning. They can significantly reduce the cost of learning with very large datasets.

• Standard kernelized SVMs do not scale well to large datasets, but you can use much more efficient linear SVMs with approximate kernel maps.

• Combining kernel map approximations with SGDClassifier makes non-linear learning on large datasets possible.
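A minimal sketch of this combination, assuming illustrative (untuned) values for `gamma` and `n_components` and a toy dataset that no linear classifier can separate in the original space:

```python
# Sketch: non-linear learning with a linear model plus an approximate
# kernel map. Parameter values here are illustrative, not tuned.
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Two concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# Map the data through an approximate RBF feature map, then fit a
# linear SGD classifier in that feature space.
clf = make_pipeline(
    Nystroem(kernel="rbf", gamma=2.0, n_components=100, random_state=0),
    SGDClassifier(max_iter=1000, random_state=0),
)
clf.fit(X, y)
print(clf.score(X, y))
```

Because both the feature map and SGDClassifier scale well with the number of samples, the same pipeline remains practical on much larger datasets.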

### Nystroem Approximation¶

• A general-purpose method for low-rank kernel approximations. It subsamples the data on which the kernel is evaluated.

• Nystroem uses the rbf kernel by default but can use any kernel function or a precomputed kernel matrix.

• n_components is the number of samples used - which is also the dimensionality of the computed features.
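A short sketch of the shape relationship described above, with arbitrary synthetic data:

```python
# Minimal sketch of Nystroem: n_components subsampled training points
# determine the dimensionality of the transformed features.
import numpy as np
from sklearn.kernel_approximation import Nystroem

rng = np.random.RandomState(0)
X = rng.rand(200, 5)  # 200 samples, 5 input features

# 50 samples are subsampled from X; each output feature corresponds to
# the (approximate) kernel evaluated against one of them.
feature_map = Nystroem(kernel="rbf", gamma=1.0, n_components=50, random_state=0)
X_features = feature_map.fit_transform(X)
print(X_features.shape)  # (200, 50)
```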

### RBF Sampler¶

• RBFSampler constructs an approximate map for the radial basis function kernel, also known as "Random Kitchen Sinks" [RR2007].

• fit performs a Monte Carlo sampling of the random weights. The map is controlled by two constructor parameters: n_components (the target dimensionality of the feature transform) and gamma (the parameter of the RBF kernel).

• A higher n_components gives a better approximation of the kernel and so yields results more similar to those of a kernel SVM.

• “Fitting” the feature function does not actually depend on the data - only its dimensionality.

• transform maps the data. Results may vary between different calls to fit due to the inherent randomness of the sampling.

• For a given n_components, RBFSampler is often less accurate than Nystroem, but it is cheaper to compute.
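The quality of the approximation can be checked directly: inner products in RBFSampler's feature space should be close to the exact kernel values. The `gamma` and `n_components` values below are illustrative choices.

```python
# Sketch: comparing RBFSampler's approximate kernel (via inner products
# of the mapped features) against the exact RBF kernel matrix.
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.rand(100, 10)

sampler = RBFSampler(gamma=0.5, n_components=2000, random_state=0)
X_features = sampler.fit_transform(X)

# Inner products in the approximate feature space should be close to
# the exact kernel values; the error shrinks as n_components grows.
K_exact = rbf_kernel(X, X, gamma=0.5)
K_approx = X_features @ X_features.T
print(np.abs(K_exact - K_approx).max())
```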

### Example: Approximating an RBF kernel¶

• This shows how to use RBFSampler and Nystroem to approximate an RBF kernel for SVM-based classification on the digits dataset.

• The results include those from a linear SVM (original space), linear SVM (approximate map) and a kernelized SVM.

• This dataset is not large enough to really show the benefits of kernel approximation - exact SVM is still reasonably fast.

• More sampling leads to better classification results but longer runtimes. Fitting the linear SVM and the approximate-kernel SVM could be sped up with stochastic gradient descent via SGDClassifier.
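The comparison can be sketched as follows; the `gamma`, `n_components`, and scaling choices are illustrative, not the exact settings of the full example:

```python
# Sketch of the three classifiers compared in the example: a linear SVM
# in the original space, a linear SVM on approximate kernel features,
# and an exact RBF-kernel SVM, all on the digits dataset.
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

digits = load_digits()
X = digits.data / 16.0  # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, random_state=0)

linear_svm = LinearSVC(random_state=0).fit(X_train, y_train)
approx_svm = make_pipeline(
    Nystroem(gamma=0.2, n_components=300, random_state=0),
    LinearSVC(random_state=0),
).fit(X_train, y_train)
kernel_svm = SVC(kernel="rbf", gamma=0.2).fit(X_train, y_train)

for name, clf in [("linear SVM", linear_svm),
                  ("approximate map + linear SVM", approx_svm),
                  ("kernel SVM", kernel_svm)]:
    print(name, clf.score(X_test, y_test))
```

On a dataset this small all three fit quickly; the payoff of the approximate map appears only at scales where the exact kernel SVM becomes too slow.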

• The second plot visualizes decision surfaces of the RBF kernel SVM and linear SVM with approximate kernel maps.

• The decision surfaces of the classifiers are projected onto the first two principal components of the data. This visualization should be viewed with skepticism - it is just a slice through the decision surface in 64 dimensions.