### Support Vector Machines (SVMs)

• A set of supervised learning methods for classification, regression & outlier detection.
• SVMs build one or more hyperplanes that separate the data into classes, ideally with a maximal distance (margin) between the hyperplane and the nearest training samples.
• Effective in high-dimensional data problems.
• Memory-efficient: the decision function uses only a subset of the training data (the support vectors).
• Flexible: can be adapted to use different kernel functions, both standard & custom.
• If #features >> #samples, SVMs can be prone to overfitting. Choosing the proper kernel function & regularization term is key.
• SVMs need (expensive) cross-validation techniques to provide probability estimates.
• The scikit-learn implementation supports both dense & sparse input. For optimal performance, use C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) data with dtype=float64.
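
A minimal sketch of the recommended input formats (a two-point toy dataset, assumed here for illustration):

```python
import numpy as np
from scipy import sparse
from sklearn import svm

X = np.array([[0.0, 0.0], [1.0, 1.0]], dtype=np.float64, order="C")
y = np.array([0, 1])

clf = svm.SVC().fit(X, y)                # dense, C-ordered float64 fit
print(clf.predict([[2.0, 2.0]]))         # -> [1]

X_csr = sparse.csr_matrix(X)             # same data in sparse CSR form
clf_csr = svm.SVC().fit(X_csr, y)
print(clf_csr.predict(sparse.csr_matrix([[2.0, 2.0]])))
```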

### SVM Classification

• Variants: SVC, NuSVC, LinearSVC.
• SVC & NuSVC have slightly different mathematical formulations & parameter sets.
• LinearSVC is faster, but only supports a linear kernel.
• All 3 variants accept a training set X of shape (#samples, #features) and labels y of shape (#samples,), as strings or integers.
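
A minimal sketch, assuming the iris dataset as training data; all three variants share the same fit/score interface:

```python
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)        # X: (150, 4), y: (150,) integer labels

for clf in (svm.SVC(), svm.NuSVC(), svm.LinearSVC()):
    clf.fit(X, y)
    print(type(clf).__name__, clf.score(X, y))
```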

### Example: plot max margin separating hyperplane

• For a 2-class dataset
• Use SVC, linear kernel
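
A rough sketch of that example, assuming synthetic blobs from make_blobs; the margin lines are reconstructed from the fitted coef_ and intercept_:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = svm.SVC(kernel="linear", C=1000).fit(X, y)   # large C -> hard margin

w, b = clf.coef_[0], clf.intercept_[0]
xx = np.linspace(X[:, 0].min(), X[:, 0].max())
yy = -(w[0] * xx + b) / w[1]                       # boundary: w.x + b = 0
offset = (1 / np.linalg.norm(w)) * np.sqrt(1 + (w[0] / w[1]) ** 2)

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(xx, yy, "k-", xx, yy + offset, "k--", xx, yy - offset, "k--")
plt.scatter(*clf.support_vectors_.T, s=100, facecolors="none", edgecolors="k")
plt.show()
```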

### Example: binary classification, non-linear (RBF kernel)

• Plot the learned decision function. The goal is to predict the XOR of the inputs.
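
A sketch of the XOR setup, assuming random Gaussian inputs; the learned decision function is drawn as filled contours:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)   # target: XOR of the input signs

clf = svm.SVC(kernel="rbf").fit(X, y)

xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, levels=20, cmap=plt.cm.RdBu)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.show()
```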

### Example: SVM - univariate feature selection

• How to do univariate feature selection before running an SVC to improve classification scores.
• Iris dataset (4 features) + 36 non-informative features.
• Expect to find best scores when we select ~10% of the features.
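
A sketch of that setup (the 36 noise features are drawn from a uniform distribution here, as an assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(42)
X = np.hstack([X, rng.uniform(size=(X.shape[0], 36))])   # append 36 noise features

for pct in (5, 10, 50, 100):
    model = make_pipeline(
        SelectPercentile(f_classif, percentile=pct), StandardScaler(), SVC()
    )
    score = cross_val_score(model, X, y).mean()
    print(f"{pct:3d}% of features kept -> CV accuracy {score:.3f}")
```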

### Multiclass classification (SVC, NuSVC)

• SVC & NuSVC use one-vs-one approach.
• #classes*(#classes-1)/2 classifiers are built; each is trained on data from two classes.
• You can set decision_function_shape="ovr" to aggregate the OvO results into a one-vs-rest decision function of shape (#samples, #classes).
• LinearSVC uses a one-vs-rest approach; its coef_ (#classes, #features) and intercept_ (#classes,) attributes hold one weight vector per class.
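
A short sketch of the resulting shapes on a 4-class toy problem:

```python
from sklearn import svm

X = [[0], [1], [2], [3]]
y = [0, 1, 2, 3]                              # 4 classes -> 4*3/2 = 6 OvO classifiers

clf = svm.SVC(decision_function_shape="ovo").fit(X, y)
print(clf.decision_function([[1]]).shape)     # (1, 6): one score per class pair

clf.decision_function_shape = "ovr"
print(clf.decision_function([[1]]).shape)     # (1, 4): one score per class

lin = svm.LinearSVC().fit(X, y)               # one-vs-rest
print(lin.coef_.shape, lin.intercept_.shape)  # (4, 1) (4,)
```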

### Scoring & Probabilities

• The decision_function method returns per-class scores for each sample.
• If the constructor option probability is set to True, class membership probability estimates are enabled (via predict_proba & predict_log_proba).
• Binary classification: probabilities are calibrated using Platt scaling: a logistic regression fit on the SVM scores, requiring an additional cross-validation pass over the training data.
• Note: Platt's method is known to have theoretical issues; it is therefore advisable to set probability=False and use decision_function instead of predict_proba.
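
A quick sketch contrasting the two APIs on a toy binary problem:

```python
from sklearn import svm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, random_state=0)

clf = svm.SVC(probability=True).fit(X, y)  # enables Platt-scaled probabilities
print(clf.decision_function(X[:2]))        # signed distances to the hyperplane
print(clf.predict_proba(X[:2]))            # calibrated class probabilities
```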

### Weighted Classes

• Use class_weight and sample_weight params when you need to assign more importance to selected classes.
• SVC takes a class_weight parameter in its constructor: a dictionary of {class_label : value}, where value is a floating point number > 0. It sets the C of class_label to C*value.
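
A minimal sketch, assuming an imbalanced two-class blob dataset; the minority class gets 10x its usual C, so its misclassifications cost more:

```python
from sklearn import svm
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=[900, 100], centers=[[0, 0], [2, 2]], random_state=0)

plain = svm.SVC(kernel="linear").fit(X, y)
weighted = svm.SVC(kernel="linear", class_weight={1: 10}).fit(X, y)

# the weighted model predicts the minority class far more often
print((plain.predict(X) == 1).sum(), (weighted.predict(X) == 1).sum())
```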

### Weighted Samples

• SVC/SVR, NuSVC/NuSVR, LinearSVC/LinearSVR & OneClassSVM all support individual sample weights during fitting via the sample_weight param of fit.
• It sets C for the ith sample to C*sample_weight[i].
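
A minimal sketch; the first five samples are (arbitrarily) given 10x weight, so their penalty becomes C*10:

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
X = rng.randn(20, 2)
y = (X[:, 0] > 0).astype(int)

weights = np.ones(20)
weights[:5] *= 10                        # emphasize the first five samples

clf = svm.SVC(kernel="linear")
clf.fit(X, y, sample_weight=weights)     # per-sample C_i = C * sample_weight[i]
```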

### Example: weighted samples

• To emphasize the effect, the outliers are given extra weight. This makes the deformation of the decision boundary very visible.

### Support Vector Regression

• Three implementations are supported: SVR, NuSVR & LinearSVR.
• LinearSVR is faster than SVR, but only supports a linear kernel.
• NuSVR has a slightly different formulation than SVR: the parameter nu controls the number of support vectors.
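
A minimal sketch of the shared regression interface on a two-sample toy set:

```python
from sklearn import svm

X = [[0, 0], [2, 2]]
y = [0.5, 2.5]

for reg in (svm.SVR(), svm.NuSVR(), svm.LinearSVR()):
    reg.fit(X, y)
    print(type(reg).__name__, reg.predict([[1, 1]]))
```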

### Example: SVR with linear, polynomial & RBF kernels

• Using a toy 1D dataset
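
A sketch of that setup, assuming a noisy 1D sine curve as the toy dataset:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(8))        # add noise to every 5th target

models = {
    "linear": SVR(kernel="linear", C=100),
    "poly": SVR(kernel="poly", C=100, degree=3),
    "rbf": SVR(kernel="rbf", C=100, gamma=0.1),
}
for name, reg in models.items():
    plt.plot(X, reg.fit(X, y).predict(X), label=name)
plt.scatter(X, y, color="k", s=10, label="data")
plt.legend()
plt.show()
```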