### Multiclass & Multioutput Problems¶

• These scikit-learn modules provide meta-estimators that extend base estimators to support multi-learning problems. They work by transforming a multi-learning problem into a set of simpler problems, then fitting one estimator per problem.
• Problem types:

| Problem type | Number of targets | Target cardinality | Valid `type_of_target` |
| --- | --- | --- | --- |
| Multiclass classification | 1 | >2 | "multiclass" |
| Multilabel classification | >1 | 2 (0 or 1) | "multilabel-indicator" |
| Multiclass-multioutput classification | >1 | >2 | "multiclass-multioutput" |
| Multioutput regression | >1 | continuous | "continuous-multioutput" |
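A minimal sketch of how the target types in the table can be checked in practice, using scikit-learn's `type_of_target` utility (the toy target arrays are made up for illustration):

```python
# type_of_target reports which problem type a given y corresponds to.
from sklearn.utils.multiclass import type_of_target

print(type_of_target([0, 2, 1, 1]))              # 1 target, >2 classes -> "multiclass"
print(type_of_target([[1, 0, 1], [0, 1, 0]]))    # binary matrix -> "multilabel-indicator"
print(type_of_target([[0, 2], [1, 1]]))          # >2 classes per column -> "multiclass-multioutput"
print(type_of_target([[0.5, 1.2], [2.3, 0.1]]))  # float matrix -> "continuous-multioutput"
```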

• Many scikit-learn estimators already have built-in multi-learning support. See the User Guide chapter for the complete list.

### Multiclass Classification¶

• Defined as classification with >2 classes (e.g., "orange", "apple", "pear"); each sample can be labeled with only one class (e.g., "apple").
• All scikit-learn classifiers already have built-in multiclass support; there is no need to use this module except to experiment with alternative multiclass strategies.

### One-vs-Rest Classification¶

• One classifier is fitted per class against all other classes.
• Advantage: interpretability, since each class is represented by exactly one classifier.
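A minimal sketch of one-vs-rest on the iris dataset: with 3 classes, `OneVsRestClassifier` fits 3 binary classifiers, one per class (the choice of `LogisticRegression` as the base estimator is just for illustration):

```python
# One-vs-rest: one binary classifier fitted per class against all others.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))  # 3 classes -> 3 binary classifiers
```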

### Multilabel Classification¶

• Also supported by the OvR classifier.
• Feed an indicator matrix to the classifier: cell[i,j] indicates the presence of label $j$ in sample $i$.
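A minimal sketch of the indicator-matrix usage above, on made-up toy data: each column of `Y` is one label, and cell [i, j] = 1 means sample i carries label j:

```python
# OneVsRestClassifier also accepts a binary indicator matrix for multilabel targets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])  # 4 samples, 2 labels
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict(X).shape)  # (4, 2): one column per label
```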

### One vs One Classification¶

• Builds one classifier per pair of classes.
• The class receiving the most votes is selected for prediction.
• Slower than OvR due to needing to fit n_classes*(n_classes-1)/2 classifiers.
• Still may be desired for some kernel algorithms that don't scale as well, because each classifier involves only a subset of the training dataset. (OvR uses the complete dataset n_classes times.)
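A minimal sketch of one-vs-one on iris: with 3 classes, `OneVsOneClassifier` fits 3 * 2 / 2 = 3 pairwise classifiers (an SVM base estimator is used here since OvO is often paired with kernel methods, as noted above):

```python
# One-vs-one: one binary classifier fitted per pair of classes.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovo.estimators_))  # n_classes * (n_classes - 1) / 2 = 3
```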

### Output Code Classification¶

• Each class is represented in a Euclidean space - each dimension can be only 0 or 1.
• A matrix (the "code book") tracks the location/code of each class.
• Each class should be represented by a unique code - a good code book should optimize classification accuracy.
• One binary classifier is used per bit in the code book for fitting.
• At prediction time, the classifier predicts the class whose code is closest to the new point in the code space.
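A minimal sketch of output codes on iris: with `code_size=2` and 3 classes, the code book assigns each class a 6-bit code, and one binary classifier is fitted per bit (the base estimator choice is illustrative):

```python
# Error-correcting output codes: one binary classifier per bit of the code book.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
occ = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                           code_size=2, random_state=0).fit(X, y)
print(occ.code_book_.shape)  # (3, 6): one 6-bit code per class
```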

### Multilabel Classification¶

• Labels each sample with m labels drawn from n_classes possible classes, where 0 <= m <= n_classes.
• Can be described as predicting sample properties that are not mutuallyately exclusive.
• Rather than treating each label independently, multilabel classifiers can treat the multiple classes simultaneously, accounting for correlations among them.
• Targets are represented by a dense or sparse binary matrix of shape (n_samples, n_classes).
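The binary target matrix described above can be built from per-sample label sets with scikit-learn's `MultiLabelBinarizer` helper; a minimal sketch on made-up labels:

```python
# MultiLabelBinarizer turns label sets into a (n_samples, n_classes) indicator matrix.
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform([{"sci-fi", "thriller"}, {"comedy"}, {"comedy", "thriller"}])
print(mlb.classes_)  # ['comedy' 'sci-fi' 'thriller']
print(Y)
# [[0 1 1]
#  [1 0 0]
#  [1 0 1]]
```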

### Multioutput Classification¶

• Fits one classifier per target, allowing a single meta-estimator to handle multiple target classifications.

### Classifier Chains¶

• Combines binary classifiers into a single multilabel model.
• Enables using correlations among targets.
• Assigns a position (0..N-1) to each of N binary classifiers in a chain.
• Each classifier is fit on the training data plus the true labels of the classes whose models appear earlier in the chain.
• Chain order clearly matters: the first model has no information about the other labels, while the last model has all available information.
• Typically, many randomly ordered chains are fitted and their predictions averaged, avoiding the need to guess an optimal order.
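A minimal sketch of a classifier chain on made-up toy data: the chain order is fixed explicitly here, so the second link also sees the first label (in practice, ensembles of randomly ordered chains are common, as noted above):

```python
# ClassifierChain: each link is trained on X plus the labels earlier in the chain.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])  # 4 samples, 2 binary labels
chain = ClassifierChain(LogisticRegression(), order=[0, 1]).fit(X, Y)
print(chain.predict(X).shape)  # (4, 2)
```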

• Multiclass-multioutput classification labels each sample with one or more non-binary properties.
• A single estimator therefore handles several joint classification tasks.
• Example properties: "type of fruit" ("apple", "pear", "orange") and "color" ("green", "red", "yellow", "orange").
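A minimal sketch of the fruit/color example using `MultiOutputClassifier`, which fits one classifier per target column; the features and toy data are made up for illustration:

```python
# MultiOutputClassifier: one classifier per target; both targets are non-binary.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

X = np.array([[8.0, 1], [7.5, 1], [6.0, 0], [9.0, 2]])  # hypothetical size/shape features
Y = np.array([["apple", "green"],
              ["apple", "red"],
              ["pear", "green"],
              ["orange", "orange"]])
clf = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, Y)
print(clf.predict([[8.0, 1]]))  # one fruit label and one color label per sample
```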

### Multioutput Regression¶

• Predicts multiple numerical properties per sample.
• A valid $y$ is a dense floating-point matrix of shape (n_samples, n_targets).
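A minimal sketch using `MultiOutputRegressor`, which fits one regressor per numeric target (the synthetic dataset and base estimator are chosen just for illustration):

```python
# MultiOutputRegressor: one GradientBoostingRegressor fitted per target column.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=50, n_features=4, n_targets=2, random_state=0)
reg = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, y)
print(reg.predict(X).shape)  # (50, 2): two numeric targets per sample
```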

### Regressor Chaining¶

• Analogous to Classifier Chains: combines regressors into a single multitarget model in order to exploit correlations among targets.
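A minimal sketch of `RegressorChain` on the same kind of synthetic data: each regressor in the chain also receives the predictions for earlier targets (the order and base estimator here are illustrative):

```python
# RegressorChain: the regression analogue of ClassifierChain.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

X, y = make_regression(n_samples=50, n_features=4, n_targets=2, random_state=0)
chain = RegressorChain(Ridge(), order=[0, 1]).fit(X, y)
print(chain.predict(X).shape)  # (50, 2)
```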