69_transform_prediction_targets

Transforming Prediction Targets

These transformers are designed for transforming supervised learning targets (y), not features (X).

Label Binarization

LabelBinarizer creates a label indicator matrix from a list of multiclass labels.
This step is unnecessary if the method you are using already supports the label indicator matrix format.

In [1]:

from sklearn import preprocessing
lb = preprocessing.LabelBinarizer().fit(
    [1, 2, 6, 4, 2])

print(lb.classes_,"\n")
print(lb.transform([1, 6]))

[1 2 4 6] 

[[1 0 0 0]
 [0 0 0 1]]
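
As a minimal sketch that is not part of the original notebook, the same fitted binarizer can map indicator rows back to the original labels with inverse_transform. The input labels here are chosen only for illustration.

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer().fit([1, 2, 6, 4, 2])

# Each row contains a single 1 in the column of the matching class.
indicator = lb.transform([1, 6, 4])

# inverse_transform maps the indicator rows back to the original labels.
recovered = lb.inverse_transform(indicator)
print(recovered)
```

The round trip is lossless here because every label passed to transform was seen during fit.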

Multilabel Binarization

Converts between a collection of label collections (for example, a list of sets or tuples) and the indicator format.
In multilabel learning, the joint set of binary classification tasks is expressed as an indicator array:
- Each sample is one row of a binary-valued 2D array of shape (#samples, #classes), where ones indicate the subset of labels for that sample.
- For example, [[1,0,0],[0,1,1],[0,0,0]] represents:
  - label 0 in the 1st sample
  - labels 1 and 2 in the 2nd sample
  - no labels in the 3rd sample
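
The reading of the indicator array above can be checked with a short sketch. It assumes a MultiLabelBinarizer fitted on the labels 0, 1 and 2 (so that column i corresponds to label i) and uses inverse_transform to recover the label sets. This cell is an illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Fit on one sample holding every label so the columns are 0, 1, 2.
mlb = MultiLabelBinarizer().fit([[0, 1, 2]])

indicator = np.array([[1, 0, 0],
                      [0, 1, 1],
                      [0, 0, 0]])

# One tuple of labels per row; an all-zero row gives an empty tuple.
label_sets = mlb.inverse_transform(indicator)
print(label_sets)
```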

In [2]:

from sklearn.preprocessing import MultiLabelBinarizer
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
MultiLabelBinarizer().fit_transform(y)

Out[2]:

array([[0, 0, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 0]])
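
The column order of the indicator array follows the sorted classes_ attribute, which matters when labels are not already 0..n-1. A minimal sketch with hypothetical string labels (the genre names are made up for illustration):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical multilabel targets: each sample is a set of genre tags.
y = [{"sci-fi", "thriller"}, {"comedy"}]

mlb = MultiLabelBinarizer()
indicator = mlb.fit_transform(y)

# classes_ lists the labels in the same order as the indicator columns.
print(list(mlb.classes_))
print(indicator)
```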

Label Encoding

A utility to normalize labels so that they contain only values between 0 and n_classes-1. This is sometimes useful for writing efficient Cython routines.
It can also transform non-numerical labels, such as text, to numerical labels, as long as they are hashable and comparable.

In [3]:

from sklearn import preprocessing
le = preprocessing.LabelEncoder().fit([1, 2, 2, 6])

print(le.classes_,"\n")
print(le.transform([1, 1, 2, 6]),"\n")
print(le.inverse_transform([0, 0, 1, 2]))

[1 2 6] 

[0 0 1 2] 

[1 1 2 6]

In [4]:

le = preprocessing.LabelEncoder().fit(
    ["paris", "paris", "tokyo", "amsterdam"])

print(list(le.classes_),"\n")
print(le.transform(["tokyo", "tokyo", "paris"]),"\n")
print(list(le.inverse_transform([2, 2, 1])))

['amsterdam', 'paris', 'tokyo'] 

[2 2 1] 

['tokyo', 'tokyo', 'paris']
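
One caveat worth sketching: a fitted LabelEncoder only knows the labels it saw during fit, and transform raises a ValueError for anything else. The label "berlin" below is a hypothetical unseen value added for illustration.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder().fit(["paris", "paris", "tokyo", "amsterdam"])

try:
    # "berlin" was not seen during fit, so it has no integer code.
    le.transform(["tokyo", "berlin"])
    unseen_rejected = False
except ValueError as exc:
    unseen_rejected = True
    print("transform failed:", exc)
```

If unseen labels are expected at prediction time, they need to be handled before calling transform, for example by filtering them out or mapping them to a known placeholder class.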