LabelBinarizer creates a label indicator matrix from a list of multiclass labels.
This step is unnecessary if the method you are using already accepts labels in indicator matrix format.
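from sklearn import preprocessing

# Fit on the known labels; the sorted unique values become the classes.
lb = preprocessing.LabelBinarizer().fit([1, 2, 6, 4, 2])
print(lb.classes_, "\n")
# Each output row is the indicator (one-hot) encoding of one input label.
print(lb.transform([1, 6]))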
[1 2 4 6]

[[1 0 0 0]
 [0 0 0 1]]
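To make the mapping concrete, here is a minimal pure-Python sketch of the idea behind the indicator matrix: one column per sorted unique label, with a 1 where the sample's label matches the column. The helper names `sorted_unique` and `one_hot_rows` are hypothetical and used only for illustration; this is not how scikit-learn implements LabelBinarizer.

```python
def sorted_unique(labels):
    # Hypothetical helper: the classes are the sorted unique labels.
    return sorted(set(labels))


def one_hot_rows(labels, classes):
    # Hypothetical helper: one row per label, one column per class,
    # with a 1 in the column whose class equals the label.
    return [[int(label == c) for c in classes] for label in labels]


classes = sorted_unique([1, 2, 6, 4, 2])
rows = one_hot_rows([1, 6], classes)
print(classes)  # [1, 2, 4, 6]
print(rows)     # [[1, 0, 0, 0], [0, 0, 0, 1]]
```

Note that this sketch always emits one column per class, whereas LabelBinarizer returns a single column for two-class problems.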
MultiLabelBinarizer converts between a collection of "label collections" and the indicator format.
In multilabel learning, the joint set of binary classification tasks is expressed as a label indicator array:
each sample is one row of a binary-valued 2D array of shape (n_samples, n_classes), where the ones mark the subset of labels assigned to that sample.
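For example, [[1,0,0],[0,1,1],[0,0,0]] represents label 0 in the first sample, labels 1 and 2 in the second sample, and no labels in the third.
MultiLabelBinarizer builds such an array from a list of label collections: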
from sklearn.preprocessing import MultiLabelBinarizer
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
MultiLabelBinarizer().fit_transform(y)
array([[0, 0, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 0]])
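Because the conversion works in both directions, the indicator array can be mapped back to label collections. The sketch below keeps the fitted binarizer in a variable (the name `mlb` is our own choice) and uses `inverse_transform`; the tuples are converted to plain Python ints only so the printed result does not depend on the NumPy version.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]

# Keep a reference to the fitted binarizer so it can be reused.
mlb = MultiLabelBinarizer()
indicator = mlb.fit_transform(y)

# inverse_transform returns one tuple of labels per row of the indicator array.
recovered = [tuple(int(v) for v in row) for row in mlb.inverse_transform(indicator)]
print(recovered)
```

Each recovered tuple lists the labels in the order of `mlb.classes_`, so the original ordering inside each input list is not preserved if it differed from the sorted order.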
LabelEncoder normalizes labels so that they take only values from 0 to n_classes-1. This can be useful when writing Cython routines.
It can also transform text labels to numerical equivalents, as long as the labels are hashable and comparable.
from sklearn import preprocessing
le = preprocessing.LabelEncoder().fit([1, 2, 2, 6])
print(le.classes_,"\n")
print(le.transform([1, 1, 2, 6]),"\n")
print(le.inverse_transform([0, 0, 1, 2]))
[1 2 6]

[0 0 1 2]

[1 1 2 6]
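The encoding above amounts to sorting the unique labels and using each label's position as its code, which is why labels must be hashable and comparable. A minimal pure-Python sketch of that idea follows; the helper names `fit_classes`, `encode`, and `decode` are hypothetical, and this is not scikit-learn's actual implementation.

```python
def fit_classes(labels):
    # set() requires hashable labels; sorted() requires comparable ones.
    return sorted(set(labels))


def encode(labels, classes):
    # Map each label to its position in the sorted class list.
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]


def decode(codes, classes):
    # Map each integer code back to the label at that position.
    return [classes[code] for code in codes]


classes = fit_classes([1, 2, 2, 6])
codes = encode([1, 1, 2, 6], classes)
print(classes)                 # [1, 2, 6]
print(codes)                   # [0, 0, 1, 2]
print(decode(codes, classes))  # [1, 1, 2, 6]
```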
le = preprocessing.LabelEncoder().fit(["paris", "paris", "tokyo", "amsterdam"])
print(list(le.classes_),"\n")
print(le.transform(["tokyo", "tokyo", "paris"]),"\n")
print(list(le.inverse_transform([2, 2, 1])))
['amsterdam', 'paris', 'tokyo']

[2 2 1]

['tokyo', 'tokyo', 'paris']
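One practical consequence of fitting on a fixed set of labels is that a label not seen during `fit` has no code. The sketch below assumes the usual scikit-learn behavior, where `transform` raises a `ValueError` in that case; the label "berlin" and the flag `unseen_rejected` are our own illustrative choices.

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder().fit(["paris", "paris", "tokyo", "amsterdam"])

try:
    # "berlin" was not among the labels passed to fit().
    le.transform(["tokyo", "berlin"])
    unseen_rejected = False
except ValueError as err:
    unseen_rejected = True
    print("transform failed:", err)
```

If new labels can appear at prediction time, one option is to refit the encoder on the full label set, keeping in mind that the codes of existing labels may then change.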