When doing supervised learning, compare your estimator against a simple baseline as a sanity check. DummyClassifier provides several strategies for this:
stratified: generates random predictions by respecting the training set class distribution.
most_frequent: always predicts the most frequent label in the training set.
prior: always predicts the class that maximizes the class prior (like most_frequent) and predict_proba returns the class prior.
uniform: generates predictions uniformly at random.
constant: always predicts a constant, user-specified label.
Note that with all these strategies, the predict method completely ignores the input data.
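The strategies above can be illustrated on a toy dataset. This is a minimal sketch (the data and variable names are made up for illustration) showing that most_frequent always predicts the majority class, and that prior returns the empirical class distribution from predict_proba:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.array([[0], [1], [2], [3]])
y = np.array([0, 1, 1, 1])  # class 1 is the majority class (3 of 4 samples)

# 'most_frequent' always predicts the majority class, regardless of X
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
print(clf.predict(X))   # [1 1 1 1]
print(clf.score(X, y))  # 0.75 -- the majority-class baseline accuracy

# 'prior' predicts the same labels, but predict_proba returns the class prior
clf = DummyClassifier(strategy="prior").fit(X, y)
print(clf.predict_proba(X[:1]))  # [[0.25 0.75]]
```

Any real classifier should beat the 0.75 accuracy of this baseline before its score is taken seriously.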
# create an unbalanced dataset by relabeling all non-1 classes as -1
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split as TTS

X, y = load_iris(return_X_y=True)
y[y != 1] = -1
X_train, X_test, y_train, y_test = TTS(X, y, random_state=0)
# compare SVC and the most_frequent baseline on test accuracy
from sklearn.dummy import DummyClassifier as DC
from sklearn.svm import SVC

clf1 = SVC(kernel='linear', C=1).fit(X_train, y_train)
clf2 = DC(strategy='most_frequent', random_state=0).fit(X_train, y_train)
print(clf1.score(X_test, y_test))
print(clf2.score(X_test, y_test))
# switching to an RBF kernel, the SVC should clear the baseline comfortably
clf3 = SVC(kernel='rbf', C=1).fit(X_train, y_train)
print(clf3.score(X_test, y_test))
DummyRegressor also implements four rules of thumb for regression:
mean: predicts the mean of the training targets.
median: predicts the median of the training targets.
quantile: predicts a user-provided quantile of the training targets.
constant: predicts a constant user-specified value.
import numpy as np
from sklearn.dummy import DummyRegressor as DR

# X must be 2-D: one sample per row
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 3.0, 5.0, 10.0])

dummy_regr = DR(strategy="mean").fit(X, y)
print(dummy_regr.predict(X), dummy_regr.score(X, y))
[5. 5. 5. 5.] 0.0
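The median and quantile strategies work the same way on the same toy targets; a quick sketch (assuming the targets [2, 3, 5, 10] from above):

```python
import numpy as np
from sklearn.dummy import DummyRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 3.0, 5.0, 10.0])

# 'median' predicts the median of the training targets
med = DummyRegressor(strategy="median").fit(X, y)
print(med.predict(X))  # [4. 4. 4. 4.]

# 'quantile' requires the quantile parameter; 0.5 is equivalent to 'median'
q90 = DummyRegressor(strategy="quantile", quantile=0.9).fit(X, y)
print(q90.predict(X))
```

As with DummyClassifier, the predictions depend only on the training targets, never on X.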