When doing supervised learning, compare your estimator against a simple baseline as a sanity check.

*DummyClassifier* provides several strategies for this:

- `stratified`: generates random predictions respecting the training set class distribution.
- `most_frequent`: always predicts the most frequent label in the training set.
- `prior`: always predicts the class that maximizes the class prior (like `most_frequent`), and `predict_proba` returns the class prior.
- `uniform`: generates predictions uniformly at random.
- `constant`: always predicts a constant user-specified label.

Note: the `predict` method completely ignores the input data.
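To see this in action, here is a minimal sketch (on a small made-up dataset) showing that a `prior` dummy predicts the majority class regardless of the input, and that `predict_proba` returns the empirical class distribution:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))               # features are irrelevant to the dummy
y = np.array([0, 0, 0, 0, 1, 1])   # class 0 is the majority

# 'prior' always predicts the class maximizing the class prior;
# predict_proba returns the training class distribution.
clf = DummyClassifier(strategy="prior").fit(X, y)
print(clf.predict(np.ones((2, 1))))       # [0 0] — input values ignored
print(clf.predict_proba(np.ones((1, 1)))) # [[0.667 0.333]]
```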

In [1]:

```
# test unbalanced dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split as TTS
X, y = load_iris(return_X_y=True)
y[y != 1] = -1
X_train, X_test, y_train, y_test = TTS(X, y, random_state=0)
```
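The relabeling above collapses iris classes 0 and 2 into a single negative class, giving a 2:1 imbalance. A quick check (self-contained sketch, repeating the loading step):

```python
from collections import Counter
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
y[y != 1] = -1            # collapse classes 0 and 2 into -1
print(Counter(y))         # Counter({-1: 100, 1: 50}) — 2:1 imbalance
```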

In [2]:

```
# compare SVC & most_frequent accuracy
from sklearn.dummy import DummyClassifier as DC
from sklearn.svm import SVC
clf1 = SVC(kernel='linear', C=1).fit(X_train, y_train)
clf2 = DC(strategy='most_frequent', random_state=0).fit(X_train, y_train)
print(clf1.score(X_test, y_test))
print(clf2.score(X_test, y_test))
```

0.631578947368421
0.5789473684210527

- The linear SVC does not do much better than the dummy classifier, which suggests something is off. Change the kernel and re-run:

In [4]:

```
clf3 = SVC(kernel='rbf', C=1).fit(X_train, y_train)
print(clf3.score(X_test, y_test))
```

0.9473684210526315

**DummyRegressor** also implements four rules of thumb for regression:

- `mean`: predicts the mean of the training targets.
- `median`: predicts the median of the training targets.
- `quantile`: predicts a user-provided quantile of the training targets.
- `constant`: predicts a constant user-specified value.

In [6]:

```
import numpy as np
from sklearn.dummy import DummyRegressor as DR
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 10.0])
dummy_regr = DR(strategy="mean").fit(X, y)
print(dummy_regr.predict(X), dummy_regr.score(X, y))
```

[5. 5. 5. 5.] 0.0
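The other strategies work the same way. A minimal sketch of `median` and `quantile` on the same toy targets (note that `quantile` requires the extra `quantile` keyword):

```python
import numpy as np
from sklearn.dummy import DummyRegressor

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 10.0])

# 'median' predicts the median of the training targets
med = DummyRegressor(strategy="median").fit(X, y)
print(med.predict(X))   # [4. 4. 4. 4.]

# 'quantile' predicts a given quantile (0.0 = min, 1.0 = max)
q = DummyRegressor(strategy="quantile", quantile=1.0).fit(X, y)
print(q.predict(X))     # [10. 10. 10. 10.]
```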