coefs_ is a list of weight matrices; the $i$th matrix holds the weights between layer $i$ and layer $i+1$. intercepts_ is a list of bias vectors; the $i$th vector holds the biases added to layer $i+1$. Online (incremental) learning is supported via partial_fit.

from sklearn.neural_network import MLPClassifier as MLPC
X,y = [[0., 0.], [1., 1.]], [0, 1]
clf = MLPC(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)
print(clf.predict([[2., 2.], [-1., -2.]]))
[1 0]
clf.coefs_ contains the weight matrices:

[coef.shape for coef in clf.coefs_]
[(2, 5), (5, 2), (2, 1)]
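clf.intercepts_ holds the corresponding bias vectors, one per layer transition. A quick check for the same fitted clf (the shapes follow from hidden_layer_sizes=(5, 2) and one output unit):

[b.shape for b in clf.intercepts_]
# [(5,), (2,), (1,)]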
MLPClassifier supports only the cross-entropy loss function, which yields a vector of probability estimates $P(y|x)$ via predict_proba.

clf.predict_proba([[2., 2.], [1., 2.]])
array([[1.96718015e-04, 9.99803282e-01], [1.96718015e-04, 9.99803282e-01]])
MLPClassifier also supports multi-label classification, in which a sample can belong to more than one class:

X, y = [[0., 0.], [1., 1.]], [[0, 1], [1, 1]]
clf = MLPC(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(15,), random_state=1)
clf.fit(X, y)
print(clf.predict([[1., 2.]]))
print(clf.predict([[0., 0.]]))
[[1 1]]
[[0 1]]
The example below compares stochastic learning strategies (constant vs. inverse-scaling learning-rate schedules, momentum, Nesterov's momentum, and Adam) on several small datasets; the initial learning rate is set with learning_rate_init.

import warnings
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier as MLPC
from sklearn.preprocessing import MinMaxScaler as MMS
from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning
# different learning rate schedules and momentum parameters
params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'adam', 'learning_rate_init': 0.01}]

labels = ["constant",
          "constant/momentum",
          "constant/momentum/nesterov",
          "inv-scaling",
          "inv-scaling/momentum",
          "inv-scaling/momentum/nesterov",
          "adam"]

plot_args = [{'c': 'red', 'linestyle': '-'},
             {'c': 'green', 'linestyle': '-'},
             {'c': 'blue', 'linestyle': '-'},
             {'c': 'red', 'linestyle': '--'},
             {'c': 'green', 'linestyle': '--'},
             {'c': 'blue', 'linestyle': '--'},
             {'c': 'black', 'linestyle': '-'}]
def plot_on_dataset(X, y, ax, name):
    ax.set_title(name)
    X = MMS().fit_transform(X)
    mlps = []
    if name == "digits":        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400
    for label, param in zip(labels, params):
        mlp = MLPC(random_state=0, max_iter=max_iter, **param)
        # some combinations will not converge, as can be seen on the plots,
        # so their ConvergenceWarnings are ignored here
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=ConvergenceWarning,
                                    module="sklearn")
            mlp.fit(X, y)
        mlps.append(mlp)
        print("Training score: %f" % mlp.score(X, y))
        print("Training loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

iris = datasets.load_iris()
X_digits, y_digits = datasets.load_digits(return_X_y=True)
data_sets = [(iris.data, iris.target),
             (X_digits, y_digits),
             datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
             datasets.make_moons(noise=0.3, random_state=0)]

for ax, data, name in zip(axes.ravel(), data_sets,
                          ['iris', 'digits', 'circles', 'moons']):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels, ncol=3, loc="upper center")
plt.show()
Training scores and losses per dataset and learning strategy (runs that hit max_iter without converging also emit a ConvergenceWarning):

| strategy | iris (score / loss) | digits (score / loss) | circles (score / loss) | moons (score / loss) |
|---|---|---|---|---|
| constant | 0.980000 / 0.096950 | 0.956038 / 0.243802 | 0.840000 / 0.601052 | 0.850000 / 0.341523 |
| constant/momentum | 0.980000 / 0.049530 | 0.992766 / 0.041297 | 0.940000 / 0.157334 | 0.850000 / 0.336188 |
| constant/momentum/nesterov | 0.980000 / 0.049540 | 0.993879 / 0.042898 | 0.940000 / 0.154453 | 0.850000 / 0.335919 |
| inv-scaling | 0.360000 / 0.978444 | 0.638843 / 1.855465 | 0.500000 / 0.692470 | 0.500000 / 0.689015 |
| inv-scaling/momentum | 0.860000 / 0.503452 | 0.912632 / 0.290584 | 0.500000 / 0.689143 | 0.830000 / 0.512595 |
| inv-scaling/momentum/nesterov | 0.860000 / 0.504185 | 0.909293 / 0.318387 | 0.500000 / 0.689751 | 0.830000 / 0.513034 |
| adam | 0.980000 / 0.045311 | 0.991653 / 0.045934 | 0.940000 / 0.150527 | 0.930000 / 0.170087 |
from sklearn.neural_network import MLPRegressor as MLPR
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split as TTS
X, y = make_regression(n_samples=200, random_state=1)
X_train, X_test, y_train, y_test = TTS(X, y, random_state=1)
regr = MLPR(random_state=1, max_iter=5000).fit(X_train, y_train)
print(regr.predict(X_test[:2]))
print(regr.score(X_test, y_test))
[8.69448846 6.5006531 ]
0.5209157440819883
The next example plots decision boundaries for several values of the regularization parameter alpha on synthetic datasets:

import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap as LCM
from sklearn.model_selection import train_test_split as TTS
from sklearn.preprocessing import StandardScaler as SS
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier as MLPC
from sklearn.pipeline import make_pipeline
h = .02 # step size in the mesh
alphas = np.logspace(-1, 1, 5)
classifiers = []
names = []
for alpha in alphas:
    classifiers.append(make_pipeline(
        SS(),
        MLPC(solver='lbfgs', alpha=alpha, random_state=1, max_iter=2000,
             early_stopping=True, hidden_layer_sizes=[100, 100])
    ))
    names.append(f"alpha {alpha:.2f}")
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
random_state=0, n_clusters_per_class=1)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
linearly_separable = (X, y)
datasets = [make_moons(noise=0.3, random_state=0),
make_circles(noise=0.2, factor=0.5, random_state=1),
linearly_separable]
figure = plt.figure(figsize=(17, 9))
i = 1
for X, y in datasets:
    X_train, X_test, y_train, y_test = TTS(X, y, test_size=.4)
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # just plot the dataset first
    cm = plt.cm.RdBu
    cm_bright = LCM(['#FF0000', '#0000FF'])
    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)

    # training & testing points
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6)
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())
    i += 1

    # iterate over classifiers
    for name, clf in zip(names, classifiers):
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        clf.fit(X_train, y_train)
        score = clf.score(X_test, y_test)

        # plot the decision boundary: use decision_function if available,
        # otherwise the probability of the positive class
        if hasattr(clf, "decision_function"):
            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
        else:
            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

        # put the result into a color plot
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)

        # training & testing points
        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                   edgecolors='black', s=25)
        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                   alpha=0.6, edgecolors='black', s=25)
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        ax.set_title(name)
        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                size=15, horizontalalignment='right')
        i += 1
figure.subplots_adjust(left=.02, right=.98)
plt.show()
Some of the lbfgs fits emit a ConvergenceWarning ("lbfgs failed to converge: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data").
MLP training uses stochastic gradient descent (SGD), Adam, or L-BFGS.
SGD updates the parameters using the gradient of the loss function with respect to each parameter that needs adaptation: $w \leftarrow w - \eta (\alpha \frac{\partial R(w)}{\partial w} + \frac{\partial Loss}{\partial w})$, where $\eta$ is the learning rate, $Loss$ is the loss function, and $R(w)$ is the L2 regularization term weighted by $\alpha$.
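A minimal NumPy sketch of one such update step (not scikit-learn's internal code; the gradient function here is a stand-in for what backpropagation would return):

import numpy as np

eta, alpha = 0.2, 1e-5           # learning rate and L2 penalty strength
w = np.zeros(3)                  # parameters of one (hypothetical) layer

def loss_grad(w):                # stand-in for dLoss/dw from backpropagation
    return np.array([0.5, -1.0, 0.25])

# one SGD step: w <- w - eta * (alpha * dR/dw + dLoss/dw), with R(w) = ||w||^2 / 2
w = w - eta * (alpha * w + loss_grad(w))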
SGD with momentum (or Nesterov's momentum) can perform better than Adam or L-BFGS if the learning rate is correctly tuned.
Adam is similar to SGD but automatically adapts the size of each parameter update from running estimates of the gradients. It is robust on large datasets and usually converges quickly.
L-BFGS approximates the Hessian matrix (the matrix of second-order partial derivatives of the loss) and its inverse, and uses the approximate inverse to compute parameter updates. The implementation is based on SciPy. It converges quickly on smaller datasets.
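scikit-learn's lbfgs solver builds on SciPy's L-BFGS routine; a toy use of that routine on a simple quadratic (the objective and starting point are purely illustrative):

import numpy as np
from scipy.optimize import minimize

def f(w):        # toy objective: ||w - 1||^2
    return np.sum((w - 1.0) ** 2)

def grad(w):     # its analytic gradient
    return 2.0 * (w - 1.0)

res = minimize(f, x0=np.zeros(5), jac=grad, method='L-BFGS-B')
print(res.x)     # close to a vector of ones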
SGD & Adam support both online & minibatch learning; L-BFGS does not.
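A minimal sketch of online learning via partial_fit (toy batches; with MLPClassifier, partial_fit is only available for the sgd and adam solvers, and every class must be declared on the first call):

import numpy as np
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='sgd', hidden_layer_sizes=(10,), random_state=1)

X1, y1 = np.array([[0., 0.], [1., 1.]]), np.array([0, 1])
X2, y2 = np.array([[0., 1.], [1., 0.]]), np.array([1, 0])

clf.partial_fit(X1, y1, classes=[0, 1])   # first call declares every class
clf.partial_fit(X2, y2)                   # later calls train on new batches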
A reasonable regularization parameter alpha is best found with GridSearchCV (a sketch follows at the end of this section). Use warm_start=True with max_iter=1, calling fit repeatedly yourself, if you need more control over stopping criteria or the learning rate, or want to do additional monitoring:

X = [[0., 0.], [1., 1.]]
y = [0, 1]
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(hidden_layer_sizes=(15,), random_state=1, max_iter=1, warm_start=True)
for i in range(10):
    clf.fit(X, y)
    # additional monitoring / inspection goes here
(Each fit call emits a ConvergenceWarning because max_iter=1; that is expected with this pattern.)
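A sketch of the GridSearchCV search over alpha mentioned above (the synthetic data and the 10.0 ** -np.arange(1, 7) grid are illustrative choices, not a recommendation):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(solver='lbfgs', max_iter=2000, random_state=1))

# the pipeline step name 'mlpclassifier' prefixes the parameter to search
param_grid = {'mlpclassifier__alpha': 10.0 ** -np.arange(1, 7)}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_)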