PDPs graph the dependence between a target response and a set of input features, marginalizing over the values of all other input features (the ‘complement’ features).
We can interpret PD as a function of the input features of interest.
Due to the limits of human perception, the size of the set of input features of interest must be small (usually one or two). The input features of interest are usually chosen among the most important ones.
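One way to pick them is to rank candidates first, for instance with permutation importance; the sketch below is only illustrative and assumes the fitted estimator est and the held-out split X_test, y_test created later in this example.

from sklearn.inspection import permutation_importance

# Rank features on held-out data; higher mean importance = better PD/ICE candidate.
result = permutation_importance(est, X_test, y_test,
                                n_repeats=5, random_state=0, n_jobs=3)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{X_test.columns[idx]:<10} "
          f"{result.importances_mean[idx]:.3f}"
          f" +/- {result.importances_std[idx]:.3f}")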
PDPs with two features of interest enable us to visualize their interactions.
This two-way PD plot shows the dependence of median house price on the joint values of house age and average occupants per household.
There's an interaction between the two features: for an average occupancy greater than two, the house price is nearly independent of the house age, whereas for values less than two there is a strong dependence on age.
features = ['AveOccup', 'HouseAge', ('AveOccup', 'HouseAge')]
tic = time()
_, ax = plt.subplots(ncols=3, figsize=(9, 4))
display = PPD(
    est, X_train, features,
    kind='average',
    n_jobs=3, grid_resolution=20,
    ax=ax)
print(f"done in {time() - tic:.3f}s")
display.figure_.suptitle(
    'PD, house value on non-location features\n'
    'California housing, Gradient Boosting')
display.figure_.subplots_adjust(wspace=0.4, hspace=0.3)
done in 1.018s
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()

features = ('AveOccup', 'HouseAge')
pdp = PD(est, X_train, features=features,
         kind='average', grid_resolution=20)
XX, YY = np.meshgrid(pdp["values"][0], pdp["values"][1])
Z = pdp.average[0].T
ax = Axes3D(fig)
surf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1,
                       cmap=plt.cm.BuPu, edgecolor='k')
ax.set_xlabel(features[0])
ax.set_ylabel(features[1])
ax.set_zlabel('Partial dependence')
# set a nicer initial viewing angle
ax.view_init(elev=22, azim=122)
plt.colorbar(surf)
plt.suptitle('PD, house value on median\n'
             'age and avg occupancy, with Gradient Boosting')
plt.subplots_adjust(top=0.9)
ICE plots can be built from the plot_partial_dependence function by using kind='individual'.
ICE plots show the dependence between a target function and an input feature.
Unlike a PDP, which shows the average effect of the input feature, an ICE plot visualizes the dependence of the prediction on a feature for each sample separately - with one line per sample.
Due to the limits of human perception, only one input feature of interest is supported for ICE plots.
While the PDPs are good at showing the average effect of the target features, they can obscure a heterogeneous relationship created by interactions.
ICE plots will provide many more insights if interactions exist. For example, we see a linear relationship between median income and house price in the PD line. The ICE lines show that there are exceptions, where the house price remains constant in some ranges of the median income.
It might not be easy to see the average effect of the input feature in an ICE plot. Consider using ICE plots alongside PDPs; they can be plotted together with kind='both'.
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import plot_partial_dependence as PPD

X, y = make_hastie_10_2(random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                 max_depth=1, random_state=0).fit(X, y)
features = [0, 1]
PPD(clf, X, features, kind='individual')
PPD(clf, X, features, kind='both')
<sklearn.inspection._plot.partial_dependence.PartialDependenceDisplay at 0x7f875a87d8b0>
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split as TTS
cal_housing = fetch_california_housing()
X = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
y = cal_housing.target

# Center the target to avoid a gradient boosting init bias: gradient
# boosting with the 'recursion' method does not account for the initial
# estimator (here, the average target by default).
y -= y.mean()

X_train, X_test, y_train, y_test = TTS(X, y, test_size=0.1, random_state=0)
# 1-way PD using a multilayer perceptron (MLP)
# and gradient boosting
from time import time
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer as QT
from sklearn.neural_network import MLPRegressor as MLPR
print("Training MLPRegressor...")
tic = time()
est = make_pipeline(QT(),
                    MLPR(hidden_layer_sizes=(50, 50),
                         learning_rate_init=0.01,
                         early_stopping=True)).fit(X_train, y_train)
print(f"done in {time() - tic:.3f}s")
print(f"Test R2 score: {est.score(X_test, y_test):.2f}")
Training MLPRegressor...
done in 5.251s
Test R2 score: 0.80
import matplotlib.pyplot as plt
from sklearn.inspection import partial_dependence as PD
from sklearn.inspection import plot_partial_dependence as PPD
tic = time()
features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms']
display = PPD(
    est, X_train, features,
    kind="both", subsample=50,
    n_jobs=3, grid_resolution=20, random_state=0)
print(f"done in {time() - tic:.3f}s")
display.figure_.suptitle(
    'PD of house value on non-location features\n'
    'Cal Housing, with MLPRegressor'
)
display.figure_.subplots_adjust(hspace=0.3)
done in 3.154s
from sklearn.experimental import enable_hist_gradient_boosting # noqa
from sklearn.ensemble import HistGradientBoostingRegressor as HGBR
tic = time()
est = HGBR().fit(X_train, y_train)
print(f"done in {time() - tic:.3f}s")
print(f"Test R2 score: {est.score(X_test, y_test):.2f}")
done in 0.565s
Test R2 score: 0.85
tic = time()
display = PPD(
    est, X_train, features, kind="both", subsample=50,
    n_jobs=3, grid_resolution=20, random_state=0
)
print(f"done in {time() - tic:.3f}s")
display.figure_.suptitle(
    'PD of house value on non-location features\n'
    'Cal Housing, with Gradient Boosting'
)
display.figure_.subplots_adjust(wspace=0.4, hspace=0.3)
done in 2.210s
The PDPs (thick blue line) indicate that the median house price: 1) has a linear relationship with median income (top left); 2) drops when the average occupants per household increases (top middle); 3) is not strongly influenced by the house age in a district (top right); 4) nor by the average number of rooms per household (second row).
The ICE curves (light blue lines) complement the analysis: we can see some exceptions, where the house price remains constant over certain ranges of median income and average occupancy.
While house age (top right) does not have a strong average influence on the median house price, there seem to be exceptions where the house price increases for ages between 15 and 25.
Similar exceptions can be observed for the average number of rooms (bottom left). ICE plots therefore reveal some individual effects that are attenuated by taking the average.
In all plots, the tick marks on the x-axis represent the deciles of the feature values in the training data.
MLPRegressor has much smoother predictions than HistGradientBoostingRegressor.
If the features of interest are correlated with the complement features, marginalizing over the complement features creates potentially meaningless synthetic samples.
The partial dependence of a response $f$ at a point $x_S$ is defined as

$\begin{split}pd_{X_S}(x_S) &\overset{def}{=} \mathbb{E}_{X_C}\left[ f(x_S, X_C) \right]\\
&= \int f(x_S, x_C) \, p(x_C) \, dx_C,\end{split}$

where $x_S$ are the input features of interest (the features parameter), $X_C$ are the complement features, and $f(x_S, x_C)$ is the response function (the model's predictions).
Evaluating this expectation over a grid of values of $x_S$ produces the PD curve. An ICE line is a single $f(x_S, x_C^{(i)})$, evaluated over the same grid of $x_S$ for one fixed sample $i$.
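To connect the formula to the plots above, here is a minimal brute-force sketch (not the scikit-learn implementation): each training row keeps its own complement values $x_C^{(i)}$ while the feature of interest is set to a grid value, yielding one ICE line per sample; the PD curve is simply their mean. It assumes the fitted estimator est and the DataFrame X_train from this example.

import numpy as np

def ice_and_pd(model, X, feature, grid):
    # One ICE line per sample: f(x_S, x_C^(i)) evaluated over the grid of x_S.
    ice = np.empty((len(grid), X.shape[0]))
    for j, value in enumerate(grid):
        X_synth = X.copy()
        X_synth[feature] = value  # override the feature of interest only
        ice[j] = model.predict(X_synth)
    # The PD curve is the empirical average of the ICE lines over the samples.
    return ice, ice.mean(axis=1)

grid = np.linspace(X_train['MedInc'].min(), X_train['MedInc'].max(), 20)
ice_lines, pd_curve = ice_and_pd(est, X_train, 'MedInc', grid)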
The method parameter controls how the partial dependence is computed: 'brute' re-predicts on modified copies of the data and works with any fitted estimator (and is required for ICE), while 'recursion' is a faster tree traversal available only for some tree-based estimators and only for the averaged partial dependence; 'auto' (the default) picks 'recursion' when it is supported and 'brute' otherwise.
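As an illustration (the parameter values here are assumptions for this example, not defaults), both strategies can be requested explicitly on the fitted HistGradientBoostingRegressor est:

pd_brute = PD(est, X_train, ['MedInc'], kind='average',
              method='brute', grid_resolution=20)
pd_recursion = PD(est, X_train, ['MedInc'], kind='average',
                  method='recursion', grid_resolution=20)
# Both return the averaged partial dependence on the same grid; 'recursion'
# is faster for supported tree ensembles, 'brute' works with any estimator.
print(pd_brute['average'].shape, pd_recursion['average'].shape)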