### Partial Dependence Plot (PDP)

• PDPs graph the dependence between a target response and a set of input features, marginalizing over the values of all other input features (the ‘complement’ features).

• We can interpret PD as a function of the input features of interest.

• Due to the limits of human perception, the set of input features of interest must be small (usually 1-2). The input features of interest are usually chosen from among the most important ones.
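To make the marginalization concrete, here is a minimal NumPy sketch of a partial dependence curve for a single feature. The toy `model` function, the synthetic data, and the helper name `partial_dependence_1d` are assumptions made for illustration; none of them are part of scikit-learn.

```python
import numpy as np

def model(X):
    # Hypothetical fitted model: linear in feature 0,
    # plus an interaction between features 1 and 2.
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

def partial_dependence_1d(predict, X, feature, grid):
    # For each grid value, overwrite the feature of interest in every row,
    # then average the predictions over the complement features.
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        averages.append(predict(X_mod).mean())
    return np.array(averages)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
grid = np.linspace(-2.0, 2.0, 5)
pd_curve = partial_dependence_1d(model, X, feature=0, grid=grid)
print(pd_curve)
```

Because the toy model is additive in feature 0, the resulting curve is a straight line with slope 2, shifted by the sample average of the interaction term.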

### PDPs - 2D interactions

• PDPs with two features of interest enable us to visualize their interactions.

• This two-way PD plot shows the dependence of median house price on the joint values of house age and average occupants per household.

• There's an interaction between the two features: for average occupancy >2, the house price is nearly independent of the house age, whereas for average occupancy <2 there is a strong dependence on age.
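A two-way partial dependence surface can be sketched the same way, by fixing both features of interest at each pair of grid values and averaging over the rest. The toy `model` below is a made-up stand-in that only mimics the kind of interaction described above (age matters only at low occupancy); the column layout, coefficients, and the helper name `partial_dependence_2d` are assumptions, not the housing model itself.

```python
import numpy as np

def model(X):
    # Hypothetical model: column 0 = house age, column 1 = average occupancy.
    # Age influences the prediction only when occupancy is below 2.
    age, occupancy = X[:, 0], X[:, 1]
    return np.where(occupancy < 2.0, 3.0 - 0.05 * age, 1.0) + 0.1 * X[:, 2]

def partial_dependence_2d(predict, X, features, grids):
    # Fix both features of interest at each grid pair and average
    # the predictions over the remaining (complement) features.
    f0, f1 = features
    surface = np.empty((len(grids[0]), len(grids[1])))
    for i, v0 in enumerate(grids[0]):
        for j, v1 in enumerate(grids[1]):
            X_mod = X.copy()
            X_mod[:, f0] = v0
            X_mod[:, f1] = v1
            surface[i, j] = predict(X_mod).mean()
    return surface

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 50, 300),  # house age
    rng.uniform(1, 5, 300),   # average occupancy
    rng.normal(size=300),     # a third, complement feature
])
age_grid = np.linspace(0, 50, 6)
occ_grid = np.array([1.5, 3.0, 4.0])
surface = partial_dependence_2d(model, X, (0, 1), (age_grid, occ_grid))
print(surface)  # rows: age values, columns: occupancy values
```

In this sketch the column for occupancy 1.5 falls with age, while the columns for occupancy 3.0 and 4.0 are flat, which is the pattern a contour plot of the surface would reveal.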

### Individual Conditional Expectation (ICE) Plot

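• ICE plots can be built from the plot_partial_dependence function (PartialDependenceDisplay.from_estimator in scikit-learn 1.0 and later, where plot_partial_dependence is deprecated) by using kind='individual'.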

• ICE plots show the dependence between a target function and an input feature.

• Unlike a PDP, which shows the average effect of the input feature, an ICE plot visualizes the dependence of the prediction on a feature for each sample separately - with one line per sample.

• Due to the limits of human perception, only one input feature of interest is supported for ICE plots.

• While PDPs are good at showing the average effect of the features of interest, they can obscure a heterogeneous relationship created by interactions.

• ICE plots will provide many more insights if interactions exist. For example, we see a linear relationship between median income and house price in the PD line. The ICE lines show that there are exceptions, where the house price remains constant in some ranges of the median income.

• It might not be easy to see the average effect of the input feature in an ICE plot. Consider using ICE plots alongside PDPs. They can be plotted together with kind='both'.
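The relationship between ICE lines and the PDP can be sketched in a few lines of NumPy: each ICE line is the prediction for one sample as the feature of interest is varied, and the PDP is the average of those lines. The toy `model`, whose effect flips sign depending on a second feature, and the helper name `ice_curves` are assumptions chosen to make the heterogeneity obvious; the snippet computes the curves but does not plot them.

```python
import numpy as np

def model(X):
    # Hypothetical model: the effect of feature 0 flips sign with feature 1.
    return X[:, 0] * np.where(X[:, 1] > 0, 1.0, -1.0)

def ice_curves(predict, X, feature, grid):
    # One row per sample, one column per grid value.
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = predict(X_mod)
    return curves

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.linspace(-1.0, 1.0, 5)

ice = ice_curves(model, X, feature=0, grid=grid)
pd_curve = ice.mean(axis=0)  # the PDP is the average of the ICE lines
slopes = (ice[:, -1] - ice[:, 0]) / (grid[-1] - grid[0])

print(pd_curve)           # flatter than any individual line
print(np.unique(slopes))  # individual lines slope up or down
```

Here every ICE line has slope +1 or -1, yet the averaged curve is much flatter, which illustrates how a PDP alone can hide an interaction.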

### PDP & ICE Plot Observations

• The PDPs (thick blue lines) indicate that the median house price 1) has a linear relationship with median income (top left); 2) drops as the average number of occupants per household increases (top middle); 3) is not strongly influenced by the house age in a district (top right); 4) nor by the average number of rooms per household (2nd row).

• The ICE curves (light blue lines) complement the analysis: we can see that there are some exceptions, where the house price remains constant with respect to median income and average occupants.

• While house age (top right) does not have a strong influence on the median house price, there seem to be exceptions where the house price increases for house ages between 15 and 25.

• Similar exceptions can be observed for the average number of rooms (bottom left). Therefore, ICE plots show some individual effects that are attenuated by averaging.

• In all plots, the tick marks on the x-axis represent the deciles of the feature values in the training data.

• MLPRegressor has much smoother predictions than HistGradientBoostingRegressor.

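• If the input features are correlated, PDPs and ICE plots are built from potentially meaningless synthetic samples, because the feature of interest is varied while the other features are held at their observed values.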
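The issue with correlated features can be seen directly in the data that a brute-force PDP or ICE computation feeds to the model. The sketch below uses made-up data in which feature 1 closely tracks feature 0, then forces feature 0 to a single grid value, as the computation would; all names and constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 + 0.1 * rng.normal(size=1000)  # feature 1 closely tracks feature 0
X = np.column_stack([x0, x1])

# Brute-force PDP/ICE step: force feature 0 to one grid value for every row,
# leaving feature 1 untouched.
grid_value = np.quantile(x0, 0.95)
X_synthetic = X.copy()
X_synthetic[:, 0] = grid_value

# Largest gap between the two features in the observed and synthetic data.
observed_gap = np.abs(X[:, 0] - X[:, 1]).max()
synthetic_gap = np.abs(X_synthetic[:, 0] - X_synthetic[:, 1]).max()

print(observed_gap, synthetic_gap)
```

The synthetic rows pair a high value of feature 0 with low values of feature 1, combinations that never occur in the observed data, so the model is being queried far from anything it was trained on.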

### Computation

• Partial dependence of a response $f$ at a point $x_S$: $pd_{X_S}(x_S) \overset{def}{=} \mathbb{E}_{X_C}\left[ f(x_S, X_C) \right] = \int f(x_S, x_C) p(x_C) dx_C,$
• where $X_S$ = the input features of interest
• $X_C$ = the complement features
• $f(x_S, x_C)$ = the response function

• Evaluating this integral over a range of values of $x_S$ produces the PDP. An ICE line is defined as a single $f(x_S, x_C^{(i)})$, for one sample $i$, evaluated over the same values of $x_S$.
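In practice the integral is not evaluated analytically. With the generic 'brute' method it is approximated by an average over the $n$ samples of a dataset, where $x_C^{(i)}$ denotes the complement feature values of sample $i$ (the symbol $n$ is introduced here for illustration):

$$pd_{X_S}(x_S) \approx \frac{1}{n} \sum_{i=1}^{n} f\left(x_S, x_C^{(i)}\right)$$

Each term of the sum is one ICE line evaluated at $x_S$, so the estimated PDP is the pointwise average of the ICE lines.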

• The method parameter controls how the partial dependence is computed.

• 'brute' is the generic method (note: ICE only supports this variant).
• 'recursion' is faster but only supported by some tree-based estimators for PDPs.
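The two methods can be compared with the non-plotting function `sklearn.inspection.partial_dependence`. This is a sketch on synthetic data, assuming a scikit-learn version in which `partial_dependence` accepts the `kind` argument (0.24 or later, to the best of my knowledge); the data, the choice of estimator, and the hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=200)

est = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)

# Generic method: works with any estimator and is required for ICE lines.
brute = partial_dependence(
    est, X, features=[0], method="brute", kind="both", grid_resolution=20
)
# Tree-specific shortcut: only the averaged partial dependence is available.
recursion = partial_dependence(
    est, X, features=[0], method="recursion", kind="average", grid_resolution=20
)

print(brute["average"].shape)      # (n_outputs, n_grid_points)
print(brute["individual"].shape)   # (n_outputs, n_samples, n_grid_points)
print(recursion["average"].shape)  # (n_outputs, n_grid_points)
```

For gradient boosting estimators, the 'recursion' result is documented to ignore the initial predictor, so its values may differ from the 'brute' result by a constant offset; the shapes of the two averaged curves are the same.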