In this note, we discuss principal components regression and some of its issues: the need for scaling, the need for pruning, and the lack of “y-awareness” in the standard dimensionality reduction step.

The purpose of this article is to set the stage for presenting dimensionality reduction techniques appropriate for predictive modeling, such as y-aware principal components analysis, variable pruning, L2-regularized regression, supervised PCR, or partial least squares. We do this by working detailed examples and building the relevant graphs. In our follow-up article we describe and demonstrate the idea of y-aware scaling.

Note that we will try to say “principal components” (plural) throughout, following Everitt’s The Cambridge Dictionary of Statistics, though this is not the only common usage (e.g., Wikipedia: “Principal component regression”). We will work all of our examples in R.