### Dimensionality Reduction - Random Projections¶

• The random_projection module trades a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.

• It does this by projecting the data through random matrices (RMs). Two variants are implemented: Gaussian random projection and sparse random projection.

• The dimensions and distributions of the random matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.

### The Johnson-Lindenstrauss lemma¶

• The lemma: "a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection."

• For a Lipschitz continuous function, there exists a double cone whose origin can be moved along the graph so that the whole graph always stays outside the cone.

• johnson_lindenstrauss_min_dim estimates the minimal size of the random subspace needed to guarantee a bounded distortion introduced by the random projection, knowing only the number of samples.

### Gaussian RP¶

• Reduces dimensionality by projecting the original data onto a randomly generated matrix whose components are drawn from $N(0, \frac{1}{n_{\text{components}}})$.
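A minimal sketch of the Gaussian variant (array sizes chosen only for illustration): with the default `n_components='auto'`, the target dimension is inferred from `n_samples` and `eps` via the Johnson-Lindenstrauss bound.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(42)
X = rng.rand(100, 10000)  # 100 samples in a 10,000-dimensional space

# n_components='auto' picks the subspace size from n_samples and eps (default 0.1)
transformer = GaussianRandomProjection(random_state=42)
X_new = transformer.fit_transform(X)
print(X_new.shape)  # (100, 3947)
```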

### Sparse RP¶

• Reduces dimensionality by projecting the original data through a sparse random matrix.

• Sparse random matrices are an alternative to dense Gaussian random projection matrices. They guarantee similar embedding quality while being much more memory efficient and allowing faster computation.

• Given $s = 1/\text{density}$, the elements of the random matrix are drawn from $\begin{split}\left\{ \begin{array}{c c l} -\sqrt{\frac{s}{n_{\text{components}}}} & \text{with probability} & 1 / 2s\\ 0 & \text{with probability} & 1 - 1 / s \\ +\sqrt{\frac{s}{n_{\text{components}}}} & \text{with probability} & 1 / 2s\\ \end{array} \right.\end{split}$

where $n_{\text{components}}$ is the size of the projected subspace. The minimum density of non-zero elements is $1 / \sqrt{n_{\text{features}}}$.
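The sparse variant has the same interface; a sketch under the same illustrative sizes, where the default `density='auto'` uses the minimum density $1 / \sqrt{n_{\text{features}}}$:

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

rng = np.random.RandomState(42)
X = rng.rand(100, 10000)

# density='auto' uses the minimum density 1/sqrt(n_features);
# n_components='auto' applies the same JL bound as the Gaussian variant
transformer = SparseRandomProjection(random_state=42)
X_new = transformer.fit_transform(X)
print(X_new.shape)  # (100, 3947)
```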

### Example: JL embedding boundary with random projections¶

• The distortion introduced by a random projection $p$ is controlled by the fact that $p$ defines an eps-embedding with good probability, as defined by $(1 - \varepsilon) \|u - v\|^2 < \|p(u) - p(v)\|^2 < (1 + \varepsilon) \|u - v\|^2$

• $u$ and $v$ are any rows taken from a dataset of shape (n_samples, n_features).

• $p$ is a projection by a random Gaussian N(0,1) matrix of shape (n_components, n_features) (or a sparse Achlioptas matrix).

• The minimum number of components to guarantee the eps-embedding is given by $n\_components \geq \frac{4 \log(n\_samples)}{\varepsilon^2 / 2 - \varepsilon^3 / 3}$.
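The bound can be evaluated directly with johnson_lindenstrauss_min_dim; note that it depends only on the number of samples and eps, not on the original number of features:

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Loose distortion bound -> a small subspace suffices
print(johnson_lindenstrauss_min_dim(n_samples=1_000_000, eps=0.5))  # 663

# Tighter bound -> many more dimensions are required
print(johnson_lindenstrauss_min_dim(n_samples=1_000_000, eps=0.1))  # 11841
```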

• First plot: as n_samples increases, the minimal number of dimensions n_components needed to guarantee an eps-embedding grows logarithmically.

• Second plot: as the admissible distortion eps increases, the minimum number of dimensions n_components needed for a given n_samples drastically decreases.

### Empirical Validation¶

• For low n_components: the distribution of distance ratios has many distorted pairs and is skewed (due to the hard limit of zero ratio on the left, since distances are always positive).

• For larger n_components: the distortion is controlled and the distances are well preserved by the random projection.
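This can be checked empirically; a sketch on synthetic data (sizes and n_components are illustrative, not from the original example), comparing squared pairwise distances before and after a sparse random projection:

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.random_projection import SparseRandomProjection

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)

proj = SparseRandomProjection(n_components=3000, random_state=0)
X_proj = proj.fit_transform(X)

# Ratio of squared pairwise distances after / before projection;
# self-distances (zeros) are dropped to avoid division by zero
orig = euclidean_distances(X, squared=True).ravel()
nonzero = orig != 0
rates = euclidean_distances(X_proj, squared=True).ravel()[nonzero] / orig[nonzero]

print(rates.mean())  # close to 1.0 when distances are well preserved
```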

• According to the JL lemma, projecting 500 samples without too much distortion will require several thousand dimensions, irrespective of the number of features of the original dataset.

• Using random projections on the digits dataset (only 64 features) does not make sense: it does not allow for dimensionality reduction in this case.

• On the twenty newsgroups dataset, on the other hand, the dimensionality can be decreased from 56,436 down to 10,000 while reasonably preserving pairwise distances.