### make_classification¶

• Generates random n-class data.
• Initially creates normally distributed clusters about the vertices of an n_informative-dimensional hypercube with sides of length=2*class_sep.
• Assigns an equal number of clusters to each class.
• Introduces interdependence between features and adds further noise.
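A minimal sketch of the call (parameter values here are illustrative, not from the notes above):

```python
from sklearn.datasets import make_classification

# 3-class problem: 4 informative features, 1 redundant linear combination,
# clusters placed on a hypercube with side 2 * class_sep.
X, y = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=4,
    n_redundant=1,
    n_classes=3,
    class_sep=2.0,
    random_state=0,
)
print(X.shape, y.shape)  # (200, 6) (200,)
```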

### Make Gaussian Quantiles¶

• Divides a single Gaussian cluster into near-equal-size classes separated by concentric hyperspheres.
• Built by taking a multi-dimensional standard normal distribution and defining classes separated by nested concentric multi-dimensional spheres such that roughly equal numbers of samples are in each class (quantiles of the distribution).
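A quick sketch with illustrative parameters:

```python
from sklearn.datasets import make_gaussian_quantiles

# One 2-D standard normal cloud cut into 3 classes by quantile shells.
X, y = make_gaussian_quantiles(
    n_samples=300, n_features=2, n_classes=3, random_state=0
)
print(X.shape, y.shape)  # (300, 2) (300,)
```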

### Make Circles¶

• Builds a 2D binary classification dataset - Gaussian data, spherical decision boundary.

### Make Moons¶

• Builds a 2D binary classification dataset - two interleaving half circles.
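A minimal sketch covering both 2D generators (noise levels are illustrative):

```python
from sklearn.datasets import make_circles, make_moons

# Two concentric circles (factor = inner/outer radius) and two half-moons.
X_c, y_c = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)
X_m, y_m = make_moons(n_samples=200, noise=0.05, random_state=0)
```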

### Make Multilabel Classification¶

• Generates random samples with multiple labels, reflecting a bag of words drawn from a mixture of topics. The number of topics per document is drawn from a Poisson distribution; the topics themselves are drawn from a fixed random distribution.
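A minimal sketch (illustrative values); Y is a binary indicator matrix of the topics assigned to each sample:

```python
from sklearn.datasets import make_multilabel_classification

# X holds word counts; each row of Y flags the topics of that "document".
X, Y = make_multilabel_classification(
    n_samples=100, n_features=20, n_classes=5, n_labels=2, random_state=0
)
print(X.shape, Y.shape)  # (100, 20) (100, 5)
```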

### Make Hastie classification data¶

• The ten features are standard independent Gaussians.
• y[i] = 1 if np.sum(X[i] ** 2) > 9.34 else -1
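A short sketch; the generator implements exactly the rule above:

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2

X, y = make_hastie_10_2(n_samples=1000, random_state=0)
# Labels are +/-1 according to whether the squared norm exceeds 9.34.
assert np.array_equal(y, np.where((X ** 2).sum(axis=1) > 9.34, 1.0, -1.0))
```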

### Make BiClusters¶

• Creates an array with a constant block diagonal structure.

### Make Checkerboard¶

• Creates an array with block checkerboard structures.
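A sketch covering both generators (shapes and noise levels are illustrative); each also returns boolean row/column indicators of cluster membership:

```python
from sklearn.datasets import make_biclusters, make_checkerboard

# Block-diagonal structure with 4 biclusters.
X_b, rows, cols = make_biclusters(
    shape=(300, 300), n_clusters=4, noise=5, random_state=0
)
# Checkerboard structure: 3 row clusters x 4 column clusters.
X_cb, rows_cb, cols_cb = make_checkerboard(
    shape=(300, 300), n_clusters=(3, 4), noise=5, random_state=0
)
```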

### Make Regression¶

• Produces regression targets as a random linear combination of features - with noise. Optionally sparse.
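A minimal sketch; with coef=True the underlying linear model is returned as well:

```python
from sklearn.datasets import make_regression

# y is a noisy linear combination of 3 of the 10 features.
X, y, coef = make_regression(
    n_samples=200, n_features=10, n_informative=3,
    noise=1.0, coef=True, random_state=0,
)
print(coef.nonzero()[0])  # indices of the informative features
```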

### Make Sparse Uncorrelated¶

• Returns a random regression problem with a sparse uncorrelated design. Only the first 4 features are informative; the rest contribute nothing to the target.
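A short sketch; the ground-truth relation (from the scikit-learn docs) uses only the first four columns:

```python
from sklearn.datasets import make_sparse_uncorrelated

X, y = make_sparse_uncorrelated(n_samples=100, n_features=10, random_state=0)
# Target mean: X0 + 2*X1 - 2*X2 - 1.5*X3 (plus unit-variance noise).
y_mean = X[:, 0] + 2 * X[:, 1] - 2 * X[:, 2] - 1.5 * X[:, 3]
```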

### Make Friedman1¶

• Generates a non-linear regression problem with polynomial and sine components. The features X are independent and uniformly distributed on [0, 1]. n_features must be >= 5; only the first 5 features are used to compute $y$, and the remaining features are independent of $y$.
• $y(X) = 10 \sin(\pi X_0 X_1) + 20 (X_2 - 0.5)^2 + 10 X_3 + 5 X_4 + \text{noise} \cdot N(0, 1)$
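A sketch checking the formula directly (with noise=0 the target is deterministic):

```python
import numpy as np
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=200, n_features=10, noise=0.0, random_state=0)
y_check = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
           + 20 * (X[:, 2] - 0.5) ** 2
           + 10 * X[:, 3]
           + 5 * X[:, 4])
assert np.allclose(y, y_check)  # only the first 5 features enter y
```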

### Make Friedman2¶

• Includes feature multiplication & reciprocation.
• $y(X) = \sqrt{X_0^2 + \left(X_1 X_2 - \dfrac{1}{X_1 X_3}\right)^2} + \text{noise} \cdot N(0, 1)$

### Make Friedman3¶

• $y(X) = \arctan\left(\dfrac{X_1 X_2 - 1/(X_1 X_3)}{X_0}\right) + \text{noise} \cdot N(0, 1)$
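Both generators always produce exactly 4 features; a quick sketch:

```python
from sklearn.datasets import make_friedman2, make_friedman3

X2, y2 = make_friedman2(n_samples=200, noise=0.0, random_state=0)
X3, y3 = make_friedman3(n_samples=200, noise=0.0, random_state=0)
print(X2.shape, X3.shape)  # (200, 4) (200, 4)
```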

### Make S Curve¶

• Returns X, an ndarray of shape (#samples, 3): the points.
• Returns t, an ndarray of shape (#samples,): each sample's univariate position along the main dimension of the manifold.

### Make Swiss Roll¶

• Generates a swiss roll dataset.
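Both manifold generators share the same interface; a minimal sketch:

```python
from sklearn.datasets import make_s_curve, make_swiss_roll

X_s, t_s = make_s_curve(n_samples=1000, noise=0.05, random_state=0)
X_r, t_r = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
# X_* are (1000, 3) point clouds; t_* locate each point along the curve.
```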

### Make Low Rank Matrix¶

• Generates a mostly low-rank matrix with bell-shaped singular values.
• Most of the variance is explained by a bell-shaped curve of width effective_rank.
• The low-rank portion of the singular-value profile is (1 - tail_strength) * exp(-(i / effective_rank) ** 2).
• The remaining singular values form the tail: tail_strength * exp(-0.1 * i / effective_rank).
• The low-rank portion of the profile can be considered as the structured signal; the tail can be considered as noise that cannot be summarized by a low number of linear components (singular vectors).
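A sketch inspecting the singular-value decay (sizes illustrative):

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix

X = make_low_rank_matrix(n_samples=100, n_features=100,
                         effective_rank=10, tail_strength=0.5,
                         random_state=0)
s = np.linalg.svd(X, compute_uv=False)
print(s[:5], s[20:25])  # sharp decay after roughly effective_rank values
```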

### Make Sparse Coded Signal¶

• Generates a matrix Y=DX
• Y (the encoded signal): an ndarray of (#features,#samples)
• D (the dictionary with normalized components): an ndarray of (#features,#components)
• X (the sparse code - each column has n_nonzero_coefs non-zero items): an ndarray of (#components, #samples)
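A minimal sketch. Note that the shapes listed above follow the older scikit-learn convention; recent versions return the transposed layout (samples as rows), which is what this sketch assumes:

```python
from sklearn.datasets import make_sparse_coded_signal

# Y = X @ D in the newer (samples-as-rows) orientation.
Y, D, X = make_sparse_coded_signal(
    n_samples=20, n_components=15, n_features=10,
    n_nonzero_coefs=3, random_state=0,
)
print(Y.shape, D.shape, X.shape)  # (20, 10) (15, 10) (20, 15)
```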

### Make Sparse SPD Matrix¶

• Params: dim (matrix size); alpha (probability that a given coefficient is zero - larger values give a sparser matrix); norm_diag (whether to normalize the matrix so that the leading-diagonal elements are all 1); smallest_coef (in [0, 1]); largest_coef (in [0, 1]); random_state.
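A short sketch verifying the output is symmetric positive definite (the size parameter is passed positionally, since recent scikit-learn versions rename dim to n_dim):

```python
import numpy as np
from sklearn.datasets import make_sparse_spd_matrix

prec = make_sparse_spd_matrix(
    10, alpha=0.9, smallest_coef=0.4, largest_coef=0.7, random_state=0
)
assert np.allclose(prec, prec.T)             # symmetric
assert np.all(np.linalg.eigvalsh(prec) > 0)  # positive definite
```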

### Example: SPD inverse covariance estimates¶

• Uses the Graphical Lasso to learn covariance and sparse precision matrices from a small #samples.

• To estimate a probabilistic (e.g., Gaussian) model, estimating the precision (inverse covariance) matrix is as important as estimating the covariance matrix. Indeed, a Gaussian model is parametrized by the precision matrix.

• To be in favorable recovery conditions, we sample the data from a model with a sparse inverse covariance matrix. In addition, we ensure that the data is not too correlated (limiting the largest coefficient of the precision matrix) and that there are no small coefficients in the precision matrix that cannot be recovered.

• The #samples is slightly larger than #dimensions, so the empirical covariance is still invertible. However, the observations are strongly correlated, so the empirical covariance matrix is ill-conditioned; as a result, its inverse (the empirical precision matrix) is very far from the ground truth.

• If we use l2 shrinkage, as with the Ledoit-Wolf estimator, the small number of samples forces a large amount of shrinkage. The Ledoit-Wolf precision estimate is therefore fairly close to the ground-truth precision, which is itself not far from diagonal, but the off-diagonal structure is lost.

• The l1-penalized estimator can recover part of this off-diagonal structure. It learns a sparse precision matrix. It cannot recover the exact sparsity pattern - it detects too many non-zero coefficients - but the largest non-zero coefficients of the l1 estimate do correspond to non-zero coefficients in the ground truth.

• The coefficients of the l1 precision estimate are biased toward zero: because of the penalty, they are all smaller than the corresponding ground-truth values, as can be seen in the figure.

• The color range of the precision matrices is tweaked to improve readability of the figure. The full range of values of the empirical precision is not displayed.

• The GraphicalLasso alpha (sparsity) parameter is set by internal cross-validation, i.e. with GraphicalLassoCV; see the sketch below.
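A condensed sketch of that example (sizes and coefficients follow the scikit-learn example; the data normalization is simplified):

```python
import numpy as np
from sklearn.datasets import make_sparse_spd_matrix
from sklearn.covariance import GraphicalLassoCV, ledoit_wolf

rng = np.random.RandomState(0)
n_features, n_samples = 20, 60

# Ground truth: sparse precision -> covariance, then correlated samples.
prec = make_sparse_spd_matrix(
    n_features, alpha=0.98, smallest_coef=0.4, largest_coef=0.7,
    random_state=rng,
)
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(n_features), cov, size=n_samples)
X -= X.mean(axis=0)
X /= X.std(axis=0)

# alpha chosen by internal cross-validation.
model = GraphicalLassoCV().fit(X)
sparse_prec = model.precision_

# Ledoit-Wolf l2-shrinkage baseline for comparison.
lw_cov, _ = ledoit_wolf(X)
lw_prec = np.linalg.inv(lw_cov)
```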