All posts, sorted by date (oldest first)
What is an ROC curve? What is AUC?
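* An ROC (receiver operating characteristic) curve plots a model's true positive rate (y-axis) against its false positive rate (x-axis) as the classification threshold is varied.
* A completely random prediction produces a straight diagonal line. The better the model, the closer its curve gets to the top-left corner (a high true positive rate at a low false positive rate).
* AUC (Area Under the Curve) is the area under the ROC curve: 0.5 for random guessing and 1.0 for a perfect classifier. A higher AUC means the model is better at ranking positive examples above negative ones.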
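As a rough illustration, the curve and its area can be computed by sweeping the threshold over the sorted scores. This is a minimal sketch in plain Python; the function names and the toy labels and scores are made up for the example.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points for a threshold sweep."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(labels, scores)
print(points)
print(auc(points))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, which is exactly what the area works out to.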
What is PCA?
* Principal Component Analysis (PCA) is a method of dimensionality reduction. It finds n orthogonal vectors (the principal components) that capture the most variance in the data, where n is the number of dimensions the user wants the data reduced to.
* PCA can speed up downstream jobs by shrinking the feature space, and it can be used to visualize high-dimensional data.
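To make the idea concrete, here is a small sketch that finds only the first principal component of a toy 2-D dataset and projects the data onto it. It uses power iteration on the covariance matrix to stay dependency-free; that is a stand-in chosen for this example (libraries typically use an SVD), and the data and function names are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=100):
    """Approximate the top eigenvector of cov by power iteration."""
    # Assumption: the start vector is not orthogonal to the top eigenvector.
    vector = [1.0] * len(cov)
    for _ in range(iterations):
        product = [sum(c * v for c, v in zip(row, vector)) for row in cov]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    return vector


data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
centered = center(data)
component = first_component(covariance_matrix(centered))
# Project each 2-D point onto the component to get a 1-D representation.
reduced = [sum(x * c for x, c in zip(row, component)) for row in centered]
print(component)
print(reduced)
```

Because the two features move together almost perfectly, the component points roughly along the diagonal and a single number per row keeps nearly all of the variance.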
Explain the bias-variance tradeoff
* Bias is model error due to an oversimplified ML algorithm, which can lead to underfitting.
* A high-bias model makes simplifying assumptions during training so that the target function is easier to learn.
* Low-bias algos: decision trees, KNN, and SVM.
* High-bias algos: linear and logistic regression.
* Variance is model error due to an overly complex ML algorithm: the model learns noise from the training data set and therefore performs badly on test data. It can lead to high sensitivity to the training data and to overfitting.
* Normally, as you increase the complexity of your model, you will see a reduction in error due to lower bias in the model. However, this only happens up to a particular point: as you continue to make your model more complex, variance grows and you end up overfitting.
Why is Softmax often the last operation in a neural network?
* Because it accepts a vector of real numbers and returns a probability distribution. Each element is non-negative and the sum over all components is 1.
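A minimal sketch of the operation in plain Python. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result; the input values are arbitrary.

```python
import math


def softmax(logits):
    """Map a list of real numbers to a probability distribution."""
    shift = max(logits)  # avoids overflow in exp() for large inputs
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest input receives the largest probability, and the outputs sum to 1 up to floating-point rounding.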
What is TF/IDF vectorization?
* Term frequency-inverse document frequency reflects how important a word is to a document in a corpus. It is used as a weighting factor in information retrieval and text mining.
* The TF-IDF weight increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.
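The weighting can be sketched in a few lines. This uses one common unsmoothed variant (term frequency times log(N / document frequency)); real libraries often apply smoothing and normalization, so their numbers will differ. The whitespace tokenizer and the three toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
print(weights[0])
```

A word such as "the" that appears in every document gets a weight of zero, while a word unique to one document, such as "mat", gets the highest weight there.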
Compare different types of selection biases
* Sampling bias is a systematic error due to a non-random sampling of a population.
* This causes some members of the population to be less included than others, such as low-income families being excluded from an online poll.
* Time interval bias arises when a trial is terminated early because a variable reaches an extreme value (usually for ethical reasons). The variable with the largest variance is the most likely to reach that extreme, even if all variables have a similar mean.
* Data bias is when specific subsets of data are chosen to support a conclusion, or when data is rejected as bad on arbitrary grounds instead of according to previously stated or generally agreed-on criteria.
* Attrition bias is caused by the loss of participants, for example when trial subjects that did not run to completion are discounted.
Define Error Rate, Accuracy, Sensitivity/Recall, Specificity, Precision, and F-Score.
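Where T is True, F is False, P is Positive, and N is Negative, each denoting a count of items in a confusion matrix. TP, FP, TN, and FN are the four cells; P = TP + FN is the number of actual positives and N = TN + FP is the number of actual negatives.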
* Error Rate: (FP + FN) / (P + N)
* Accuracy: (TP + TN) / (P + N)
* Sensitivity/Recall: TP / P
* Specificity: TN / N
* Precision: TP / (TP + FP)
* F-Score: Harmonic mean of precision and recall, i.e. 2 * Precision * Recall / (Precision + Recall).
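The definitions above translate directly into code. A minimal sketch, where the function name and the confusion-matrix counts are made up for illustration:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard metrics from the four confusion-matrix counts."""
    p = tp + fn  # actual positives
    n = tn + fp  # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "error_rate": (fp + fn) / (p + n),
        "accuracy": (tp + tn) / (p + n),
        "recall": recall,
        "specificity": tn / n,
        "precision": precision,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for 100 classified items.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that error rate and accuracy always sum to 1, since every item is either classified correctly or not.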
Compare correlation and covariance
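* Correlation measures how strongly two variables are linearly related. It is the covariance divided by the product of the two standard deviations, so it is unitless and always lies between -1 and 1.
* Covariance measures the extent to which two random variables change in tandem. Its magnitude depends on the units of the variables, which makes it hard to compare across datasets.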
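A short sketch of the practical difference: rescaling one variable changes the covariance but leaves the correlation untouched. The height and weight values are invented for the example.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weights_kg = [50, 58, 65, 72, 85]
heights_cm = [h * 100 for h in heights_m]  # same data, different unit

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```

Switching from metres to centimetres multiplies the covariance by 100, while the correlation stays the same.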
Why is A/B testing effective?
* A/B testing is hypothesis testing for a randomized experiment with two variants, A and B.
* It is effective because it minimizes conscious bias: those in group A do not know that they are in group A, or that there even is a group B, and vice versa.
* However, A/B testing can be difficult to perform in contexts other than Internet businesses.
Random Numbers: How would you generate a random number between 1 and 7 with only one die?
* One solution is to roll the die twice. This means there are 6 x 6 = 36 possible outcomes. By excluding one combination (say, 6 and 6), there are 35 possible outcomes.
* Since 35 = 7 x 5, we can assign five combinations of rolls (order does matter!) to each of the seven numbers and so generate a random number between 1 and 7. If the excluded combination comes up, we simply roll again.
* For instance, say we roll a (1, 2). Since we have (hypothetically) assigned the roll combinations (1, 1), (1, 2), (1, 3), (1, 4), and (1, 5) to the number 1, the randomly generated number would be 1.
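The scheme above can be sketched as a small rejection sampler. The mapping from roll pairs to numbers is one possible assignment (the first five ordered pairs map to 1, the next five to 2, and so on), and the function names are made up for the example.

```python
import random
from collections import Counter


def roll_die(rng):
    """Simulate one roll of a fair six-sided die."""
    return rng.randint(1, 6)


def random_1_to_7(rng):
    """Return a uniform integer in 1..7 using only die rolls."""
    while True:
        first, second = roll_die(rng), roll_die(rng)
        index = (first - 1) * 6 + (second - 1)  # ordered pair -> 0..35
        if index < 35:  # index 35 is the excluded pair (6, 6): roll again
            return index // 5 + 1  # five consecutive pairs per number


rng = random.Random(0)
samples = [random_1_to_7(rng) for _ in range(7000)]
print(sorted(Counter(samples).items()))
```

Each of the 35 accepted pairs is equally likely, so each number from 1 to 7 comes up with probability 5/35 = 1/7.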
Compare univariate, bivariate, and multivariate analysis.
* Univariate analyses are performed on only one variable. Examples: pie charts, distribution plots, and boxplots.
* Bivariate analyses map relationships between two variables. Examples: scatterplots or contour plots, as well as time series forecasting.
* Multivariate analysis deals with more than two variables to understand the effect of those variables on a target variable. This can include training neural networks for predictions or using SHAP values/permutation importance to find the most important feature. It could also include scatterplots with a third feature encoded as color or size.
What is cross-validation?
* Cross-validation measures how well a model generalizes across an entire dataset. A traditional train-test split, in which part of the data is randomly selected as training data and the rest as test data, may mean that the model performs well on certain randomly selected test sets and poorly on others.
* In other words, the measured performance is not nearly as indicative of the model’s quality as it is of the randomness of the test data.
* Cross-validation splits the data into n segments. The model is trained on n-1 segments of the data and tested on the remaining segment. Then the model is reset and trained on a different set of n-1 segments. This repeats until the model has predicted values for the entire dataset, and the results are averaged.
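The procedure can be sketched without any library. The "model" here is a deliberately trivial one that predicts the mean of its training targets, chosen only to keep the example short; the function names and data are hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def fit_mean(targets):
    """Toy 'model': always predict the mean of the training targets."""
    return sum(targets) / len(targets)


def cross_validate(targets, n_folds):
    """Average mean squared error of the toy model across all folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(targets), n_folds):
        prediction = fit_mean([targets[i] for i in train])  # fresh model per fold
        errors = [(targets[i] - prediction) ** 2 for i in test]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


targets = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 5.0, 3.5, 6.5, 4.0]
print(cross_validate(targets, n_folds=5))
```

In practice the rows are usually shuffled before splitting, so that the folds do not depend on the order in which the data was collected.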
What does the ‘naive’ in ‘Naive Bayes’ mean?
* Naive Bayes is based on Bayes’ Theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is considered ‘naive’ because it assumes that the features are conditionally independent of one another given the class, which may or may not be correct. This is also why it can be very powerful when used correctly: it can bypass the feature interactions other models must learn, because it assumes there are none.
What are the different kernels in SVM?
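* Commonly used kernels include the linear kernel, the polynomial kernel, the radial basis (RBF) kernel, and the sigmoid kernel.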
Recommenders: Compare collaborative filtering, content filtering, and hybrid filtering.
* Collaborative filtering solely relies on user ratings to determine what a new user might like next. All product attributes are either learned through user interactions or discarded. One example of collaborative filtering is matrix factorization.
* Content filtering relies only on intrinsic attributes of products and customers, such as product price, customer age, etc., to make recommendations. One way to achieve content filtering is to measure similarity between a profile vector and an item vector, such as cosine similarity.
* Hybrid filtering combines content and collaborative filtering recommendations. Which filter to use depends on the real-world context — hybrid filtering may not always be the definitive answer.
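As a small illustration of the content-filtering approach, the sketch below ranks items by cosine similarity between a user profile vector and each item vector. The three feature dimensions, the item names, and all values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical feature order: action, romance, comedy.
user_profile = [0.9, 0.1, 0.5]
items = {
    "Movie A": [1.0, 0.0, 0.4],
    "Movie B": [0.0, 1.0, 0.2],
    "Movie C": [0.5, 0.5, 0.5],
}

ranked = sorted(
    items,
    key=lambda name: cosine_similarity(user_profile, items[name]),
    reverse=True,
)
print(ranked)
```

The item whose attributes point in nearly the same direction as the profile is recommended first, regardless of what other users rated.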
Memory: You have 5GB RAM & need to train your model on a 10 GB dataset. How do you do this?
* SVM: a partial fit would work. The dataset could be split into several smaller chunks that each fit in memory, and the model updated on one chunk at a time. This suits a linear SVM trained incrementally (for example with stochastic gradient descent); kernel SVMs generally need the full dataset and scale poorly with its size.
* If the data is not suitable for SVM, a neural network could be trained with a small batch size on batches read from disk as needed, so that only one batch is in RAM at a time. NumPy can memory-map arrays stored on disk (numpy.memmap), and Keras/TensorFlow and PyTorch both support loading data batch by batch.
What is the consequence of not setting an accurate learning rate?
If the learning rate is too low, training will progress very slowly, as the weights receive only minimal updates. However, if the learning rate is set too high, the loss function may jump erratically due to drastic updates in the weights. The model may then fail to converge to a minimum, or may even diverge, especially when the data is too noisy for the network to train on.
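All three behaviours show up even on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2, whose minimum is at w = 0; the function, the starting point, and the three learning rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(w) = w**2 and return every weight visited."""
    w = start
    history = [w]
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2
        w -= learning_rate * gradient
        history.append(w)
    return history


for learning_rate in (0.001, 0.1, 1.1):
    final = gradient_descent(learning_rate)[-1]
    print(f"learning rate {learning_rate}: final w = {final:.4g}")
```

With the smallest rate the weight has barely moved after 50 steps, with the middle rate it is close to the minimum, and with the largest rate each update overshoots by more than it corrects, so the weight grows without bound.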
Validation: Compare test sets & validation sets
* A test set is used to evaluate a model’s final performance after training, on data the model has never seen.
* A validation set is used during training for hyperparameter selection and to detect overfitting on the training set.