
Important evaluation metrics – classification algorithms
Most of the metrics used to assess a classification model are based on the values that we get in the four quadrants of a confusion matrix. Let's begin this section by understanding what it is:
- Confusion matrix: This is the cornerstone of evaluating a classification model (that is, a classifier). As the name suggests, the matrix can be confusing at first. Let's try to visualize the confusion matrix as two axes in a graph. The x axis is labeled prediction, with two values, Positive and Negative. Similarly, the y axis is labeled actual, with the same two values, Positive and Negative, as shown in the following figure. This matrix is a table that contains the counts of actual and predicted values produced by a classifier:

- If we try to deduce the information in each quadrant of the matrix:
- Quadrant one is the number of positive class predictions that were accurately identified. So, it is termed True Positive (TP).
- Quadrant two, also known as False Positive (FP), is the number of actual negative cases that were incorrectly predicted as positive.
- Quadrant three, known as False Negative (FN), is the number of actual positive cases that were incorrectly predicted as negative.
- Quadrant four is True Negative (TN), which is the number of negative class predictions that were accurately classified.
- Accuracy: Accuracy measures how frequently the classifier makes a correct prediction. It is the ratio of the number of correct predictions to the total number of predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN). The first sketch after this list shows how these ratios are computed from a confusion matrix in code.
- Precision: Precision estimates the proportion of predicted positives that are truly positive. It is the ratio of true positives to all predicted positives: Precision = TP / (TP + FP).
- Recall: Recall is also termed sensitivity or true positive rate (TPR). It estimates the proportion of true positives out of all observed positive values of the target: Recall = TP / (TP + FN).
- Misclassification rate: This estimates how frequently the classifier predicts incorrectly. It is the ratio of incorrect predictions to all predictions: Misclassification rate = (FP + FN) / (TP + TN + FP + FN).
- Specificity: Specificity is also known as the true negative rate (TNR). It estimates the proportion of true negatives out of all observed negative values of the target: Specificity = TN / (TN + FP).
- ROC curve: The ROC curve summarizes the performance of a classifier over all possible thresholds. It is plotted with the true positive rate (TPR) on the y axis and the false positive rate (FPR) on the x axis for all possible thresholds.
- AUC: AUC is the area under the ROC curve. If the classifier is outstanding, the true positive rate increases quickly and the area under the curve is close to 1. If the classifier is similar to random guessing, the true positive rate increases linearly with the false positive rate (1 - specificity), and the AUC is around 0.5. The higher the AUC, the better the model. The second sketch after this list shows how the ROC curve and AUC are computed from predicted scores.
- Lift: Lift helps to estimate the improvement in a model's predictive ability over the average or baseline model. For example, if the accuracy of the baseline model for an HR attrition dataset is 40% and the accuracy of a new model on the same dataset is 80%, then the new model has a lift of 2 (80/40).
- Balanced accuracy: Sometimes, accuracy alone is not a good measure to evaluate a model. For cases where the dataset is unbalanced, it might not be a useful evaluation metric. In such cases, balanced accuracy can be used as one of the evaluation metrics. Balanced accuracy is the average of the accuracy obtained on each class: Balanced accuracy = (Sensitivity + Specificity) / 2.
Unbalanced dataset: one where one class dominates the other class. In such cases, there is an inherent bias in prediction towards the majority class. This is a problem for base learners such as decision trees and logistic regression, whereas ensemble models such as random forest can handle unbalanced classes well.
- F1 score: The F1 score is also a sound measure for evaluating a classifier on an imbalanced dataset. It is the harmonic mean of precision and recall, and its value lies between 0 and 1: F1 score = 2 * (Precision * Recall) / (Precision + Recall).

- Hamming loss: This identifies the fraction of labels that are incorrectly predicted.
- Matthews Correlation Coefficient (MCC): MCC is a correlation coefficient between the target and the predictions. It varies between -1 and +1: -1 when there is complete disagreement between actuals and predictions, +1 when there is perfect agreement, and 0 when the predictions may as well be random with respect to the actuals. As it involves the values of all four quadrants of a confusion matrix, it is considered a balanced measure. The last sketch after this list shows these metrics computed with scikit-learn's ready-made functions.
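To make the quadrant definitions and the ratio-based metrics above concrete, here is a minimal sketch in Python using scikit-learn. The y_true and y_pred vectors are toy values invented purely for illustration; the outputs of any binary classifier could be substituted.

```python
from sklearn.metrics import confusion_matrix

# Toy ground-truth and predicted labels (1 = positive class, 0 = negative class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

# For binary labels, scikit-learn orders the matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

total = tp + tn + fp + fn
accuracy = (tp + tn) / total                # correct predictions / all predictions
misclassification_rate = (fp + fn) / total  # 1 - accuracy
precision = tp / (tp + fp)                  # TP / all predicted positives
recall = tp / (tp + fn)                     # TPR, also called sensitivity
specificity = tn / (tn + fp)                # TNR

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"accuracy={accuracy:.2f}, misclassification rate={misclassification_rate:.2f}")
print(f"precision={precision:.2f}, recall={recall:.2f}, specificity={specificity:.2f}")
```

Computing the ratios by hand like this mirrors the formulas above; the same numbers also come out of scikit-learn's built-in scoring functions, as shown in the last sketch below.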
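The ROC curve and AUC are computed from predicted scores (for example, class probabilities) rather than hard labels, because the curve is traced by sweeping the decision threshold. A minimal sketch, again with made-up scores:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Toy true labels and predicted probabilities of the positive class (illustrative values)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_score = [0.90, 0.80, 0.35, 0.70, 0.20, 0.10, 0.60, 0.30, 0.75, 0.40]

# FPR and TPR at every threshold implied by the scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC = {auc:.2f}")  # close to 1 for a strong classifier, around 0.5 for random guessing
```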
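scikit-learn also provides ready-made functions for most of the metrics above, including the imbalance-aware ones (balanced accuracy, F1 score, Hamming loss, and MCC). A minimal sketch reusing the toy labels from the first example:

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, hamming_loss, matthews_corrcoef)

# Same toy labels as in the first sketch
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

print("accuracy          :", accuracy_score(y_true, y_pred))
print("balanced accuracy :", balanced_accuracy_score(y_true, y_pred))
print("F1 score          :", f1_score(y_true, y_pred))
print("Hamming loss      :", hamming_loss(y_true, y_pred))       # fraction of labels predicted incorrectly
print("MCC               :", matthews_corrcoef(y_true, y_pred))  # +1 perfect, 0 random, -1 complete disagreement
```

For single-label binary classification, the Hamming loss equals the misclassification rate; it becomes more informative in multi-label settings, where it counts each wrongly predicted label separately.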
Sometimes, creating a model for prediction is not the only requirement. We also need insights into how the model was built and the critical features that describe it. Decision trees are the go-to model in such cases.