

Congratulations! You’ve built a binary classifier: a fancy-schmancy neural network using 128 GPUs with their dedicated power station, or perhaps a robust logistic regression model that runs on your good old ThinkPad. You’ve designed the model and fed it the data; now the time has finally come to measure the classifier’s performance.

Don’t get me wrong: ROC curves are the best choice for comparing models. However, scalar metrics still remain popular among the machine-learning community, with the four most common being accuracy, recall, precision, and F1-score. Scalar metrics are ubiquitous in textbooks, web articles, and online courses, and they are the metrics that most data scientists are familiar with. But a couple of weeks ago, I stumbled upon another scalar metric for binary classification: the Matthews Correlation Coefficient (MCC). Following my “discovery”, I asked around and was surprised to find that many people in the field are not familiar with this classification metric. As a born-again believer, I’m here to spread the gospel!

Dogs are “positive”: Precision, Recall, and F1-score

As a refresher, precision is the proportion of true positives out of all detected positives, or simply TP/(TP+FP). In our case, dog photos are the positive class, and 18 of the 18+3 photos that were classified as dogs actually contain dogs, so the precision is 18/21, or 86%. The recall is the proportion of true positives that are correctly classified, or TP/(TP+FN). From the above matrix it is easy to see that there are 20 true positives, and 18 of them are successfully detected. Thus, the recall is 18/(18+2), or 90%. Finally, the F1-score is the harmonic mean of the precision and recall, here roughly 88%. Fantastic classifier, right? Hold your horses. Take a look again at the matrix, specifically at the classification of cat photos. Only 1 out of 4 cat photos was successfully detected. Moreover, 2 of the 3 photos classified as cats are actually dogs.
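
To make the arithmetic concrete, here is a minimal sketch in plain Python, with the TP, FP, FN, and TN counts read off the confusion matrix described above:

```python
# Confusion matrix with "dog" as the positive class:
# 20 dog photos: 18 classified as dogs (TP), 2 as cats (FN).
# 4 cat photos: 1 classified as a cat (TN), 3 as dogs (FP).
TP, FP, FN, TN = 18, 3, 2, 1

precision = TP / (TP + FP)                          # 18/21 ≈ 0.86
recall = TP / (TP + FN)                             # 18/20 = 0.90
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.88

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```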

Precision and recall (and by extension the F1-score, which is a function of the two) consider one class, the positive class, to be the class we are interested in. They use only three of the values in the confusion matrix: TP, FP, and FN. The 4th value, TN, is not used in these metrics. You can put any value in the TN cell (0, 100, infinity) and the precision, recall, and F1-score will not change.
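
You can verify this with a small sketch that sweeps the TN cell while holding TP, FP, and FN fixed (the helper function below is just for illustration); the three metrics come out identical every time:

```python
def precision_recall_f1(TP, FP, FN, TN):
    """Precision, recall, and F1-score. Note that TN never appears."""
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Put "any value" in the TN cell: the results do not change.
for TN in (0, 100, 10**9):
    print(TN, precision_recall_f1(TP=18, FP=3, FN=2, TN=TN))
```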

As an exercise, let’s now flip the confusion matrix. Let’s consider “cat” to be the positive class, i.e., the one we are interested in. Cat lovers, rejoice! Here is the new confusion matrix, this time for a collection of 400 cat photos and 20 dog photos. Keep in mind that this is exactly the same classifier as before. A quick calculation shows that the accuracy is now a much lower (100+18)/(400+20) = 28%, because cats are now the majority class. A new class proportion will also influence the precision (but not the recall; check!), and thus the F1-score.
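
Assuming the class balance implied by the numbers above (400 cats and 20 dogs, with the classifier’s per-class behaviour unchanged: 100 of the 400 cats and 18 of the 20 dogs classified correctly), a quick sketch confirms both claims:

```python
# "Cat" is now the positive class; same classifier, new class proportions.
TP, FN = 100, 300  # 400 cats: 100 classified correctly, 300 as dogs
TN, FP = 18, 2     # 20 dogs: 18 classified correctly, 2 as cats

accuracy = (TP + TN) / (TP + FN + TN + FP)          # 118/420 ≈ 0.28
precision = TP / (TP + FP)                          # 100/102 ≈ 0.98 (was 1/3)
recall = TP / (TP + FN)                             # 100/400 = 0.25 (unchanged)
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.40

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Recall stays at 25% because it depends only on how the cat photos themselves are classified, while precision jumps because far fewer dog photos are now around to be misclassified as cats.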
