Informatics

Comparative study of quality estimation of binary classification

https://doi.org/10.37661/1816-0301-2020-17-1-87-101

Abstract

The paper presents an analytical and experimental analysis of seventeen functions used to evaluate the results of binary classification of arbitrary data. The results are represented as 2×2 error matrices. The behavior and properties of the main functions calculated from the elements of such matrices are studied. Classification of both balanced and imbalanced datasets is analyzed. It is shown that some functions are linearly dependent and that many functions are invariant to transposition of the error matrix, so the estimate can be calculated without specifying the order in which the elements were written to the matrix.
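As an illustration of the transposition property, the following minimal Python sketch computes a few well-known measures from a 2×2 error matrix and checks which of them keep their value when the matrix is transposed, that is, when the two off-diagonal elements are swapped. The matrix layout, the function names, and the example counts are assumptions made for this sketch and are not taken from the paper.

```python
import math


def measures(tp, fn, fp, tn):
    """Compute several measures from the elements of a 2x2 error matrix.

    Assumed layout (not specified in the abstract): rows are actual classes,
    columns are predicted classes, i.e. [[tp, fn], [fp, tn]].
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


def transposed(tp, fn, fp, tn):
    """Transposing the matrix swaps the off-diagonal elements fn and fp."""
    return tp, fp, fn, tn


def invariant_to_transposition(tp, fn, fp, tn, tol=1e-12):
    """Map each measure name to True if transposition leaves it unchanged."""
    original = measures(tp, fn, fp, tn)
    swapped = measures(*transposed(tp, fn, fp, tn))
    return {name: abs(original[name] - swapped[name]) <= tol for name in original}


# Hypothetical counts chosen only for illustration.
result = invariant_to_transposition(tp=80, fn=20, fp=5, tn=895)
for name, is_invariant in result.items():
    print(f"{name:12s} invariant: {is_invariant}")
```

For these counts, Accuracy, F1, the Jaccard index, and the Matthews correlation coefficient are unchanged by the swap, while Sensitivity, Specificity, and Precision are not. A single numerical check is not a proof, but the symmetry of the first four formulas in fn and fp is easy to verify by inspection.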

It is proven that all classical measures, such as Sensitivity, Specificity, Precision, Accuracy, F1, F2, GM, and the Jaccard index, are sensitive to imbalance in the classified data and distort the estimate of classification errors for objects of the smaller class. The Matthews correlation coefficient and Cohen's kappa are also sensitive to imbalance. It is shown experimentally that the confusion entropy, the discriminatory power, and the diagnostic odds ratio should not be used to analyze binary classification of imbalanced datasets. The last two functions are invariant to imbalance in the classified data, but they poorly evaluate results with an approximately equal overall percentage of classification errors in the two classes.
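The effect of imbalance can be seen in a small numerical sketch. Two hypothetical error matrices are built with the same per-class error rates (40% of the positive class and 5% of the negative class misclassified), one with equal class sizes and one where the positive class is 100 times smaller. The counts and function names are assumptions for illustration only and do not reproduce the authors' experiments.

```python
import math


def summary(tp, fn, fp, tn):
    """Return a few measures for one 2x2 error matrix (elements as counts)."""
    tpr = tp / (tp + fn)  # share of the positive class classified correctly
    tnr = tn / (tn + fp)  # share of the negative class classified correctly
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "youden": tpr + tnr - 1,
    }


# Same per-class error rates in both cases: 40% (positive), 5% (negative).
balanced = summary(tp=600, fn=400, fp=50, tn=950)     # 1000 vs 1000 objects
imbalanced = summary(tp=60, fn=40, fp=500, tn=9500)   # 100 vs 10000 objects

for name in balanced:
    print(f"{name:10s} balanced={balanced[name]:.3f} imbalanced={imbalanced[name]:.3f}")
```

In this example Accuracy increases on the imbalanced data even though 40% of the smaller class is still misclassified, while Precision, F1, and the Matthews correlation coefficient drop sharply. The Youden index is the same in both cases because it depends only on the per-class rates.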

We prove that the area under the ROC curve (AUC) and the Youden index, both calculated from the binary classification confusion matrix, are linearly dependent and are the best estimation functions for both balanced and imbalanced datasets.
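The linear dependence can be checked numerically under one assumption that the abstract does not spell out: that the AUC of a single confusion matrix is the area under the two-segment ROC curve through (0, 0), (FPR, TPR), and (1, 1). Under that reading, AUC = (TPR + TNR) / 2 and the Youden index is J = TPR + TNR − 1, so AUC = (J + 1) / 2. The sketch below, with hypothetical function names and counts, compares the trapezoidal area with this formula.

```python
def youden_index(tp, fn, fp, tn):
    """Youden index J = sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def single_point_auc(tp, fn, fp, tn):
    """Trapezoidal area under the ROC polyline (0,0) -> (FPR,TPR) -> (1,1).

    Assumption: this single-operating-point construction is what is meant
    by AUC calculated from a confusion matrix.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    points = [(0.0, 0.0), (fpr, tpr), (1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


# Hypothetical error matrices given as (tp, fn, fp, tn).
matrices = [(600, 400, 50, 950), (60, 40, 500, 9500), (80, 20, 5, 895)]
gaps = []
for m in matrices:
    j = youden_index(*m)
    auc = single_point_auc(*m)
    gaps.append(abs(auc - (j + 1) / 2))
    print(f"J={j:.4f}  AUC={auc:.4f}  (J+1)/2={(j + 1) / 2:.4f}")
```

The printed AUC and (J + 1) / 2 columns agree for every matrix, which is consistent with the stated linear dependence under the single-point assumption.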

About the Authors

V. V. Starovoitov
The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Belarus
Valery V. Starovoitov, Dr. Sci. (Eng.), Professor, Chief Researcher


Yu. I. Golub
The United Institute of Informatics Problems of the National Academy of Sciences of Belarus
Yuliya I. Golub, Cand. Sci. (Eng.), Associate Professor, Senior Researcher


For citations:


Starovoitov V.V., Golub Yu.I. Comparative study of quality estimation of binary classification. Informatics. 2020;17(1):87-101. (In Russ.) https://doi.org/10.37661/1816-0301-2020-17-1-87-101

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)