<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">inform</journal-id><journal-title-group><journal-title xml:lang="ru">Информатика</journal-title><trans-title-group xml:lang="en"><trans-title>Informatics</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1816-0301</issn><issn pub-type="epub">2617-6963</issn><publisher><publisher-name>UIIP NASB</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.37661/1816-0301-2020-17-1-87-101</article-id><article-id custom-type="elpub" pub-id-type="custom">inform-1044</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ОБРАБОТКА СИГНАЛОВ, ИЗОБРАЖЕНИЙ, РЕЧИ, ТЕКСТА И РАСПОЗНАВАНИЕ ОБРАЗОВ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION</subject></subj-group></article-categories><title-group><article-title>Сравнительный анализ оценок качества  бинарной классификации</article-title><trans-title-group xml:lang="en"><trans-title>Comparative study of quality estimation of binary classification</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Старовойтов</surname><given-names>В. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Starovoitov</surname><given-names>V. V.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Старовойтов Валерий Васильевич – доктор технических наук, профессор, главный научный сотрудник</p></bio><bio xml:lang="en"><p>Valery V. 
Starovoitov, Dr. Sci. (Eng.), Professor, Chief Researcher</p></bio><email xlink:type="simple">valerys@newman.bas-net.by</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Голуб</surname><given-names>Ю. И.</given-names></name><name name-style="western" xml:lang="en"><surname>Golub</surname><given-names>Yu. I.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Голуб Юлия Игоревна – кандидат технических наук, доцент, старший научный сотрудник</p></bio><bio xml:lang="en"><p>Yuliya I. Golub, Cand. Sci. (Eng.), Associate Professor, Senior Researcher</p></bio><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Объединенный институт проблем информатики Национальной академии наук Беларуси</institution></aff><aff xml:lang="en"><institution>The United Institute of Informatics Problems of the National Academy of Sciences of Belarus</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2020</year></pub-date><pub-date pub-type="epub"><day>18</day><month>02</month><year>2020</year></pub-date><volume>17</volume><issue>1</issue><fpage>87</fpage><lpage>101</lpage><permissions><copyright-statement>Copyright &#x00A9; Старовойтов В.В., Голуб Ю.И., 2020</copyright-statement><copyright-year>2020</copyright-year><copyright-holder xml:lang="ru">Старовойтов В.В., Голуб Ю.И.</copyright-holder><copyright-holder xml:lang="en">Starovoitov V.V., Golub Y.I.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" 
xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://inf.grid.by/jour/article/view/1044">https://inf.grid.by/jour/article/view/1044</self-uri><abstract><p>Приведены данные аналитического и экспериментального анализов 17 функций, используемых для оценки результатов бинарной классификации произвольных данных. Результаты классификации представлены матрицами ошибок размером 2×2. Исследованы поведение и свойства основных функций, вычисляемых по элементам этих матриц. Анализируются варианты классификации со сбалансированными и несбалансированными классами данных. Показано, что между отдельными функциями существуют линейные зависимости. Многие функции инвариантны к транспонированию матриц ошибок, что позволяет вычислять оценки, не уточняя порядок записи данных в эти матрицы.</p><p>Доказано, что все классические функции (Sensitivity, Specificity, Precision, Accuracy, F1, F2, GM, индекс Жаккара) чувствительны к дисбалансу классифицируемых данных и искажают оценки при ошибках классификации объектов меньшего класса. Чувствительность к дисбалансу имеется у коэффициента корреляции Мэтьюса и каппы Коэна. Экспериментально показано, что такие функции, как энтропия ошибки (confusion entropy), степень разделимости (discriminatory power) и диагностическое отношение шансов (diagnostic odds ratio), не стоит использовать для анализа результатов бинарной классификации несбалансированных классов. Две последние функции инвариантны к дисбалансу классифицируемых данных, но плохо оценивают результаты с примерно равным суммарным процентом ошибок классификации. 
Доказано, что площадь под ROC-кривой (AUC) и индекс Юдена, вычисляемые по матрице ошибок бинарной классификации, линейно зависимы и являются наиболее подходящими оценочными функциями для сравнения результатов бинарной классификации как сбалансированных, так и несбалансированных данных.</p></abstract><trans-abstract xml:lang="en"><p>The paper presents an analytical and experimental study of seventeen functions used to evaluate binary classification results for arbitrary data. Classification results are represented by 2×2 confusion matrices. The behavior and properties of the main functions computed from the elements of these matrices are studied. Classification scenarios with balanced and imbalanced classes are analyzed. It is shown that some functions are linearly dependent. Many functions are invariant to transposition of the confusion matrix, so they can be computed without specifying the order in which the data were written to the matrix.</p><p>It is proved that all classical measures (Sensitivity, Specificity, Precision, Accuracy, F1, F2, GM, and the Jaccard index) are sensitive to imbalance in the classified data and distort estimates when objects of the smaller class are misclassified. The Matthews correlation coefficient and Cohen’s kappa are also sensitive to imbalance. It is shown experimentally that functions such as confusion entropy, discriminatory power, and the diagnostic odds ratio should not be used to analyze binary classification results for imbalanced classes. 
The last two functions are invariant to imbalance in the classified data but poorly evaluate results with approximately equal total percentages of classification errors.</p><p>It is proved that the area under the ROC curve (AUC) and Youden’s index, both computed from the confusion matrix of a binary classification, are linearly dependent and are the most suitable functions for comparing binary classification results on both balanced and imbalanced data.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>бинарная классификация</kwd><kwd>матрица ошибок</kwd><kwd>функции точности классификации</kwd><kwd>площадь под ROC-кривой</kwd><kwd>индекс Юдена</kwd></kwd-group><kwd-group xml:lang="en"><kwd>binary classification</kwd><kwd>confusion matrix</kwd><kwd>classification accuracy functions</kwd><kwd>area under ROC curve</kwd><kwd>Youden’s index</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Zhuravlev Y. I. On the algebraic approach to solving problems of recognition and classification. Problems of cybernetics, Moscow, Nauka, 1978, vol. 33, pp. 5–68.</mixed-citation><mixed-citation xml:lang="en">Zhuravlev Y. I. On the algebraic approach to solving problems of recognition and classification. Problems of cybernetics, Moscow, Nauka, 1978, vol. 33, pp. 5–68.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Haixiang G., Shang J., Mingyun G., Yuanyue H., Bing G. Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 2017, vol. 73, pp. 220–239.</mixed-citation><mixed-citation xml:lang="en">Haixiang G., Shang J., Mingyun G., Yuanyue H., Bing G. Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 2017, vol. 73, pp. 
220–239.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Choi S. S., Cha S. H., Tappert C. C. A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 2010, vol. 8(1), pp. 43–48.</mixed-citation><mixed-citation xml:lang="en">Choi S. S., Cha S. H., Tappert C. C. A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 2010, vol. 8(1), pp. 43–48.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Canbek G., Sagiroglu S., Temizel T. T., Baykal N. Binary classification performance measures/metrics: A comprehensive visualized roadmap to gain new insights. International Conference on Computer Science and Engineering, Antalya, Turkey, 5–8 October 2017. Antalya, 2017, pp. 821–826.</mixed-citation><mixed-citation xml:lang="en">Canbek G., Sagiroglu S., Temizel T. T., Baykal N. Binary classification performance measures/metrics: A comprehensive visualized roadmap to gain new insights. International Conference on Computer Science and Engineering, Antalya, Turkey, 5–8 October 2017. Antalya, 2017, pp. 821–826.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Sokolova M., Lapalme G. A systematic analysis of performance measures for classification tasks. Information Processing &amp; Management, 2009, vol. 45, no. 4, pp. 427–437.</mixed-citation><mixed-citation xml:lang="en">Sokolova M., Lapalme G. A systematic analysis of performance measures for classification tasks. Information Processing &amp; Management, 2009, vol. 45, no. 4, pp. 427–437.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Valverde-Albacete F. J., Peláez-Moreno C. 
100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox. PLoS One, 2014, vol. 9(1), 10 p. https://doi.org/10.1371/journal.pone.0084217</mixed-citation><mixed-citation xml:lang="en">Valverde-Albacete F. J., Peláez-Moreno C. 100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox. PLoS One, 2014, vol. 9(1), 10 p. https://doi.org/10.1371/journal.pone.0084217</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Powers D. M. What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes, 2015. Available at: https://arxiv.org/abs/1503.06410 (accessed 17.11.2019).</mixed-citation><mixed-citation xml:lang="en">Powers D. M. What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes, 2015. Available at: https://arxiv.org/abs/1503.06410 (accessed 17.11.2019).</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters, 2006, vol. 27, no. 8, pp. 861–874.</mixed-citation><mixed-citation xml:lang="en">Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters, 2006, vol. 27, no. 8, pp. 861–874.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960, vol. 20, no. 1, pp. 37–46.</mixed-citation><mixed-citation xml:lang="en">Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960, vol. 20, no. 1, pp. 37–46.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Matthews B. 
Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta – Protein Structure, 1975, vol. 405, no. 2, pp. 442–451.</mixed-citation><mixed-citation xml:lang="en">Matthews B. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta – Protein Structure, 1975, vol. 405, no. 2, pp. 442–451.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Wei J. M., Yuan X. J., Hu Q. H., Wang S. Q. A novel measure for evaluating classifiers. Expert Systems with Applications, 2010, vol. 37, no. 5, pp. 3799–3809.</mixed-citation><mixed-citation xml:lang="en">Wei J. M., Yuan X. J., Hu Q. H., Wang S. Q. A novel measure for evaluating classifiers. Expert Systems with Applications, 2010, vol. 37, no. 5, pp. 3799–3809.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Blakeley D. D., Oddone E. Z., Hasselblad V., Simel D. L., Matchar D. B. Noninvasive carotid artery testing: a meta-analytic review. Annals of Internal Medicine, 1995, vol. 122, no. 5, pp. 360–367.</mixed-citation><mixed-citation xml:lang="en">Blakeley D. D., Oddone E. Z., Hasselblad V., Simel D. L., Matchar D. B. Noninvasive carotid artery testing: a meta-analytic review. Annals of Internal Medicine, 1995, vol. 122, no. 5, pp. 360–367.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Youden W. J. Index for rating diagnostic tests. Cancer, 1950, vol. 3, no. 1, pp. 32–35.</mixed-citation><mixed-citation xml:lang="en">Youden W. J. Index for rating diagnostic tests. Cancer, 1950, vol. 3, no. 1, pp. 32–35.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Glas A. S., Lijmer J. G., Prins M. 
H., Bonsel G. J., Bossuyt P. M. The diagnostic odds ratio: a single indicator of test performance. Journal of Clinical Epidemiology, 2003, vol. 56, no. 11, pp. 1129–1135.</mixed-citation><mixed-citation xml:lang="en">Glas A. S., Lijmer J. G., Prins M. H., Bonsel G. J., Bossuyt P. M. The diagnostic odds ratio: a single indicator of test performance. Journal of Clinical Epidemiology, 2003, vol. 56, no. 11, pp. 1129–1135.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Davis J., Goadrich M. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, 25–29 June 2006, Pittsburgh, Pennsylvania, USA. Pittsburgh, 2006, pp. 233–240.</mixed-citation><mixed-citation xml:lang="en">Davis J., Goadrich M. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, 25–29 June 2006, Pittsburgh, Pennsylvania, USA. Pittsburgh, 2006, pp. 233–240.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Boughorbel S., Jarray F., El-Anbari M. Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS One, 2017, vol. 12(6). https://doi.org/10.1371/journal.pone.0177678</mixed-citation><mixed-citation xml:lang="en">Boughorbel S., Jarray F., El-Anbari M. Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS One, 2017, vol. 12(6). https://doi.org/10.1371/journal.pone.0177678</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Jurman G., Riccadonna S., Furlanello C. A comparison of MCC and CEN error measures in multi-class prediction. PLoS One, 2012, vol. 7, no. 8, e41882. 
https://doi.org/10.1371/journal.pone.0041882</mixed-citation><mixed-citation xml:lang="en">Jurman G., Riccadonna S., Furlanello C. A comparison of MCC and CEN error measures in multi-class prediction. PLoS One, 2012, vol. 7, no. 8, e41882. https://doi.org/10.1371/journal.pone.0041882</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Pepe M. S., Janes H., Longton G., Leisenring W., Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. American Journal of Epidemiology, 2004, vol. 159, no. 9, pp. 882–890.</mixed-citation><mixed-citation xml:lang="en">Pepe M. S., Janes H., Longton G., Leisenring W., Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. American Journal of Epidemiology, 2004, vol. 159, no. 9, pp. 882–890.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Mower J. P. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics, 2005, vol. 6, art. 96, pp. 1–15. https://doi.org/10.1186/1471-2105-6-96</mixed-citation><mixed-citation xml:lang="en">Mower J. P. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics, 2005, vol. 6, art. 96, pp. 1–15. https://doi.org/10.1186/1471-2105-6-96</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
