<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">inform</journal-id><journal-title-group><journal-title xml:lang="ru">Информатика</journal-title><trans-title-group xml:lang="en"><trans-title>Informatics</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1816-0301</issn><issn pub-type="epub">2617-6963</issn><publisher><publisher-name>UIIP NASB</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">inform-358</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ИНТЕЛЛЕКТУАЛЬНЫЕ СИСТЕМЫ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>INTELLIGENT SYSTEMS</subject></subj-group></article-categories><title-group><article-title>АВТОМАТИЧЕСКОЕ ОПРЕДЕЛЕНИЕ ЯЗЫКА ТЕКСТОВОГО ДОКУМЕНТА ДЛЯ ОСНОВНЫХ ЕВРОПЕЙСКИХ ЯЗЫКОВ</article-title><trans-title-group xml:lang="en"><trans-title>AUTOMATIC LANGUAGE IDENTIFICATION OF A TEXT DOCUMENT FOR THE MAIN EUROPEAN LANGUAGES</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Крапивин</surname><given-names>Ю. 
Б.</given-names></name></name-alternatives><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff xml:lang="ru" id="aff-1"><institution>Брестский государственный технический университет</institution><country>Belarus</country></aff><pub-date pub-type="collection"><year>2011</year></pub-date><pub-date pub-type="epub"><day>19</day><month>04</month><year>2018</year></pub-date><volume>0</volume><issue>3(31)</issue><fpage>112</fpage><lpage>117</lpage><permissions><copyright-statement>Copyright &#x00A9; Крапивин Ю.Б., 2018</copyright-statement><copyright-year>2018</copyright-year><copyright-holder xml:lang="ru">Крапивин Ю.Б.</copyright-holder><copyright-holder xml:lang="en">Крапивин Ю.Б.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://inf.grid.by/jour/article/view/358">https://inf.grid.by/jour/article/view/358</self-uri><abstract><p>Проводится анализ основных методов решения задачи автоматического определения языка текстового документа и предлагается алгоритм, основанный на комбинировании алфавитного метода, метода грамматических слов и алфавитно-триграммного метода, сочетающий в себе возможности минимального статистического и лингвистического анализа языковых данных и обеспечивающий эффективное решение указанной задачи.</p></abstract></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Крапивин, Ю.Б. 
К задаче автоматического распознавания воспроизведенных фрагментов текстовых документов / Ю.Б. Крапивин // Вестник БрГТУ : Физика, математика, информатика. – 2009. – № 5 (59). – С. 120–123.</mixed-citation><mixed-citation xml:lang="en">Krapivin, Yu.B. On the problem of automatic recognition of reproduced fragments of text documents / Yu.B. Krapivin // Vestnik BrGTU : Fizika, matematika, informatika. – 2009. – № 5 (59). – P. 120–123.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Grefenstette, G. Comparing two language identification schemes / G. Grefenstette // The</mixed-citation><mixed-citation xml:lang="en">Grefenstette, G. Comparing two language identification schemes / G. Grefenstette // The</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Third Intern. Conf. on Statistical Analysis of Textual Data. – Rome, 1995.</mixed-citation><mixed-citation xml:lang="en">Third Intern. Conf. on Statistical Analysis of Textual Data. – Rome, 1995.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Sibun, P. Language Determination: Natural Language Processing from Scanned Document</mixed-citation><mixed-citation xml:lang="en">Sibun, P. Language Determination: Natural Language Processing from Scanned Document</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Images / P. Sibun, A.L. Spitz // Proc. of the 4th ACL Conf. on Applied Natural Language Processing (ANLP). – Stuttgart, Germany, 1994.</mixed-citation><mixed-citation xml:lang="en">Images / P. Sibun, A.L. Spitz // Proc. of the 4th ACL Conf. on Applied Natural Language Processing (ANLP). 
– Stuttgart, Germany, 1994.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Cowie, J. Language recognition for mono- and multilingual documents / J. Cowie,</mixed-citation><mixed-citation xml:lang="en">Cowie, J. Language recognition for mono- and multilingual documents / J. Cowie,</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Y. Ludovic, R. Zacharski // Proc. of the Vextal Conference. – Venice, 1999.</mixed-citation><mixed-citation xml:lang="en">Y. Ludovic, R. Zacharski // Proc. of the Vextal Conference. – Venice, 1999.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Natural Language Identification using Corpus-based Models / C. Souter [et al.] // Hermes</mixed-citation><mixed-citation xml:lang="en">Natural Language Identification using Corpus-based Models / C. Souter [et al.] // Hermes</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Journal of Linguistics. – 1994. – № 13. – P. 183–203.</mixed-citation><mixed-citation xml:lang="en">Journal of Linguistics. – 1994. – № 13. – P. 183–203.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Cavnar, W.B. N-Gram-Based Text Categorization / W.B. Cavnar, J.M. Trenkle // Proc. of the</mixed-citation><mixed-citation xml:lang="en">Cavnar, W.B. N-Gram-Based Text Categorization / W.B. Cavnar, J.M. Trenkle // Proc. of the</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">rd Annual Symposium on Document Analysis and Information Retrieval (SDAIR). – Las Vegas, 1994. – P. 
161–175.</mixed-citation><mixed-citation xml:lang="en">rd Annual Symposium on Document Analysis and Information Retrieval (SDAIR). – Las Vegas, 1994. – P. 161–175.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Prager, J.M. Linguini: Language identification for multilingual documents / J.M. Prager //</mixed-citation><mixed-citation xml:lang="en">Prager, J.M. Linguini: Language identification for multilingual documents / J.M. Prager //</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Proc. of the 32nd Hawaii Intern. Conf. on System Sciences. – Maui, Hawaii, USA, 1999.</mixed-citation><mixed-citation xml:lang="en">Proc. of the 32nd Hawaii Intern. Conf. on System Sciences. – Maui, Hawaii, USA, 1999.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Dunning, T. Statistical Identification of Language / T. Dunning // Computing Research Laboratory.</mixed-citation><mixed-citation xml:lang="en">Dunning, T. Statistical Identification of Language / T. Dunning // Computing Research Laboratory.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Technical report MCCS. – New Mexico State University, 1994. – P. 94–273.</mixed-citation><mixed-citation xml:lang="en">Technical report MCCS. – New Mexico State University, 1994. – P. 94–273.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Sibun, P. Language identification: Examining the issues / P. Sibun, J.C. Reynar // Proc. of the</mixed-citation><mixed-citation xml:lang="en">Sibun, P. Language identification: Examining the issues / P. Sibun, J.C. Reynar // Proc. 
of the</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">th Symposium on Document Analysis and Information Retrieval (SDAIR). – Las Vegas, 1996. – P. 125–135.</mixed-citation><mixed-citation xml:lang="en">th Symposium on Document Analysis and Information Retrieval (SDAIR). – Las Vegas, 1996. – P. 125–135.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Poutsma, A. Applying Monte Carlo Techniques to Language Identification / A. Poutsma //</mixed-citation><mixed-citation xml:lang="en">Poutsma, A. Applying Monte Carlo Techniques to Language Identification / A. Poutsma //</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Proc. of Computational Linguistics in the Netherlands. – Amsterdam, Netherlands, 2001.</mixed-citation><mixed-citation xml:lang="en">Proc. of Computational Linguistics in the Netherlands. – Amsterdam, Netherlands, 2001.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Biemann, C. Disentangling from Babylonian Confusion – Unsupervised Language Identification / C. Biemann, S. Teresniak // Proc. of the CICLing-2005. – Mexico City, 2005.</mixed-citation><mixed-citation xml:lang="en">Biemann, C. Disentangling from Babylonian Confusion – Unsupervised Language Identification / C. Biemann, S. Teresniak // Proc. of the CICLing-2005. – Mexico City, 2005.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Kruengkrai, C. Language Identification Based on String Kernels / C. Kruengkrai // Proc. of</mixed-citation><mixed-citation xml:lang="en">Kruengkrai, C. Language Identification Based on String Kernels / C. Kruengkrai // Proc. 
of</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">the 5th Intern. Symposium on Communications and Information Technologies (ISCIT-2005). – Beijing, China, 2005.</mixed-citation><mixed-citation xml:lang="en">the 5th Intern. Symposium on Communications and Information Technologies (ISCIT-2005). – Beijing, China, 2005.</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Giguet, E. Categorization according to Language: A step toward combining Linguistic</mixed-citation><mixed-citation xml:lang="en">Giguet, E. Categorization according to Language: A step toward combining Linguistic</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Knowledge and Statistic Learning / E. Giguet // 4th Intern. Workshop of Parsing Technologies. – Prague, Karlovy Vary, Czech Republic, 1995.</mixed-citation><mixed-citation xml:lang="en">Knowledge and Statistic Learning / E. Giguet // 4th Intern. Workshop of Parsing Technologies. – Prague, Karlovy Vary, Czech Republic, 1995.</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">Newman, P. Foreign language identification: First step in the translation process / P. Newman // Proc. of the 28th Annual Conf. of the American Translators Association. – Albuquerque NM, USA, 1987. – P. 509–516.</mixed-citation><mixed-citation xml:lang="en">Newman, P. Foreign language identification: First step in the translation process / P. Newman // Proc. of the 28th Annual Conf. of the American Translators Association. – Albuquerque NM, USA, 1987. – P. 
509–516.</mixed-citation></citation-alternatives></ref><ref id="cit26"><label>26</label><citation-alternatives><mixed-citation xml:lang="ru">Kullback–Leibler divergence [Electronic resource] // Wikipedia. – Mode of access :</mixed-citation><mixed-citation xml:lang="en">Kullback–Leibler divergence [Electronic resource] // Wikipedia. – Mode of access :</mixed-citation></citation-alternatives></ref><ref id="cit27"><label>27</label><citation-alternatives><mixed-citation xml:lang="ru">http://en.wikipedia.org/wiki/Kullback-Leibler_divergence. – Date of access : 15.12.2010.</mixed-citation><mixed-citation xml:lang="en">http://en.wikipedia.org/wiki/Kullback-Leibler_divergence. – Date of access : 15.12.2010.</mixed-citation></citation-alternatives></ref><ref id="cit28"><label>28</label><citation-alternatives><mixed-citation xml:lang="ru">Ukkonen, E. On-line construction of suffix trees / E. Ukkonen // Algorithmica. – 1995. –</mixed-citation><mixed-citation xml:lang="en">Ukkonen, E. On-line construction of suffix trees / E. Ukkonen // Algorithmica. – 1995. –</mixed-citation></citation-alternatives></ref><ref id="cit29"><label>29</label><citation-alternatives><mixed-citation xml:lang="ru">№ 14 (3). – P. 249–260.</mixed-citation><mixed-citation xml:lang="en">№ 14 (3). – P. 249–260.</mixed-citation></citation-alternatives></ref><ref id="cit30"><label>30</label><citation-alternatives><mixed-citation xml:lang="ru">Function word [Electronic resource] // Wikipedia. – Mode of access : http://en.wikipedia.org/wiki/Function_word. – Date of access : 14.12.2010.</mixed-citation><mixed-citation xml:lang="en">Function word [Electronic resource] // Wikipedia. – Mode of access : http://en.wikipedia.org/wiki/Function_word. – Date of access : 14.12.2010.</mixed-citation></citation-alternatives></ref><ref id="cit31"><label>31</label><citation-alternatives><mixed-citation xml:lang="ru">Quasthoff, U. Corpus Portal for Search in Monolingual Corpora / U. Quasthoff, M. 
Richter,</mixed-citation><mixed-citation xml:lang="en">Quasthoff, U. Corpus Portal for Search in Monolingual Corpora / U. Quasthoff, M. Richter,</mixed-citation></citation-alternatives></ref><ref id="cit32"><label>32</label><citation-alternatives><mixed-citation xml:lang="ru">C. Biemann // Proc. of the Fifth Intern. Conf. on Language Resources and Evaluation, LREC 2006. – Genoa, 2006. – P. 1799–1802.</mixed-citation><mixed-citation xml:lang="en">C. Biemann // Proc. of the Fifth Intern. Conf. on Language Resources and Evaluation, LREC 2006. – Genoa, 2006. – P. 1799–1802.</mixed-citation></citation-alternatives></ref><ref id="cit33"><label>33</label><citation-alternatives><mixed-citation xml:lang="ru">Lancaster-Oslo-Bergen Corpus [Electronic resource] // Wikipedia. – Mode of access :</mixed-citation><mixed-citation xml:lang="en">Lancaster-Oslo-Bergen Corpus [Electronic resource] // Wikipedia. – Mode of access :</mixed-citation></citation-alternatives></ref><ref id="cit34"><label>34</label><citation-alternatives><mixed-citation xml:lang="ru">http://en.wikipedia.org/wiki/LOB_Corpus. – Date of access : 15.12.2010.</mixed-citation><mixed-citation xml:lang="en">http://en.wikipedia.org/wiki/LOB_Corpus. – Date of access : 15.12.2010.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
