Development of algorithms and software for classification of nucleotide sequences

V. R. Zakirava; D. A. Syrakvash; S. V. Hileuski; P. V. Nazarov; M. M. Yatskou

Abstract

Coding and non-coding nucleotide sequences of the human reference genome were investigated. Seven models for vectorizing nucleotide sequences were developed, based on mono-, bi-, and trigram nucleotide frequencies, parameters of the category-position-frequency model, sequence lengths, nucleotide correlation factors, and statistical features of the coding and non-coding regions of DNA molecules. The most informative features of these vectorization models were identified using feature selection and classification algorithms based on the random forests and support vector machine methods. A difference between coding and non-coding fragments of nucleotide sequences was established. The classification error for coding versus non-coding sequences, using the random forests method on the set of the 23 most informative features, is 2.93%.
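
As an illustration of the n-gram frequency vectorization mentioned above, the following is a minimal Python sketch, not the authors' implementation (the keywords indicate their software was written in R). The function names `kmer_frequencies` and `vectorize`, the use of overlapping windows, the skipping of windows with ambiguous symbols such as N, and the appended length feature are all assumptions made for this example; the category-position-frequency parameters and correlation factors from the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only the four canonical nucleotides are counted


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers in lexicographic order."""
    seq = seq.upper()
    # Overlapping windows of length k (an assumption; the paper may differ).
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows that contain ambiguous symbols such as N.
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [counts[km] / total if total else 0.0 for km in kmers]


def vectorize(seq):
    """Concatenate mono-, bi-, and trigram frequencies plus sequence length."""
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_frequencies(seq, k))
    features.append(float(len(seq)))
    return features


vec = vectorize("ATGGCCATTGTAATGGGCCGC")
print(len(vec))  # 4 + 16 + 64 + 1 = 85 features
```

Feature vectors of this kind could then be passed to any random forest or support vector machine implementation for feature ranking and classification.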

Keywords

DNA, exon, intron, classification, Random Forests, Support Vector Machine, feature selection, R programming

For citations:

Zakirava V.R., Syrakvash D.A., Hileuski S.V., Nazarov P.V., Yatskou M.M. Development of algorithms and software for classification of nucleotide sequences. Informatics. 2019;16(2):109-118. (In Russ.)

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)
