Informatics

A computational approach and software package RNAexploreR for grouping RNA molecules of human genes by exon features

Abstract

The study of the rules governing exon combination in human genes during splicing is of great interest for the diagnosis and treatment of cancer. Part of this research aims to develop reliable models for predicting global exon combinatorics during the formation of mature RNA. The primary task is to develop standards, or uniform and systematic statistical approaches, for analysing and interpreting the possible exon sequences of genes.

A computational approach is proposed for grouping alternative splicing events in the primary messenger RNA of human genes, with the aim of determining which gene an RNA molecule corresponds to or which class it belongs to. The method consists of four steps: reducing the dimension of the exon feature space and combining closely located exons into a limited number of classes; replacing the exon pathways of RNA generation with sequences of the corresponding exon class labels; calculating the distances between RNA transcripts using a similarity measure; and grouping closely spaced RNA objects into clusters. The performance of the developed algorithms was evaluated on RNA molecules of selected nonhomologous human genes and of the human hybrid oncogene RUNX1/RUNX1T1. The mean accuracy of assigning a transcript to a given gene is about 99.5% for the considered nonhomologous pairs of genes.
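
The steps of the method can be illustrated with a minimal Python sketch. It is not the RNAexploreR implementation: the abstract does not name the specific algorithms, so k-means (for exon classes), Levenshtein distance (for comparing label sequences) and threshold-based single-linkage grouping (for transcript clusters) are stand-ins chosen for illustration. The dimension-reduction step is replaced here by simple feature standardization, and all exon features, transcript names and the threshold are hypothetical toy values.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy exons: name -> (length in nt, GC fraction). Not real data.
EXONS = {
    "a1": (120, 0.40), "a2": (130, 0.42), "a3": (900, 0.61),
    "b1": (125, 0.41), "b2": (880, 0.60), "b3": (910, 0.63),
}
# Hypothetical transcripts given as ordered exon paths of two genes, G1 and G2.
TRANSCRIPTS = {
    "G1-t1": ["a1", "a2", "a3"], "G1-t2": ["a1", "a3"],
    "G2-t1": ["b2", "b3", "b1"], "G2-t2": ["b2", "b1"],
}

def standardize(rows):
    """Z-score each feature column so that no single unit dominates distances."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mu, sd)) for r in rows]

def sqdist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(c) for c in zip(*members))
    return labels

def levenshtein(s, t):
    """Edit distance between two label strings (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def single_linkage(names, dist, threshold):
    """Merge items whose pairwise distance is <= threshold (union-find)."""
    parent = {n: n for n in names}
    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n
    for a, b in combinations(names, 2):
        if dist(a, b) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Step 1: group exons with similar features into a small number of classes.
exon_names = sorted(EXONS)
labels = kmeans(standardize([EXONS[e] for e in exon_names]), k=2)
exon_class = {e: chr(ord("A") + lab) for e, lab in zip(exon_names, labels)}

# Step 2: replace each exon path with its sequence of class labels.
encoded = {t: "".join(exon_class[e] for e in path) for t, path in TRANSCRIPTS.items()}

# Steps 3-4: compare label sequences and group closely spaced transcripts.
clusters = single_linkage(
    list(encoded), lambda a, b: levenshtein(encoded[a], encoded[b]), threshold=1
)
print(encoded)
print(clusters)
```

On this toy input the two G1 transcripts and the two G2 transcripts end up in separate clusters, which mirrors the idea of assigning a transcript to its gene by the similarity of its exon-class sequence. The number of exon classes and the distance threshold are tuning choices in this sketch, not values taken from the article.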

A software package and web application, RNAexploreR, has been developed that integrates the implemented algorithms for analysing alternative splicing of the RNA products of human genes. The proposed algorithms and software can be used to study the organization and functioning of both aberrant and normal human genes.

About the Authors

M. M. Yatskou
https://www.bsu.by/main.aspx?guid=104591
Belarusian State University
Belarus

Mikalai M. Yatskou - Cand. Sci. (Phys.-Math.), Associate Professor, Department of Systems Analysis and Computer Modelling, Faculty of Radiophysics and Computer Technologies.

Minsk



V. V. Skakun
Belarusian State University
Belarus

Victor V. Skakun - Cand. Sci. (Phys.-Math.), Associate Professor, Head of Department of Systems Analysis and Computer Modelling, Faculty of Radiophysics and Computer Technologies.

Minsk



V. V. Grinev
Belarusian State University
Belarus

Vasily V. Grinev - Cand. Sci. (Biol.), Associate Professor, Department of Genetics, Faculty of Biology.

Minsk




For citations:


Yatskou M.M., Skakun V.V., Grinev V.V. A computational approach and software package RNAexploreR for grouping RNA molecules of human genes by exon features. Informatics. 2019;16(4):7-24. (In Russ.)


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)