METHOD OF CONSTRUCTION OF GENETIC DATA CLUSTERS
Abstract
The paper presents a method for constructing clusters of genetic data (functional modules) using randomized matrices. The functional modules are built by selecting and analyzing the eigenvalues of the correlation matrix of gene profiles. Only the principal components whose eigenvalues differ significantly from those obtained for a randomly generated correlation matrix are used in the analysis. Each selected principal component forms a gene cluster. In a comparative experiment with analogous methods, the proposed method shows advantages in identifying statistically significant clusters of different sizes, in filtering out non-informative genes, and in extracting biologically interpretable functional modules that match the real structure of the data.
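The eigenvalue selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the randomization scheme, the significance criterion, or the rule that maps components to clusters. The sketch assumes that the null model is built by independently permuting each gene's profile, that a component is kept when its eigenvalue exceeds the largest eigenvalue seen across the randomized matrices, and that a gene joins the component on which it has the largest absolute loading. The function names, the `min_loading` cutoff, and the synthetic data are all hypothetical.

```python
import numpy as np


def significant_components(expr, n_random=20, seed=0):
    """Keep principal components whose eigenvalues exceed a shuffled-data null.

    expr: array of shape (n_genes, n_samples), one expression profile per row.
    """
    rng = np.random.default_rng(seed)
    # Eigen-decomposition of the gene-by-gene correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(expr))
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

    # Null model (assumption): shuffle every gene's profile independently,
    # which removes gene-gene correlation but keeps each gene's values.
    null_max = 0.0
    for _ in range(n_random):
        shuffled = rng.permuted(expr, axis=1)
        null_max = max(null_max, np.linalg.eigvalsh(np.corrcoef(shuffled))[-1])

    keep = eigvals > null_max
    return eigvals[keep], eigvecs[:, keep], null_max


def assign_clusters(eigvals, eigvecs, min_loading=0.6):
    """Label each gene with its strongest component, or -1 if none is strong.

    min_loading is a hypothetical cutoff used to drop non-informative genes.
    """
    labels = np.full(eigvecs.shape[0], -1)
    if eigvals.size == 0:
        return labels
    # Loading = correlation between a gene and a principal component.
    loadings = np.abs(eigvecs * np.sqrt(eigvals))
    best = loadings.argmax(axis=1)
    strong = loadings.max(axis=1) >= min_loading
    labels[strong] = best[strong]
    return labels


# Synthetic example: two co-expressed gene groups plus unrelated noise genes.
rng = np.random.default_rng(1)
t = np.arange(40)
wave_a = np.sin(2 * np.pi * t / 40)
wave_b = np.cos(2 * np.pi * t / 40)
group_a = wave_a + 0.2 * rng.standard_normal((12, 40))
group_b = wave_b + 0.2 * rng.standard_normal((8, 40))
noise = rng.standard_normal((6, 40))
expr = np.vstack([group_a, group_b, noise])

eigvals, eigvecs, null_max = significant_components(expr)
labels = assign_clusters(eigvals, eigvecs)
print("components kept:", eigvals.size)
print("cluster labels:", labels)
```

The synthetic groups have different sizes and uncorrelated driving signals, so each retained component lines up with a single group. When groups are correlated or similar in size, the raw components can mix them, and a rotation of the retained components may be needed before assigning genes.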
About the Authors
N. A. Novoselova
Belarus
I. E. Tom
Belarus
For citations:
Novoselova N.A., Tom I.E. METHOD OF CONSTRUCTION OF GENETIC DATA CLUSTERS. Informatics. 2016;(1):64-74. (In Russ.)