
inform

Informatics

ISSN 1816-0301, 2617-6963

UIIP NASB

inform-125

Research Article

BIOINFORMATICS

METHOD OF CONSTRUCTION OF GENETIC DATA CLUSTERS

Novoselova N. A.

novosel@newman.bas-net.by

Tom I. E.

United Institute of Informatics Problems of the National Academy of Sciences of Belarus, Belarus

2016

03.10.2016

016474

2016

Novoselova N.A., Tom I.E.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/125

The paper proposes a method for constructing clusters (functional modules) of genetic data based on randomized matrices. To build the clusters, the principal components of the correlation matrix of gene profiles are extracted and analyzed. The components retained are those whose eigenvalues differ significantly from the eigenvalues obtained for a randomly generated (randomized) correlation matrix, and each retained principal component forms a gene cluster. In a comparative computational experiment with analogous methods, the proposed method showed advantages in identifying statistically significant clusters of both small and large sizes, in filtering out non-informative genes, and in producing biologically interpretable functional modules that match the real structure of the data.
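The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the randomized matrix is generated, how significance is tested, or how genes are assigned to components. The sketch therefore assumes a permutation null (each gene profile shuffled independently), a quantile threshold on the largest null eigenvalue, and assignment of each gene to the retained component with the largest absolute loading. The function names `significant_components`, `assign_clusters`, and `demo_profiles`, and the parameters `n_perm`, `quantile`, and `min_loading`, are hypothetical.

```python
import numpy as np


def significant_components(profiles, n_perm=200, quantile=0.95, seed=0):
    """Keep principal components of the gene-gene correlation matrix whose
    eigenvalues exceed a threshold taken from randomized data.

    profiles: array of shape (n_genes, n_samples), one expression profile per row.
    """
    rng = np.random.default_rng(seed)
    corr = np.corrcoef(profiles)                  # genes x genes
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Assumed null model: shuffling each row independently destroys the
    # gene-gene correlation while preserving each gene's own values.
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permuted(profiles, axis=1)
        null_max[i] = np.linalg.eigvalsh(np.corrcoef(shuffled))[-1]
    threshold = np.quantile(null_max, quantile)

    keep = eigvals > threshold
    return eigvals[keep], eigvecs[:, keep], threshold


def assign_clusters(eigvals, eigvecs, min_loading=0.5):
    """Assign each gene to the retained component with the largest absolute
    loading; genes below min_loading get label -1 (treated as non-informative).
    The cutoff value is an illustrative assumption."""
    n_genes = eigvecs.shape[0]
    if eigvecs.shape[1] == 0:
        return np.full(n_genes, -1)
    # For a correlation matrix, eigenvector * sqrt(eigenvalue) is the
    # correlation between a gene and the component score.
    loadings = np.abs(eigvecs * np.sqrt(eigvals))
    labels = loadings.argmax(axis=1)
    labels[loadings.max(axis=1) < min_loading] = -1
    return labels


def demo_profiles(seed=1, n_samples=60):
    """Synthetic data: two modules of different size plus pure-noise genes."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 4.0 * np.pi, n_samples, endpoint=False)
    module_a = np.sin(t) + 0.3 * rng.standard_normal((12, n_samples))
    module_b = np.cos(t) + 0.3 * rng.standard_normal((6, n_samples))
    noise = rng.standard_normal((12, n_samples))
    return np.vstack([module_a, module_b, noise])


profiles = demo_profiles()
vals, vecs, thr = significant_components(profiles)
labels = assign_clusters(vals, vecs)
print("retained components:", len(vals), "threshold:", round(float(thr), 2))
print("cluster labels:", labels)
```

Comparing every eigenvalue against the largest null eigenvalue is a deliberately conservative choice. When modules have similar sizes, the leading eigenvectors can mix them, so a rotation step may be needed before assignment; the sketch omits it.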


The authors declare no conflicts of interest.