The study of the reliability of the hardware part of the office cluster
https://doi.org/10.37661/1816-0301-2021-18-2-48-57
Abstract
The reliability measures of the hardware part of an office cluster were studied using the SKIF-GEO-Office RB cluster (hereinafter "the cluster") as an example. The cluster was developed within the scientific and technical program "SKIF-NEDRA" (2015-2018, a program of the Union State of Russia and Belarus). Its components are housed in a small rack built on an Aerocool Expredator Black full-tower case.
The paper presents the basic architectural principles implemented in the cluster, its composition, and its structural and functional diagram. The methodology for calculating the cluster's reliability, which builds on the authors' previous studies, is justified, as is the cluster's reliability block diagram. The choice of the main reliability measures for the cluster core and for the set of computing facilities is justified, and formulas for calculating these measures are given. The consequences of failures of the cluster's component parts are analyzed.
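The abstract does not reproduce the formulas for the cluster core. A common textbook treatment, assumed here and not necessarily the authors' exact formulation, models the core as a series structure of independent components with constant (exponential) failure rates: any single component failure fails the core, so the failure rates add, the mean time to failure is the reciprocal of the sum, and the probability of failure-free operation is an exponential in time. The function names are hypothetical, and the rates in the example are placeholders rather than the reference data used in the paper.

```python
import math
from typing import Sequence


def series_failure_rate(rates: Sequence[float]) -> float:
    """Failure rate of a series block (assumption: independent components,
    exponential time to failure). Any component failure fails the block,
    so the component rates simply add."""
    if any(r < 0 for r in rates):
        raise ValueError("failure rates must be non-negative")
    return sum(rates)


def mean_time_to_failure(rate: float) -> float:
    """MTTF of an exponentially distributed time to failure: 1 / lambda."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate


def reliability(rate: float, t: float) -> float:
    """Probability of no failure in [0, t]: R(t) = exp(-lambda * t)."""
    return math.exp(-rate * t)


if __name__ == "__main__":
    # Hypothetical placeholder rates (per hour) for four core components.
    core_rates = [1e-5, 1e-5, 1e-5, 1e-5]
    lam = series_failure_rate(core_rates)
    print(f"lambda = {lam:.2e} 1/h")
    print(f"MTTF   = {mean_time_to_failure(lam):.0f} h")
    print(f"R(1000 h) = {reliability(lam, 1000.0):.4f}")
```

With four placeholder components at 10⁻⁵ failures per hour, the series core has λ = 4·10⁻⁵ per hour and an MTTF of 25,000 hours, which illustrates why the least reliable component dominates a series structure.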
A mathematical reliability model (state graph) of the cluster's set of computing facilities is proposed, from which formulas are derived for the mean time to failure and the mean time to interruption of the cluster. The reliability of the cluster as a whole is estimated by calculating its reliability measures from reference data on component reliability and from operating experience with the SKIF family of supercomputers, and the resulting reliability measures of the cluster are reported.
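The state graph itself is not reproduced in the abstract. As a hedged illustration of how such a graph yields a mean time to failure, the sketch below assumes a simple birth-death graph: n identical compute nodes, each failing at rate `lam`, a single repair crew restoring nodes at rate `mu`, and failure of the computing set once fewer than k nodes are working. It also assumes, purely for illustration, that a job spanning all n nodes is interrupted by the first node failure. All names and parameters are hypothetical, and this is a simplification, not the authors' model.

```python
from typing import List


def first_passage_times(n: int, k: int, lam: float, mu: float) -> List[float]:
    """tau[i]: mean time to go from i failed nodes to i + 1 failed nodes.

    Hypothetical birth-death state graph: state i = number of failed nodes,
    failure transition i -> i + 1 at rate (n - i) * lam, repair transition
    i -> i - 1 at rate mu (single repair crew). The computing set is
    considered failed when n - k + 1 nodes are down.
    Recurrence: tau[0] = 1 / a_0, tau[i] = (1 + b_i * tau[i-1]) / a_i.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if lam <= 0 or mu < 0:
        raise ValueError("need lam > 0 and mu >= 0")
    absorbing = n - k + 1          # number of failed nodes meaning failure
    taus: List[float] = []
    prev = 0.0
    for i in range(absorbing):
        a = (n - i) * lam          # total failure rate in state i
        b = mu if i > 0 else 0.0   # no repair possible when nothing failed
        tau = (1.0 + b * prev) / a
        taus.append(tau)
        prev = tau
    return taus


def mttf(n: int, k: int, lam: float, mu: float) -> float:
    """Mean time from 'all nodes up' to failure of the computing set."""
    return sum(first_passage_times(n, k, lam, mu))


def mean_time_to_interruption(n: int, lam: float) -> float:
    """Assumption: a job using all n nodes stops at the first node failure,
    which is the k = n case and equals 1 / (n * lam)."""
    return mttf(n, n, lam, 0.0)


if __name__ == "__main__":
    # Hypothetical parameters: 8 nodes, at least 6 needed, rates per hour.
    print(f"MTTF = {mttf(8, 6, 1e-4, 0.1):.3e} h")
    print(f"Mean time to interruption = {mean_time_to_interruption(8, 1e-4):.0f} h")
```

For two nodes with k = 1, the recurrence reproduces the textbook result MTTF = (3λ + μ)/(2λ²) for a duplicated system with one repair crew, which is a quick check that the first-passage decomposition is set up correctly.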
About the Authors
T. S. Martinovich
Belarus
Tatyana S. Martinovich - Researcher, The United Institute of Informatics Problems of the National Academy of Sciences of Belarus.
st. Surganova, 6, Minsk, 220012.
N. N. Paramonov
Belarus
Nikolaj N. Paramonov - Cand. Sci. (Eng.), Associate Professor, Leading Researcher, The United Institute of Informatics Problems of the National Academy of Sciences of Belarus.
st. Surganova, 6, Minsk, 220012.
A. G. Rymarchuk
Belarus
Aleksandr G. Rymarchuk - Chief Designer of the Project, The United Institute of Informatics Problems of the National Academy of Sciences of Belarus.
st. Surganova, 6, Minsk, 220012.
O. P. Tchij
Belarus
Oleg P. Tchij - Cand. Sci. (Phys.-Math.), Head of the Laboratory of High-Performance Systems, The United Institute of Informatics Problems of the National Academy of Sciences of Belarus.
st. Surganova, 6, Minsk, 220012.
References
1. Paramonov N. N., Tchij O. P., Rymarchuk A. G., Ablamejko S. V., Anishchenko V. V., Kruglikov S. V., Tuzikov A. V. Belorusskie superkomp'yutery semejstva SKIF [Belarusian Supercomputers of the SKIF Family]. Gomel, Vechernij Gomel'-Media, 2020, 268 p. (In Russ.).
2. Kuleshova M. E., Paramonov N. N., Rymarchuk A. G., Tchij O. P. Belarusian clusters of the SKIF-GEO family. Sed'moj Nacional'nyj superkomp'juternyj forum: sbornik dokladov, Pereslavl'-Zalesskij, 27-30 nojabrja 2018 g. Institut programmnyh sistem Rossijskoj akademii nauk [7th National Supercomputer Forum: Collection of Reports, Pereslavl-Zalessky, 27 November - 30 November 2018. Program Systems Institute of the Russian Academy of Sciences] (In Russ.). Available at: http://2018.nskf.ru/TesisAll/00_Plenar/051_RymarchukAG.pdf/ (accessed 20.06.2020).
3. Kuleshova M. E., Murashko N. N., Paramonov N. N., Rymarchuk A. G., Tchij O. P. Small office cluster of the Belarusian SKIF family-GEO-Office. Shestoj Nacional'nyj superkomp'juternyj forum: sbornik dokladov, Pereslavl'-Zalesskij, 28 nojabrja - 01 dekabrja 2017 g. Institut programmnyh sistem Rossijskoj akademii nauk [6th National Supercomputer Forum: Collection of Reports, Pereslavl-Zalessky, 28 November - 01 December 2017. Program Systems Institute of the Russian Academy of Sciences] (In Russ.). Available at: http://2017.nscf.ru/nauchno-prakticheskaya-konferenciya/tezisy-dokladov/ (accessed 20.06.2020).
4. Anishchenko V. V., Kulbak L. I., Martinovich T. S. Reliability models of cluster computing systems. Vestsi Natsyianal'nai akademii navuk Belarusi. Seryia fizika-technichnykh navuk [Proceedings of the National Academy of Sciences of Belarus. Physical-technical series], 2008, no. 1, pp. 89-99 (In Russ.).
5. Viktorova V. S., Stepenyanc A. S. Modeli i metody rascheta nadezhnosti tekhnicheskih sistem [Models and Methods for Calculating the Reliability of Technical Systems]. Moscow, Lenand, 2016, 256 p. (In Russ.).
6. Rymarchuk A. G., Evdokimchikov A. N., Mazjuk V. V., Kruglikov S. V., Paramonov N. N., Pechkovskij E. I. Kompaktnyj vychislitel'nyj klaster: patent Respubliki Belarus' na poleznuju model' № 12417, MPK G06F [Compact Computing Cluster: patent of the Republic of Belarus for Utility Model no. 12417, IPC G06F]. Publ. date 30.10.2020 (In Russ.).
For citations:
Martinovich T.S., Paramonov N.N., Rymarchuk A.G., Tchij O.P. The study of the reliability of the hardware part of the office cluster. Informatics. 2021;18(2):48-57. (In Russ.) https://doi.org/10.37661/1816-0301-2021-18-2-48-57