References

inform

Informatics

ISSN: 1816-0301, 2617-6963

UIIP NASB

10.37661/1816-0301-2023-20-2-65-84

inform-1245

Research Article

LOGICAL DESIGN

Generation of shortest path search dataflow networks of actors for parallel multi-core implementation

Prihozhy A. A.

Anatoly A. Prihozhy, D. Sc. (Eng.), Professor

av. Nezavisimosty, 65, Minsk, 220013

prihozhy@yahoo.com

Belarusian National Technical University

2023

29.06.2023

Vol. 20, no. 2, pp. 65–84

2023

Prihozhy A.A.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/1245

Objectives. The problem of parallelizing computations on multicore systems is considered. Using the blocked Floyd – Warshall algorithm for shortest path search in large dense graphs, two types of parallelism are compared: fork-join and network dataflow. Using the CAL programming language, a method of constructing dataflow actors and an algorithm for generating parallel dataflow networks of actors are proposed. The objective is to improve the performance, on multicore processors, of parallel network implementations of algorithms that have the property of partial order of computations.

Methods. Methods of graph theory, algorithm theory, parallelization theory and formal language theory are used.

Results. Claims about the possibility of reordering computations in the blocked Floyd – Warshall algorithm are proved; this reordering makes it possible to increase core utilization during algorithm execution. Based on these claims, a method of constructing actors in the CAL language is developed, and an algorithm is proposed for automatically generating dataflow CAL networks for various configurations of the block matrices that describe shortest path lengths. It is proved that the networks have the properties of rate consistency, boundedness, and liveness. In actors running in parallel, the execution order of actions with asynchronous behavior can change dynamically, which results in efficient use of caches and increased core load. To implement the new features of the actors, the networks and their generation method, a tunable multi-threaded CAL engine has been developed that implements a static dataflow model of computation with bounded buffer sizes. The experimental results obtained on four types of multicore processors show that there is an optimal size of the network matrix of actors at which performance is maximal, and that this size depends on the graph size and the number of cores.

Conclusion. It has been shown that dataflow networks of actors are an effective means of parallelizing computationally intensive algorithms that describe a partial order of computations over decomposed data. The results obtained on the blocked shortest path search algorithm show that dataflow network parallelism yields higher performance of software implementations on multicore processors than the fork-join parallelism of the OpenMP standard.
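For readers unfamiliar with the blocked Floyd – Warshall algorithm discussed in the abstract, the following Python sketch shows the classic sequential three-phase scheme from the literature (e.g. Venkataraman et al.). It is not the author's CAL actor network; it only makes visible the block dependencies that a dataflow network can exploit: in each round the diagonal block must be finished before the blocks in its row and column, and those must be finished before the remaining blocks. The function names and the block-size parameter `s` are illustrative assumptions.

```python
import math

INF = math.inf  # "no edge" marker


def floyd_warshall(d):
    """Classic O(n^3) Floyd-Warshall on a dense distance matrix (in place)."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    return d


def update_block(d, bi, bj, bk, s):
    """Relax block (bi, bj) through every vertex k of block bk (block size s)."""
    n = len(d)
    for k in range(bk * s, min((bk + 1) * s, n)):
        for i in range(bi * s, min((bi + 1) * s, n)):
            dik = d[i][k]
            for j in range(bj * s, min((bj + 1) * s, n)):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]


def blocked_floyd_warshall(d, s):
    """Blocked Floyd-Warshall: one round of three phases per diagonal block."""
    n = len(d)
    m = (n + s - 1) // s  # number of block rows/columns
    for bk in range(m):
        # Phase 1: the diagonal block depends only on itself.
        update_block(d, bk, bk, bk, s)
        # Phase 2: blocks in row bk and column bk need the finished diagonal block.
        for b in range(m):
            if b != bk:
                update_block(d, bk, b, bk, s)
                update_block(d, b, bk, bk, s)
        # Phase 3: every other block needs its row-bk and column-bk blocks.
        for bi in range(m):
            for bj in range(m):
                if bi != bk and bj != bk:
                    update_block(d, bi, bj, bk, s)
    return d
```

For example, on the three-vertex graph `[[0, 1, INF], [INF, 0, 2], [5, INF, 0]]` with block size 2, `blocked_floyd_warshall` returns `[[0, 1, 3], [7, 0, 2], [5, 6, 0]]`, the same matrix as `floyd_warshall`. Within a round, the phase 2 updates are independent of each other, as are the phase 3 updates; a dataflow formulation can replace the fixed loop order with data dependencies between block updates. How the article maps blocks to CAL actors is not shown here.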

dataflow; network of actors; CAL language; shortest paths; blocked algorithm; multi-core system; speedup
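The abstract states that the generated networks are rate consistent. In the synchronous dataflow model of Lee and Messerschmitt, rate consistency means that the balance equations q[src]·prod = q[dst]·cons, one per channel, have a positive integer solution q (the repetition vector). The Python sketch below checks this for a connected graph by propagating rational firing rates; it illustrates the general SDF notion only and is not the proof technique used in the article. The function name and the `(src, dst, prod, cons)` edge format are assumptions made for illustration.

```python
from collections import defaultdict, deque
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, prod, cons) with positive integer token rates.
    Returns the smallest positive integer repetition vector as a dict,
    or None if the balance equations conflict (the graph is not consistent).
    """
    adj = defaultdict(list)
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # q[dst] = q[src] * prod / cons
        adj[dst].append((src, Fraction(cons, prod)))  # q[src] = q[dst] * cons / prod

    rate = {actors[0]: Fraction(1)}  # fix the first actor's rate to 1
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, factor in adj[a]:
            r = rate[a] * factor
            if b not in rate:
                rate[b] = r
                queue.append(b)
            elif rate[b] != r:
                return None  # two paths demand different rates for b
    if len(rate) != len(actors):
        raise ValueError("the graph is not connected")

    # Scale the rational rates to the smallest positive integers.
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(r * scale) for a, r in rate.items()}
    g = reduce(gcd, q.values())
    return {a: v // g for a, v in q.items()}
```

For a chain A → B → C with rates (2, 3) and (1, 2), `repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])` returns `{"A": 3, "B": 2, "C": 1}`, while a two-actor cycle with conflicting rates returns `None`. Consistency alone does not guarantee liveness: cycles also need enough initial tokens to fire.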

References

Floyd R. W. Algorithm 97: Shortest path. Communications of the ACM, 1962, vol. 5, no. 6, p. 345.

Madkour A., Aref W. G., Rehman F. U., Rahman M. A., Basalamah S. A. Survey of Shortest-Path Algorithms, 2017, 26 p. Available at: https://arxiv.org/abs/1705.02044 (accessed 23.11.2022).

Anu P., Kumar M. G. Finding all-pairs shortest path for a large-scale transportation network using parallel Floyd-Warshall and parallel Dijkstra algorithms. Journal of Computing in Civil Engineering, 2013, vol. 27, no. 3, pp. 263–273.

Prihozhy A. A., Mattavelli M., Mlynek D. Evaluation of parallelization potential for efficient multimedia implementations: dynamic evaluation of algorithm critical path. IEEE Transactions on Circuits and Systems for Video Technology, 2005, vol. 15, no. 5, pp. 593–608.

Singh A., Mishra P. K. Performance analysis of Floyd Warshall algorithm vs rectangular algorithm. International Journal of Computer Applications, 2014, vol. 107, no. 16, pp. 23–27.

Venkataraman G. A., Sahni S., Mukhopadhyaya S. Blocked all-pairs shortest paths algorithm. Journal of Experimental Algorithmics (JEA), 2003, vol. 8, pp. 857–874.

Park J., Penner M., Prasanna V. K. Optimizing graph algorithms for improved cache performance. IEEE Transactions on Parallel and Distributed Systems, 2004, vol. 15, no. 9, pp. 769–782.

Madduri K., Bader D. A., Berry J. W., Crobak J. R. An experimental study of a parallel shortest path algorithm for solving large-scale graph instances. Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments, ALENEX 2007, New Orleans, Louisiana, USA, 6 January 2007. New Orleans, 2007, pp. 23–35.

Albalwi E., Thulasiraman P., Thulasiram R. Task level parallelization of all pair shortest path algorithm in OpenMP 3.0. Advances in Computer Science and Engineering (CSE 2013). Los Angeles, Atlantis Press, 2013, pp. 109–112.

Tang P. Rapid development of parallel blocked all-pairs shortest paths code for multi-core computers. IEEE SOUTHEASTCON 2014, Lexington, KY, USA, 13–16 March 2014. Lexington, 2014, pp. 1–7.

Prihozhy A. A., Karasik O. N. Heterogeneous blocked all-pairs shortest paths algorithm. Sistemnyj analiz i prikladnaja informatika [System Analysis and Applied Information Science], 2017, no. 3, pp. 68–75 (In Russ.). https://doi.org/10.21122/2309-4923-2017-3-68-75

Karasik O. N., Prihozhy A. A. Threaded block-parallel algorithm for finding the shortest paths on graph. Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki [Reports of the Belarusian State University of Informatics and Radioelectronics], 2018, no. 2, pp. 77–84 (In Russ.).

Karasik O. N., Prihozhy A. A. Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation. System Analysis and Applied Information Science, 2022, no. 3, pp. 57–65. https://doi.org/10.21122/2309-4923-2022-3-57-65

Prihozhy A. A. Simulation of direct mapped, k-way and fully associative cache on all pairs shortest paths algorithms. System Analysis and Applied Information Science, 2019, no. 4, pp. 10–18. https://doi.org/10.21122/2309-4923-2019-4-10-18

Prihozhy A. A. Optimization of data allocation in hierarchical memory for blocked shortest paths algorithms. System Analysis and Applied Information Science, 2021, no. 3, pp. 40–50. https://doi.org/10.21122/2309-4923-2021-3-40-50

Likhoded N. A., Sipeyko D. S. Generalized blocked Floyd – Warshall algorithm. Journal of the Belarusian State University. Mathematics and Informatics, 2019, no. 3, pp. 84–92 (In Russ.).

Kahn G. The semantics of a simple language for parallel programming. Information Processing 74: Proceedings of the IFIP Congress 74, Stockholm, Sweden, 5–10 August 1974. Stockholm, 1974, pp. 471–475.

Lee E. A., Messerschmitt D. G. Synchronous dataflow. Proceedings of the IEEE, September 1987, vol. 75, no. 9, pp. 1235–1245.

Prihozhy A., Mlynek D., Solomennik M., Mattavelli M. Techniques for optimization of net algorithms. 2002 International Conference on Parallel Computing in Electrical Engineering (PARELEC 2002), Warsaw, Poland, 22–25 September 2002. Warsaw, 2002, pp. 211–216.

Eker J., Janneck J. W. CAL Language Report: Technical Report UCB/ERL M03/48. University of California at Berkeley, December 2003, 107 p.

Bhattacharyya S. S., Brebner G., Janneck J. W., Eker J., Platen C., …, Raulet M. OpenDF – a dataflow toolset for reconfigurable hardware and multicore systems. First Swedish Workshop on Multi-Core Computing, MCC, Ronneby, Sweden, 27–28 November 2008. Ronneby, 2008, pp. 43–49.

Murthy P. K., Lee E. A. Multidimensional synchronous dataflow. IEEE Transactions on Signal Processing, 2002, vol. 50, no. 8, pp. 2064–2079.

Bhattacharya B., Bhattacharyya S. S. Parameterized dataflow modeling for DSP systems. IEEE Transactions on Signal Processing, 2001, vol. 49, no. 10, pp. 2408–2421.

Bebelis V., Fradet P., Girault A., Lavigueur B. BPDF: Boolean Parametric Data Flow. Research Report RR-8333. INRIA, 2013, 21 p.

Rahman A.-H. Ab, Prihozhy A., Mattavelli M. Pipeline synthesis and optimization of FPGA-based video processing applications with CAL. EURASIP Journal on Image and Video Processing, 2011, vol. 2011:19, pp. 1–28. https://doi.org/10.1186/1687-5281-2011-19

Prihozhy A., Casale-Brunet S., Bezati E., Mattavelli M. Efficient dynamic optimization heuristics for dataflow pipelines. 2018 IEEE International Workshop on Signal Processing Systems, SiPS 2018, Cape Town, South Africa, 21–24 October 2018. Cape Town, 2018, pp. 337–342.

Prihozhy A. A., Casale-Brunet S., Bezati E., Mattavelli M. Pipeline synthesis and optimization from branched feedback dataflow programs. Journal of Signal Processing Systems, Springer Nature, 2020, vol. 92, pp. 1091–1099. https://doi.org/10.1007/s11265-020-01568-5

The authors declare that there are no conflicts of interest present.