Generation of shortest path search dataflow networks of actors for parallel multi-core implementation
https://doi.org/10.37661/1816-0301-2023-20-2-65-84
Abstract
Objectives. The problem of parallelizing computations on multicore systems is considered. On the Floyd – Warshall blocked algorithm of shortest paths search in dense graphs of large size, two types of parallelism are compared: fork-join and network dataflow. Using the CAL programming language, a method of developing actors and an algorithm of generating parallel dataflow networks are proposed. The objective is to improve performance of parallel implementations of algorithms which have the property of partial order of computations on multicore processors.
Methods. Methods of graph theory, algorithm theory, parallelization theory and formal language theory are used.
Results. Claims about the possibility of reordering calculations in the blocked Floyd – Warshall algorithm are proved, which make it possible to achieve a greater load of cores during algorithm execution. Based on the claims, a method of constructing actors in the CAL language is developed and an algorithm for automatic generation of dataflow CAL networks for various configurations of block matrices describing the lengths of the shortest paths is proposed. It is proved that the networks have the properties of rate consistency, boundedness, and liveness. In actors running in parallel, the order of execution of actions with asynchronous behavior can change dynamically, resulting in efficient use of caches and increased core load. To implement the new features of actors, networks and the method of their generation, a tunable multi-threaded CAL engine has been developed that implements a static dataflow model of computation with bounded sizes of buffers. From the experimental results obtained on four types of multi-core processors it follows that there is an optimal size of the network matrix of actors for which the performance is maximum, and the size depends on the number of cores and the size of graph.
Conclusion. It has been shown that dataflow networks of actors are an effective means to parallelize computationally intensive algorithms that describe a partial order of computations over decomposed data. The results obtained on the blocked algorithm of shortest paths search prove that the parallelism of dataflow networks gives higher performance of software implementations on multicore processors in comparison with the fork-join parallelism of OpenMP.
About the Author
A. A. PrihozhyBelarus
Anatoly A. Prihozhy, D. Sc. (Eng.), Professor
av. Nezavisimosty, 65, Minsk, 220013
References
1. Floyd, R. W. Algorithm 97: Shortest path / R. W. Floyd // Communications of the ACM. – 1962. – Vol. 5, no. 6. – P. 345.
2. Survey of Shortest-Path Algorithms / A. Madkour [et al.]. – 2017. – 26 p. – Mode of access: https://arxiv.org/abs/1705.02044. – Date of access: 23.11.2022.
3. Anu, P. Finding all-pairs shortest path for a large-scale transportation network using parallel Floyd-Warshall and parallel Dijkstra algorithms / P. Anu, M. G. Kumar // J. of Computing in Civil Engineering. – 2013. – Vol. 27, no. 3. – P. 263–273.
4. Prihozhy, A. A. Evaluation of parallelization potential for efficient multimedia implementations: dynamic evaluation of algorithm critical path / A. A. Prihozhy, M. Mattavelli, D. Mlynek // IEEE Transactions on Circuits and Systems for Video Technology. – 2005. – Vol. 15, no. 5. – P. 593–608.
5. Singh, A. Performance analysis of Floyd Warshall algorithm vs rectangular algorithm / A. Singh, P. K. Mishra // Intern. J. of Computer Applications. – 2014. – Vol. 107, no. 16. – P. 23–27.
6. Venkataraman, G. A. Blocked all-pairs shortest paths algorithm / G. A. Venkataraman, S. Sahni, S. Mukhopadhyaya // J. of Experimental Algorithmics (JEA). – 2003. – Vol. 8. – P. 857–874.
7. Park, J. Optimizing graph algorithms for improved cache performance / J. Park, M. Penner, V. K. Prasanna // IEEE Transactions on Parallel and Distributed Systems. – 2004. – Vol. 15, no. 9. – P. 769–782.
8. An experimental study of a parallel shortest path algorithm for solving large-scale graph instances / K. Madduri [et al.] // Proc. of the Nine Workshop on Algorithm Engineering and Experiments, ALENEX 2007, New Orleans, Louisiana, USA, 6 Jan. 2007. – New Orleans, 2007. – P. 23–35.
9. Albalwi, E. Task level parallelization of all pair shortest path algorithm in OpenMP 3.0 / E. Albalwi, P. Thulasiraman, R. Thulasiram // Advances in Computer Science and Engineering (CSE 2013). – Los Angeles : Atlantis Press, 2013. – P. 109–112.
10. Tang, P. Rapid development of parallel blocked all-pairs shortest paths code for multi-core computers / P. Tang // IEEE SOUTHEASTCON 2014, Lexington, KY, USA, 13–16 Mar. 2014. – Lexington, 2014. – P. 1–7.
11. Прихожий, А. А. Разнородный блочный алгоритм поиска кратчайших путей между всеми парами вершин графа / А. А. Прихожий, О. Н. Карасик // Системный анализ и прикладная информатика. – 2017. – № 3. – С. 68–75. https://doi.org/10.21122/2309-4923-2017-3-68-75
12. Карасик, О. Н. Потоковый блочно-параллельный алгоритм поиска кратчайших путей на графе / О. Н. Карасик, А. А. Прихожий // Доклады БГУИР. – 2018. – № 2. – С. 77–84.
13. Karasik, O. N. Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation / O. N. Karasik, A. A. Prihozhy // System Analysis and Applied Information Science. – 2022. – No. 3. – P. 68–75. https://doi.org/10.21122/2309-4923-2022-3-57-65
14. Prihozhy, A. A. Simulation of direct mapped, k-way and fully associative cache on all pairs shortest paths algorithms / A. A. Prihozhy // System Analysis and Applied Information Science. – 2019. – No. 4. – P. 10–18. https://doi.org/10.21122/2309-4923-2019-4-10-18
15. Prihozhy, A. A. Optimization of data allocation in hierarchical memory for blocked shortest paths algorithms / A. A. Prihozhy // System Analysis and Applied Information Science. – 2021. – No. 3. – P. 40–50. https://doi.org/10.21122/2309-4923-2021-3-40-50
16. Лиходед, Н. А. Обобщенный блочный алгоритм Флойда – Уоршелла / Н. А. Лиходед, Д. С. Сипейко // Журнал Бел. гос. ун-та. Математика. Информатика. – 2019. – № 3. – С. 84–92.
17. Kahn, G. The semantics of a simple language for parallel programming / G. Kahn // Information Processing 74: Proc. of the IFIP Congress 74, Stockholm, Sweden, 5–10 Aug. 1974. – Stockholm, 1974. – P. 471–475.
18. Lee, E. A. Synchronous dataflow / E. A. Lee, D. G. Messerschmitt // Proc. of the IEEE. – Sept. 1987. – Vol. 75, no. 9. – P. 1235–1245.
19. Techniques for optimization of net algorithms / A. Prihozhy [et al.] // 2002 Intern. Conf. on Parallel Computing in Electrical Engineering (PARELEC 2002), Warsaw, Poland, 22–25 Sept. 2002. – Warsaw, 2002. – P. 211–216.
20. Eker, J. Cal Language Report : Technical Report UCB/ERL M03/48 / J. Eker, J. W. Janneck. – University of California at Berkeley, Dec. 2003. – 107 p.
21. OpenDF – a dataflow toolset for reconfigurable hardware and multicore systems / S. S. Bhattacharyya [et al.] // First Swedish Workshop on Multi-Core Computing, MCC, Ronneby, Sweden, 27–28 Nov. 2008. – Ronneby, 2008. – P. 43–49.
22. Murthy, P. K. Multidimensional synchronous dataflow / P. K. Murthy, E. A. Lee // IEEE Transactions on Signal Processing. – 2002. – Vol. 50, no. 8. – P. 2064–2079.
23. Bhattacharya, B. Parameterized dataflow modeling for DSP systems / B. Bhattacharya, S. S. Bhattacharyya // IEEE Transactions on Signal Processing. – 2001. – Vol. 49, no. 10. – P. 2408–2421.
24. BPDF: Boolean Parametric Data Flow: Research Report RR-8333/ V. Bebelis [et al.]. – INRIA, 2013. – 21 p.
25. Rahman, A.-H. Ab. Pipeline synthesis and optimization of FPGA-based video processing applications with CAL / A.-H. Ab Rahman, A. Prihozhy, M. Mattavelli // EURASIP J. on Image and Video Processing. – 2011. – Vol. 2011:19. – P. 1–28. https://doi.org/10.1186/16875281-2011-19
26. Efficient dynamic optimization heuristics for dataflow pipelines / A. Prihozhy [et al.] // 2018 IEEE Intern. Workshop on Signal Processing Systems, SiPS 2018, Cape Town, South Africa, 21–24 Oct. 2018. – Cape Town, 2018. – P. 337–342.
27. Pipeline synthesis and optimization from branched feedback dataflow programs / A. A. Prihozhy [et al.] // J. of Signal Processing Systems, Springer Nature. – 2020. – Vol. 92. – P. 1091–1099. https://doi.org/10.1007/s11265-020-01568-5
Review
For citations:
Prihozhy A.A. Generation of shortest path search dataflow networks of actors for parallel multi-core implementation. Informatics. 2023;20(2):65-84. https://doi.org/10.37661/1816-0301-2023-20-2-65-84