TY - GEN
T1 - Scaling and Selecting GPU Methods for All Pairs Shortest Paths (APSP) Computations
AU - Xia, Yang
AU - Jiang, Peng
AU - Agrawal, Gagan
AU - Ramnath, Rajiv
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - All Pairs Shortest Path (APSP) is one of the graph problems in which the output size is significantly larger than the input size. This paper examines the issues in scaling GPU implementations for this problem beyond the memory limits. Because the existing (in-core) methods offer a complex trade-off between the overall computation complexity and the available parallelism, choosing the best out-of-core version for a given matrix is challenging. We develop three efficient out-of-core implementations, based on the blocked Floyd-Warshall algorithm, Johnson's algorithm, and the boundary algorithm, respectively. Next, we develop a methodology to select the best implementation for a given graph. Experimental results show that, compared with an efficient multi-core APSP implementation, the out-of-core version achieves speedups of 8.22 to 12.40 for graphs with a small separator and 2.23 to 2.79 for other sparse graphs, and that our models select the best implementation in most cases.
UR - http://www.scopus.com/inward/record.url?scp=85136334531&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136334531&partnerID=8YFLogxK
U2 - 10.1109/IPDPS53621.2022.00027
DO - 10.1109/IPDPS53621.2022.00027
M3 - Conference contribution
AN - SCOPUS:85136334531
T3 - Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
SP - 190
EP - 200
BT - Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 36th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022
Y2 - 30 May 2022 through 3 June 2022
ER -