TY - GEN
T1 - Smart streaming
T2 - 34th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
AU - Guo, Jia
AU - Agrawal, Gagan
N1 - Publisher Copyright: © 2020 IEEE. Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/5
Y1 - 2020/5
N2 - In recent years, there has been considerable interest in developing frameworks for processing streaming data. Like the precursor commercial systems for data-intensive processing, these systems have largely not used methods popular within the HPC community (for example, MPI for communication). In this paper, we demonstrate a system for stream processing that offers a high-level API to the users (similar to MapReduce), is fault-tolerant, and is also more efficient and scalable than current solutions. In particular, a cost-efficient MPI/OpenMP-based fault-tolerant scheme is incorporated so that the system can survive node failures with only a modest degradation of performance. We evaluate both the functionality and efficiency of Smart Streaming using four common applications in machine learning and data analytics. A comparison against state-of-the-art streaming frameworks shows that our system boosts the throughput of test cases by up to 10X and achieves desirable parallelism when scaled out. Additionally, the performance loss upon failures is only proportional to the share of failed resources.
AB - In recent years, there has been considerable interest in developing frameworks for processing streaming data. Like the precursor commercial systems for data-intensive processing, these systems have largely not used methods popular within the HPC community (for example, MPI for communication). In this paper, we demonstrate a system for stream processing that offers a high-level API to the users (similar to MapReduce), is fault-tolerant, and is also more efficient and scalable than current solutions. In particular, a cost-efficient MPI/OpenMP-based fault-tolerant scheme is incorporated so that the system can survive node failures with only a modest degradation of performance. We evaluate both the functionality and efficiency of Smart Streaming using four common applications in machine learning and data analytics. A comparison against state-of-the-art streaming frameworks shows that our system boosts the throughput of test cases by up to 10X and achieves desirable parallelism when scaled out. Additionally, the performance loss upon failures is only proportional to the share of failed resources.
UR - http://www.scopus.com/inward/record.url?scp=85091602248&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091602248&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW50202.2020.00075
DO - 10.1109/IPDPSW50202.2020.00075
M3 - Conference contribution
AN - SCOPUS:85091602248
T3 - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
SP - 396
EP - 405
BT - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 May 2020 through 22 May 2020
ER -