TY - GEN
T1 - A framework for elastic execution of existing MPI programs
AU - Raveendran, Aarthi
AU - Bicer, Tekin
AU - Agrawal, Gagan
PY - 2011
Y1 - 2011
N2 - There is a clear trend towards using cloud resources in the scientific or the HPC community, with a key attraction of cloud being the elasticity it offers. In executing HPC applications on a cloud environment, it will clearly be desirable to exploit elasticity of cloud environments, and increase or decrease the number of instances an application is executed on during the execution of the application, to meet time and/or cost constraints. Unfortunately, HPC applications have almost always been designed to use a fixed number of resources. This paper describes our initial work towards the goal of making existing MPI applications elastic for a cloud framework. Considering the limitations of the MPI implementations currently available, we support adaptation by terminating one execution and restarting a new program on a different number of instances. The components of our envisioned system include a decision layer which considers time and cost constraints, a framework for modifying MPI programs, and a cloud-based runtime support that can enable redistributing of saved data, and support automated resource allocation and application restart on a different number of nodes. Using two MPI applications, we demonstrate the feasibility of our approach, and show that outputting, redistributing, and reading back data can be a reasonable approach for making existing MPI applications elastic.
AB - There is a clear trend towards using cloud resources in the scientific or the HPC community, with a key attraction of cloud being the elasticity it offers. In executing HPC applications on a cloud environment, it will clearly be desirable to exploit elasticity of cloud environments, and increase or decrease the number of instances an application is executed on during the execution of the application, to meet time and/or cost constraints. Unfortunately, HPC applications have almost always been designed to use a fixed number of resources. This paper describes our initial work towards the goal of making existing MPI applications elastic for a cloud framework. Considering the limitations of the MPI implementations currently available, we support adaptation by terminating one execution and restarting a new program on a different number of instances. The components of our envisioned system include a decision layer which considers time and cost constraints, a framework for modifying MPI programs, and a cloud-based runtime support that can enable redistributing of saved data, and support automated resource allocation and application restart on a different number of nodes. Using two MPI applications, we demonstrate the feasibility of our approach, and show that outputting, redistributing, and reading back data can be a reasonable approach for making existing MPI applications elastic.
UR - http://www.scopus.com/inward/record.url?scp=83455263660&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83455263660&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2011.240
DO - 10.1109/IPDPS.2011.240
M3 - Conference contribution
AN - SCOPUS:83455263660
SN - 9780769543857
T3 - IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
SP - 940
EP - 947
BT - 2011 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2011
T2 - 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011
Y2 - 16 May 2011 through 20 May 2011
ER -