Runtime fault-handling for job-flow management in Grid environments

Gargi Dasgupta; Onyeka Ezenwoye; Liana Fong; Selim Kalayci; S. Masoud Sadjadi; Balaji Viswanathan

doi:10.1109/ICAC.2008.16

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
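The abstract describes the proxy only at the architectural level. The following is a minimal Python sketch of the idea, assuming the lower level can be modeled as plain callables that either return a result or raise an error. The names `GenericProxy`, `JobFailure`, `flaky_cluster` and `backup_cluster` are hypothetical, and the retry-then-fallback policy is an illustrative assumption, not the recovery strategy reported in the paper. It shows the key property claimed in the abstract: the upper-level engine calls `submit` exactly as it would call the original endpoint, so neither layer has to be modified.

```python
from typing import Callable, Dict, List


class JobFailure(Exception):
    """Hypothetical error raised by a lower-level endpoint when a job cannot run."""


# Hypothetical model of a lower-level endpoint: takes a job description, returns a result.
Endpoint = Callable[[Dict[str, str]], str]


class GenericProxy:
    """Illustrative transparent proxy placed between the job-flow engine (upper level)
    and the schedulers (lower level). It exposes the same call signature as an
    endpoint, so the engine is unaware that interception takes place."""

    def __init__(self, primary: Endpoint, alternates: List[Endpoint], max_retries: int = 1):
        self.primary = primary
        self.alternates = alternates
        self.max_retries = max_retries
        self.log: List[str] = []  # record of intercepted interactions

    def submit(self, job: Dict[str, str]) -> str:
        # Forward to the primary endpoint, retrying a bounded number of times.
        for attempt in range(1 + self.max_retries):
            try:
                result = self.primary(job)
                self.log.append(f"primary ok (attempt {attempt + 1})")
                return result
            except JobFailure as exc:
                self.log.append(f"primary failed (attempt {attempt + 1}): {exc}")
        # Assumed recovery step: fall back to alternate endpoints in order.
        for i, alt in enumerate(self.alternates):
            try:
                result = alt(job)
                self.log.append(f"alternate {i} ok")
                return result
            except JobFailure as exc:
                self.log.append(f"alternate {i} failed: {exc}")
        # No recovery possible: surface the fault to the engine.
        raise JobFailure(f"job {job.get('id', '?')} failed on all endpoints")


# Usage with two hypothetical endpoints: one that always fails, one that succeeds.
def flaky_cluster(job: Dict[str, str]) -> str:
    raise JobFailure("node down")


def backup_cluster(job: Dict[str, str]) -> str:
    return f"{job['id']} completed on backup"


proxy = GenericProxy(flaky_cluster, [backup_cluster], max_retries=1)
print(proxy.submit({"id": "job-42"}))  # job-42 completed on backup
print(proxy.log)
```

Keeping the recovery logic inside the proxy is what makes the approach non-intrusive in this sketch: the engine sees either a normal result or, when every option is exhausted, a single error, just as it would without the proxy.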

Original language	English (US)
Title of host publication	5th International Conference on Autonomic Computing, ICAC 2008
Pages	201-202
Number of pages	2
DOIs	https://doi.org/10.1109/ICAC.2008.16
State	Published - 2008
Externally published	Yes
Event	5th International Conference on Autonomic Computing, ICAC 2008 - Chicago, IL, United States. Duration: Jun 2 2008 → Jun 6 2008

Publication series

Name	5th International Conference on Autonomic Computing, ICAC 2008

Other

Other	5th International Conference on Autonomic Computing, ICAC 2008
Country/Territory	United States
City	Chicago, IL
Period	6/2/08 → 6/6/08

Keywords

Fault-tolerance
Generic proxy
Job-flow management
Job-flows
Meta-scheduler

ASJC Scopus subject areas

Computer Networks and Communications
Hardware and Architecture
Software
Control and Systems Engineering

Cite this

Dasgupta, G, Ezenwoye, O, Fong, L, Kalayci, S, Sadjadi, SM & Viswanathan, B 2008, Runtime fault-handling for job-flow management in Grid environments. in 5th International Conference on Autonomic Computing, ICAC 2008., 4550843, 5th International Conference on Autonomic Computing, ICAC 2008, pp. 201-202, 5th International Conference on Autonomic Computing, ICAC 2008, Chicago, IL, United States, 6/2/08. https://doi.org/10.1109/ICAC.2008.16

@inproceedings{cd2d636cea02488bb537fe9a9d8ffdc7,
  title = "Runtime fault-handling for job-flow management in Grid environments",
  abstract = "The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.",
  keywords = "Fault-tolerance, Generic proxy, Job-flow management, Job-flows, Meta-scheduler",
  author = "Gargi Dasgupta and Onyeka Ezenwoye and Liana Fong and Selim Kalayci and Sadjadi, {S. Masoud} and Balaji Viswanathan",
  year = "2008",
  doi = "10.1109/ICAC.2008.16",
  language = "English (US)",
  isbn = "9780769531755",
  series = "5th International Conference on Autonomic Computing, ICAC 2008",
  pages = "201--202",
  booktitle = "5th International Conference on Autonomic Computing, ICAC 2008",
  note = "5th International Conference on Autonomic Computing, ICAC 2008 ; Conference date: 02-06-2008 Through 06-06-2008",
}

TY  - GEN
T1  - Runtime fault-handling for job-flow management in Grid environments
AU  - Dasgupta, Gargi
AU  - Ezenwoye, Onyeka
AU  - Fong, Liana
AU  - Kalayci, Selim
AU  - Sadjadi, S. Masoud
AU  - Viswanathan, Balaji
PY  - 2008
Y1  - 2008
N2  - The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
AB  - The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
KW  - Fault-tolerance
KW  - Generic proxy
KW  - Job-flow management
KW  - Job-flows
KW  - Meta-scheduler
UR  - http://www.scopus.com/inward/record.url?scp=51649115579&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=51649115579&partnerID=8YFLogxK
U2  - 10.1109/ICAC.2008.16
DO  - 10.1109/ICAC.2008.16
M3  - Conference contribution
AN  - SCOPUS:51649115579
SN  - 9780769531755
T3  - 5th International Conference on Autonomic Computing, ICAC 2008
SP  - 201
EP  - 202
BT  - 5th International Conference on Autonomic Computing, ICAC 2008
T2  - 5th International Conference on Autonomic Computing, ICAC 2008
Y2  - 2 June 2008 through 6 June 2008
ER  -
