Runtime fault-handling for job-flow management in Grid environments

Gargi Dasgupta, Onyeka Ezenwoye, Liana Fong, Selim Kalayci, S. Masoud Sadjadi, Balaji Viswanathan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
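The interception idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `GenericProxy`, `JobFailed`, `broken_scheduler`, and `healthy_scheduler` are hypothetical, and the recovery policy (resubmitting a failed job to the next scheduler in a list) is an assumption chosen only to show how a proxy placed between the two layers could add failure recovery without changing the job-flow engine or the job flow.

```python
from typing import Callable, List, Optional, Sequence


class JobFailed(Exception):
    """Raised by a (hypothetical) lower-level scheduler when a job fails."""


class GenericProxy:
    """Sits between the job-flow engine and the lower-level schedulers.

    The engine calls submit() the same way it would call a scheduler,
    so neither the engine nor the job flow has to be modified.
    """

    def __init__(self, schedulers: Sequence[Callable[[str], str]]) -> None:
        if not schedulers:
            raise ValueError("at least one scheduler is required")
        self.schedulers = list(schedulers)
        self.log: List[str] = []  # record of intercepted interactions

    def submit(self, job: str) -> str:
        last_error: Optional[JobFailed] = None
        # Assumed recovery policy: try each scheduler in order until one succeeds.
        for index, scheduler in enumerate(self.schedulers):
            try:
                result = scheduler(job)
            except JobFailed as error:
                self.log.append(f"{job}: failed on scheduler {index}")
                last_error = error
            else:
                self.log.append(f"{job}: ok on scheduler {index}")
                return result
        raise JobFailed(f"{job}: all schedulers failed") from last_error


def broken_scheduler(job: str) -> str:
    # Stand-in for a Grid resource that is currently unavailable.
    raise JobFailed(f"{job}: resource unavailable")


def healthy_scheduler(job: str) -> str:
    # Stand-in for a Grid resource that completes the job.
    return f"{job} done"


proxy = GenericProxy([broken_scheduler, healthy_scheduler])
outcome = proxy.submit("render-frame-1")
```

In this sketch the caller sees only the successful outcome, while `proxy.log` keeps a trace of the failed attempt and the recovery.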

Original language: English (US)
Title of host publication: 5th International Conference on Autonomic Computing, ICAC 2008
Pages: 201-202
Number of pages: 2
DOIs: https://doi.org/10.1109/ICAC.2008.16
State: Published - Sep 18 2008
Externally published: Yes
Event: 5th International Conference on Autonomic Computing, ICAC 2008 - Chicago, IL, United States
Duration: Jun 2 2008 → Jun 6 2008

Fingerprint

Engines
Recovery

Keywords

  • Fault-tolerance
  • Generic proxy
  • Job-flow management
  • Job-flows
  • Meta-scheduler

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software
  • Control and Systems Engineering

Cite this

Dasgupta, G., Ezenwoye, O., Fong, L., Kalayci, S., Sadjadi, S. M., & Viswanathan, B. (2008). Runtime fault-handling for job-flow management in Grid environments. In 5th International Conference on Autonomic Computing, ICAC 2008 (pp. 201-202). [4550843] https://doi.org/10.1109/ICAC.2008.16

Runtime fault-handling for job-flow management in Grid environments. / Dasgupta, Gargi; Ezenwoye, Onyeka; Fong, Liana; Kalayci, Selim; Sadjadi, S. Masoud; Viswanathan, Balaji.

5th International Conference on Autonomic Computing, ICAC 2008. 2008. p. 201-202 4550843.

Dasgupta, G, Ezenwoye, O, Fong, L, Kalayci, S, Sadjadi, SM & Viswanathan, B 2008, Runtime fault-handling for job-flow management in Grid environments. in 5th International Conference on Autonomic Computing, ICAC 2008., 4550843, pp. 201-202, 5th International Conference on Autonomic Computing, ICAC 2008, Chicago, IL, United States, 6/2/08. https://doi.org/10.1109/ICAC.2008.16
Dasgupta G, Ezenwoye O, Fong L, Kalayci S, Sadjadi SM, Viswanathan B. Runtime fault-handling for job-flow management in Grid environments. In 5th International Conference on Autonomic Computing, ICAC 2008. 2008. p. 201-202. 4550843 https://doi.org/10.1109/ICAC.2008.16
Dasgupta, Gargi ; Ezenwoye, Onyeka ; Fong, Liana ; Kalayci, Selim ; Sadjadi, S. Masoud ; Viswanathan, Balaji. / Runtime fault-handling for job-flow management in Grid environments. 5th International Conference on Autonomic Computing, ICAC 2008. 2008. pp. 201-202
@inproceedings{cd2d636cea02488bb537fe9a9d8ffdc7,
title = "Runtime fault-handling for job-flow management in Grid environments",
abstract = "The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.",
keywords = "Fault-tolerance, Generic proxy, Job-flow management, Job-flows, Meta-scheduler",
author = "Gargi Dasgupta and Onyeka Ezenwoye and Liana Fong and Selim Kalayci and Sadjadi, {S. Masoud} and Balaji Viswanathan",
year = "2008",
month = "9",
day = "18",
doi = "10.1109/ICAC.2008.16",
language = "English (US)",
isbn = "9780769531755",
pages = "201--202",
booktitle = "5th International Conference on Autonomic Computing, ICAC 2008",

}

TY  - GEN
T1  - Runtime fault-handling for job-flow management in Grid environments
AU  - Dasgupta, Gargi
AU  - Ezenwoye, Onyeka
AU  - Fong, Liana
AU  - Kalayci, Selim
AU  - Sadjadi, S. Masoud
AU  - Viswanathan, Balaji
PY  - 2008/9/18
Y1  - 2008/9/18
N2  - The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
AB  - The execution of job flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy to an existing two-level job-flow management system, which employs job flow based service orchestration at the upper level, and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
KW  - Fault-tolerance
KW  - Generic proxy
KW  - Job-flow management
KW  - Job-flows
KW  - Meta-scheduler
UR  - http://www.scopus.com/inward/record.url?scp=51649115579&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=51649115579&partnerID=8YFLogxK
U2  - 10.1109/ICAC.2008.16
DO  - 10.1109/ICAC.2008.16
M3  - Conference contribution
AN  - SCOPUS:51649115579
SN  - 9780769531755
SP  - 201
EP  - 202
BT  - 5th International Conference on Autonomic Computing, ICAC 2008
ER  -