Performing tasks on synchronous restartable message-passing processors

Bogdan S. Chlebus; Roberto De Prisco; Alex A. Shvartsman

doi:10.1007/PL00008926

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

This work considers the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called DO-ALL herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for the model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stop-failures and does not allow restarts. Its available processor steps (work) complexity is S = O((t + p log p/log log p) · log f) and its message complexity is M = O(t + p log p/log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the DO-ALL problem that efficiently deals with processor restarts. Its available processor steps (work) complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + fp), where f is the total number of failures.
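To make the shape of the stated bounds concrete, the following Python sketch evaluates them numerically. It is not the authors' algorithm and does not simulate any processors; it only plugs numbers into the formulas above. It assumes every constant hidden in O(·) equals 1 and uses base-2 logarithms clamped below at 1 (so that log log p and log f stay positive for small inputs). The helper names `lg`, `stop_failure_bounds` and `restart_bounds` are hypothetical.

```python
import math


def lg(x: float) -> float:
    """Base-2 logarithm clamped below at 1 (assumption: avoids zero or
    undefined log terms for small inputs, including x = 0)."""
    return max(1.0, math.log2(x)) if x > 0 else 1.0


def stop_failure_bounds(p: int, t: int, f: int) -> tuple[float, float]:
    """Evaluate the first algorithm's bounds (stop-failures only, f < p):
    S = (t + p log p / log log p) * log f,  M = t + p log p / log log p + f p,
    with all hidden constants assumed to be 1."""
    if not 0 <= f < p:
        raise ValueError("the first algorithm assumes 0 <= f < p")
    base = t + p * lg(p) / lg(lg(p))
    work = base * lg(f)
    messages = base + f * p
    return work, messages


def restart_bounds(p: int, t: int, f: int) -> tuple[float, float]:
    """Evaluate the second algorithm's bounds (stop-failures and restarts,
    f = total number of failures):
    S = (t + p log p + f) * min(log p, log f),  M = t + p log p + f p,
    with all hidden constants assumed to be 1."""
    work = (t + p * lg(p) + f) * min(lg(p), lg(f))
    messages = t + p * lg(p) + f * p
    return work, messages


if __name__ == "__main__":
    print(stop_failure_bounds(16, 16, 4))
    print(restart_bounds(16, 16, 4))
```

For example, with p = t = 16 and f = 4, the logarithmic factor is 2 in both bounds (log f = 2, and min{log 16, log 4} = 2). This gives S = 96 and M = 112 for the first algorithm and S = 168 and M = 144 for the second. Because of the min{log p, log f} factor, the second work bound's logarithmic multiplier stops growing once f ≥ p, while its additive f and fp terms keep growing.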

Original language	English (US)
Article number	1
Pages (from-to)	49-64
Number of pages	16
Journal	Distributed Computing
Volume	14
Issue number	1
DOIs	https://doi.org/10.1007/PL00008926
State	Published - 2001
Externally published	Yes

Keywords

Distributed systems
Fault-tolerance
Load balancing
Processor restarts
Work

ASJC Scopus subject areas

Theoretical Computer Science
Hardware and Architecture
Computer Networks and Communications
Computational Theory and Mathematics

Access to Document

10.1007/PL00008926

https://dblp.org/db/journals/dc/dc14.html#ChlebusPS01

Cite this

@article{0cdac47683cb4998a6eeed54e2ec5389,
  title = "Performing tasks on synchronous restartable message-passing processors",
  abstract = "This work considers the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called DO-ALL herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for the model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stop-failures and does not allow restarts. Its available processor steps (work) complexity is S = O((t + p log p/log log p) · log f) and its message complexity is M = O(t + p log p/log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the DO-ALL problem that efficiently deals with processor restarts. Its available processor steps (work) complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + fp), where f is the total number of failures.",
  keywords = "Distributed systems, Fault-tolerance, Load balancing, Processor restarts, Work",
  author = "Chlebus, {Bogdan S.} and {De Prisco}, Roberto and Shvartsman, {Alex A.}",
  note = "DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.",
  year = "2001",
  doi = "10.1007/PL00008926",
  language = "English (US)",
  volume = "14",
  pages = "49--64",
  journal = "Distributed Computing",
  issn = "0178-2770",
  publisher = "Springer Verlag",
  number = "1",
}

TY  - JOUR
T1  - Performing tasks on synchronous restartable message-passing processors
AU  - Chlebus, Bogdan S.
AU  - De Prisco, Roberto
AU  - Shvartsman, Alex A.
N1  - DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY  - 2001
Y1  - 2001
N2  - This work considers the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called DO-ALL herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for the model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stop-failures and does not allow restarts. Its available processor steps (work) complexity is S = O((t + p log p/log log p) · log f) and its message complexity is M = O(t + p log p/log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the DO-ALL problem that efficiently deals with processor restarts. Its available processor steps (work) complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + fp), where f is the total number of failures.
AB  - This work considers the problem of performing t tasks in a distributed system of p fault-prone processors. This problem, called DO-ALL herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for the model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm by which multiple coordinators may be active as the result of failures. The first algorithm is tolerant of f < p stop-failures and does not allow restarts. Its available processor steps (work) complexity is S = O((t + p log p/log log p) · log f) and its message complexity is M = O(t + p log p/log log p + fp). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for p = t and large f, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the DO-ALL problem that efficiently deals with processor restarts. Its available processor steps (work) complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + fp), where f is the total number of failures.
KW  - Distributed systems
KW  - Fault-tolerance
KW  - Load balancing
KW  - Processor restarts
KW  - Work
UR  - http://www.scopus.com/inward/record.url?scp=0034893155&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0034893155&partnerID=8YFLogxK
U2  - 10.1007/PL00008926
DO  - 10.1007/PL00008926
M3  - Article
AN  - SCOPUS:0034893155
SN  - 0178-2770
VL  - 14
SP  - 49
EP  - 64
JO  - Distributed Computing
JF  - Distributed Computing
IS  - 1
M1  - 1
ER  - 
