Robust network supercomputing without centralized control

Seda Davtyan; Kishori M. Konwar; Alexander A. Shvartsman

doi:10.1145/1993806.1993860

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditional approaches to network supercomputing employ a master process and a large number of potentially undependable worker processes that must perform a collection of tasks on behalf of the master. In such a centralized scheme, the master process is a performance bottleneck and a single point of failure. This work develops an original approach that eliminates the master and instead uses a decentralized algorithm, where each worker is able to determine locally that all tasks have been performed, and to collect locally the results of all tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. A randomized synchronous algorithm for n processes and n tasks is presented. The algorithm terminates in Θ(log n) rounds, and it is proved that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(n² log³ n).
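As a rough illustration of why an average error probability below 1/2 together with a logarithmic number of samples can give high-probability correctness, the sketch below applies majority voting to k independent results and evaluates the Hoeffding bound exp(-2k(1/2 - p)²) on the chance that at least half of them are wrong. This is not the algorithm from the paper: the independence of worker errors, the use of majority voting, the constant `c`, and all function names (`majority`, `hoeffding_bound`, `simulate`) are assumptions made only for this example.

```python
import math
import random
from collections import Counter


def majority(values):
    """Return the most common value among the collected results."""
    return Counter(values).most_common(1)[0][0]


def hoeffding_bound(k, p_avg):
    """Hoeffding upper bound on the probability that at least half of k
    independent results are wrong, given average error probability p_avg."""
    if not 0 <= p_avg < 0.5:
        raise ValueError("p_avg must lie in [0, 1/2)")
    return math.exp(-2 * k * (0.5 - p_avg) ** 2)


def simulate(n, p_avg, c=8, seed=0):
    """Hypothetical experiment (assumption, not the paper's protocol):
    each of n tasks gets k = ceil(c * ln n) independent results, each wrong
    with probability p_avg. Returns the fraction of tasks whose majority
    vote is the correct result."""
    rng = random.Random(seed)
    k = max(1, math.ceil(c * math.log(n)))
    correct = 0
    for _ in range(n):
        # True marks a correct result, False a wrong one.
        votes = [rng.random() >= p_avg for _ in range(k)]
        if majority(votes):
            correct += 1
    return correct / n


if __name__ == "__main__":
    n, p = 1000, 0.3
    k = math.ceil(8 * math.log(n))
    print("samples per task:", k)
    print("per-task failure bound:", hoeffding_bound(k, p))
    print("fraction of tasks decided correctly:", simulate(n, p))
```

With k proportional to log n, the bound decays polynomially in n, which is the usual sense of "with high probability"; the constant needed depends on how far the average error probability is from 1/2.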

Original language	English (US)
Title of host publication	PODC'11 - Proceedings of the 2011 ACM Symposium Principles of Distributed Computing
Pages	293-294
Number of pages	2
DOIs	https://doi.org/10.1145/1993806.1993860
State	Published - 2011
Externally published	Yes
Event	30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC'11, Held as Part of the 5th Federated Computing Research Conference, FCRC - San Jose, CA, United States
Duration	Jun 6 2011 → Jun 8 2011

Publication series

Name	Proceedings of the Annual ACM Symposium on Principles of Distributed Computing

Conference

Conference	30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC'11, Held as Part of the 5th Federated Computing Research Conference, FCRC
Country/Territory	United States
City	San Jose, CA
Period	6/6/11 → 6/8/11

Keywords

distributed algorithms
fault-tolerance
internet supercomputing
randomized algorithms

ASJC Scopus subject areas

Software
Hardware and Architecture
Computer Networks and Communications

Access to Document

10.1145/1993806.1993860

Cite this

Robust network supercomputing without centralized control. / Davtyan, Seda; Konwar, Kishori M.; Shvartsman, Alexander A.
PODC'11 - Proceedings of the 2011 ACM Symposium Principles of Distributed Computing. 2011. p. 293-294 (Proceedings of the Annual ACM Symposium on Principles of Distributed Computing).

Davtyan, S, Konwar, KM & Shvartsman, AA 2011, Robust network supercomputing without centralized control. in PODC'11 - Proceedings of the 2011 ACM Symposium Principles of Distributed Computing. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, pp. 293-294, 30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC'11, Held as Part of the 5th Federated Computing Research Conference, FCRC, San Jose, CA, United States, 6/6/11. https://doi.org/10.1145/1993806.1993860

@inproceedings{50737b04943744d799df84ae5c0aa2c7,
title = "Robust network supercomputing without centralized control",
abstract = "Traditional approaches to network supercomputing employ a master process and a large number of potentially undependable worker processes that must perform a collection of tasks on behalf of the master. In such a centralized scheme, the master process is a performance bottleneck and a single point of failure. This work develops an original approach that eliminates the master and instead uses a decentralized algorithm, where each worker is able to determine locally that all tasks have been performed, and to collect locally the results of all tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. A randomized synchronous algorithm for n processes and n tasks is presented. The algorithm terminates in Θ(log n) rounds, and it is proved that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(n² log³ n).",
keywords = "distributed algorithms, fault-tolerance, internet supercomputing, randomized algorithms",
author = "Seda Davtyan and Konwar, {Kishori M.} and Shvartsman, {Alexander A.}",
year = "2011",
doi = "10.1145/1993806.1993860",
language = "English (US)",
isbn = "9781450307192",
series = "Proceedings of the Annual ACM Symposium on Principles of Distributed Computing",
pages = "293--294",
booktitle = "PODC'11 - Proceedings of the 2011 ACM Symposium Principles of Distributed Computing",
note = "30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC'11, Held as Part of the 5th Federated Computing Research Conference, FCRC ; Conference date: 06-06-2011 Through 08-06-2011",
}

TY - GEN
T1 - Robust network supercomputing without centralized control
AU - Davtyan, Seda
AU - Konwar, Kishori M.
AU - Shvartsman, Alexander A.
PY - 2011
Y1 - 2011
N2 - Traditional approaches to network supercomputing employ a master process and a large number of potentially undependable worker processes that must perform a collection of tasks on behalf of the master. In such a centralized scheme, the master process is a performance bottleneck and a single point of failure. This work develops an original approach that eliminates the master and instead uses a decentralized algorithm, where each worker is able to determine locally that all tasks have been performed, and to collect locally the results of all tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. A randomized synchronous algorithm for n processes and n tasks is presented. The algorithm terminates in Θ(log n) rounds, and it is proved that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(n² log³ n).
AB - Traditional approaches to network supercomputing employ a master process and a large number of potentially undependable worker processes that must perform a collection of tasks on behalf of the master. In such a centralized scheme, the master process is a performance bottleneck and a single point of failure. This work develops an original approach that eliminates the master and instead uses a decentralized algorithm, where each worker is able to determine locally that all tasks have been performed, and to collect locally the results of all tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. A randomized synchronous algorithm for n processes and n tasks is presented. The algorithm terminates in Θ(log n) rounds, and it is proved that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(n² log³ n).
KW - distributed algorithms
KW - fault-tolerance
KW - internet supercomputing
KW - randomized algorithms
UR - http://www.scopus.com/inward/record.url?scp=79959897254&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959897254&partnerID=8YFLogxK
U2 - 10.1145/1993806.1993860
DO - 10.1145/1993806.1993860
M3 - Conference contribution
AN - SCOPUS:79959897254
SN - 9781450307192
T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
SP - 293
EP - 294
BT - PODC'11 - Proceedings of the 2011 ACM Symposium Principles of Distributed Computing
T2 - 30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC'11, Held as Part of the 5th Federated Computing Research Conference, FCRC
Y2 - 6 June 2011 through 8 June 2011
ER -
