TY - GEN
T1 - Robust network supercomputing without centralized control
AU - Davtyan, Seda
AU - Konwar, Kishori M.
AU - Shvartsman, Alexander A.
PY - 2011
Y1 - 2011
N2 - Traditional approaches to network supercomputing employ a master process and a large number of potentially undependable worker processes that must perform a collection of tasks on behalf of the master. In such a centralized scheme, the master process is a performance bottleneck and a single point of failure. This work develops an original approach that eliminates the master and instead uses a decentralized algorithm, where each worker is able to determine locally that all tasks have been performed, and to collect locally the results of all tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. A randomized synchronous algorithm for n processes and n tasks is presented. The algorithm terminates in Θ(log n) rounds, and it is proved that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(n^2 log^3 n).
AB - Traditional approaches to network supercomputing employ a master process and a large number of potentially undependable worker processes that must perform a collection of tasks on behalf of the master. In such a centralized scheme, the master process is a performance bottleneck and a single point of failure. This work develops an original approach that eliminates the master and instead uses a decentralized algorithm, where each worker is able to determine locally that all tasks have been performed, and to collect locally the results of all tasks. The failure model assumes that the average probability of a worker returning a wrong result is less than 1/2. A randomized synchronous algorithm for n processes and n tasks is presented. The algorithm terminates in Θ(log n) rounds, and it is proved that upon termination the workers know the results of all tasks with high probability, and that these results are correct with high probability. The message complexity of the algorithm is Θ(n log n), and the bit complexity is O(n^2 log^3 n).
KW - distributed algorithms
KW - fault-tolerance
KW - internet supercomputing
KW - randomized algorithms
UR - http://www.scopus.com/inward/record.url?scp=79959897254&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959897254&partnerID=8YFLogxK
U2 - 10.1145/1993806.1993860
DO - 10.1145/1993806.1993860
M3 - Conference contribution
AN - SCOPUS:79959897254
SN - 9781450307192
T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
SP - 293
EP - 294
BT - PODC'11 - Proceedings of the 2011 ACM Symposium on Principles of Distributed Computing
T2 - 30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC'11, Held as Part of the 5th Federated Computing Research Conference, FCRC
Y2 - 6 June 2011 through 8 June 2011
ER -