Efficient parallel algorithms on restartable fail-stop processors

Paris C. Kanellakis, Alex A. Shvartsman

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We study efficient deterministic executions of parallel algorithms on restartable fail-stop CRCW PRAMs. We allow the PRAM processors to be subject to arbitrary stop failures and restarts, that are determined by an on-line adversary, and that result in loss of private memory but do not affect shared memory. For this model, we define and justify the complexity measures of: completed work, where processors are charged for completed fixed-size update cycles, and overhead ratio, which amortizes the work over necessary work and failures. This framework is a nontrivial extension of the fail-stop no-restart model of [KS 89]. We present a simulation strategy for any N processor PRAM on a restartable fail-stop P processor CRCW PRAM such that: it guarantees a terminating execution of each simulated N processor step, with 0(log2 N) overhead ratio, and 0(min{A'' + Plog2 N + Mlog JV, N P0'8}) (sub-quadratic) completed work, where M is the number of failures during this step's simulation. This strategy is work-optimal when the number of simulating processors is P < N/log2 N and the total number of failures per each simulated N processor step is 0(N/logN). These results are based on a new algorithm for the Write-All problem "P processors write l's in an array of size N", together with a modification of the main algorithm of [KS 89] and with the techniques in [KPS 90, Shv 89]. We observe that, on P = N restartable fail-stop processors, the Write-All problem requires Q(N/ogN) completed work, and this lower bound holds even under the additional assumption that processors can read and locally process the entire shared memory at unit cost. Under this unrealistic assumption we have a matching upper bound. The lower bound also applies to the expected completed work of randomized algorithms that are subject to on-line adversaries. Finally, we desribe a simple on-line adversary that causes inefficiency in many randomized algorithms.

Original languageEnglish (US)
Title of host publicationProceedings of the Annual ACM Symposium on Principles of Distributed Computing
PublisherAssociation for Computing Machinery
Pages23-36
Number of pages14
ISBN (Print)0897914392
DOIs
StatePublished - Jul 1 1991
Externally publishedYes
Event10th Annual ACM Symposium on Principles of Distributed Computing, PODC 1991 - Montreal, Canada
Duration: Aug 19 1991Aug 21 1991

Publication series

NameProceedings of the Annual ACM Symposium on Principles of Distributed Computing

Conference

Conference10th Annual ACM Symposium on Principles of Distributed Computing, PODC 1991
Country/TerritoryCanada
CityMontreal
Period8/19/918/21/91

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Efficient parallel algorithms on restartable fail-stop processors'. Together they form a unique fingerprint.

Cite this