Impact of data distribution, level of parallelism, and communication frequency on parallel data cube construction

Ge Yang, Ruoming Jin, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. We have developed a set of parallel algorithms for data cube construction using a new data structure called aggregation tree. Our experience has shown that a number of performance trade-offs arise in developing a parallel data cube implementation. We focus on three important issues, which are: (1) data distribution, i.e., how the original array is distributed among the processors; (2) level of parallelism, i.e., what parts of the computation are parallelized and sequentialized; and (3) frequency of communication, i.e., does the implementation require frequent interprocessor communication (and less memory) or less frequent communication (and more memory). We present a detailed experimental study evaluating the above trade-offs. We consider parallel data cube construction with different cube sizes and sparsity levels. Our experimental results show the following: (1) In all cases, reducing the frequency of communication and using higher memory gave better performance, though the difference was relatively small. (2) Choosing data distribution to minimize communication volume made a substantial difference in the performance in most of the cases. (3) Finally, using parallelism at all levels gave better performance, even though it increases the total communication volume.

Original languageEnglish (US)
Title of host publicationProceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)0769519261, 9780769519265
DOIs
StatePublished - 2003
Externally publishedYes
EventInternational Parallel and Distributed Processing Symposium, IPDPS 2003 - Nice, France
Duration: Apr 22 2003Apr 26 2003

Publication series

NameProceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003

Conference

ConferenceInternational Parallel and Distributed Processing Symposium, IPDPS 2003
Country/TerritoryFrance
CityNice
Period4/22/034/26/03

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science
  • Software

Fingerprint

Dive into the research topics of 'Impact of data distribution, level of parallelism, and communication frequency on parallel data cube construction'. Together they form a unique fingerprint.

Cite this