TY - GEN
T1 - Implementing data cube construction using a cluster middleware
T2 - 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2002
AU - Yang, Ge
AU - Jin, Ruoming
AU - Agrawal, Gagan
PY - 2002
Y1 - 2002
N2 - With increases in the amount of data available for analysis in commercial settings, On Line Analytical Processing (OLAP) and decision support have become important applications for high performance computing. Implementing such applications on clusters requires a lot of expertise and effort, particularly because of the sizes of input and output datasets. In this paper, we describe our experiences in developing one such application using a cluster middleware, called ADR. We focus on the problem of data cube construction, which commonly arises in multi-dimensional OLAP. We show how ADR, originally developed for scientific data intensive applications, can be used for carrying out an efficient and scalable data cube construction implementation. A particular issue with the use of ADR is tiling of output datasets. We present new algorithms that combine inter-processor communication and tiling within each processor. These algorithms preserve the important properties that are desirable from any parallel data cube construction algorithm. We have carried out a detailed evaluation of our implementation. The main results from our experiments are as follows: 1) High speedups are achieved on both dense and sparse datasets, even though we have used simple algorithms that sequentialize a part of the computation, 2) The execution time depends only upon the amount of computation, and does not increase in a super-linear fashion as the dataset size or the number of tiles increases, and 3) As the datasets become more sparse, sequential performance degrades, but the parallel speedups are still quite good.
AB - With increases in the amount of data available for analysis in commercial settings, On Line Analytical Processing (OLAP) and decision support have become important applications for high performance computing. Implementing such applications on clusters requires a lot of expertise and effort, particularly because of the sizes of input and output datasets. In this paper, we describe our experiences in developing one such application using a cluster middleware, called ADR. We focus on the problem of data cube construction, which commonly arises in multi-dimensional OLAP. We show how ADR, originally developed for scientific data intensive applications, can be used for carrying out an efficient and scalable data cube construction implementation. A particular issue with the use of ADR is tiling of output datasets. We present new algorithms that combine inter-processor communication and tiling within each processor. These algorithms preserve the important properties that are desirable from any parallel data cube construction algorithm. We have carried out a detailed evaluation of our implementation. The main results from our experiments are as follows: 1) High speedups are achieved on both dense and sparse datasets, even though we have used simple algorithms that sequentialize a part of the computation, 2) The execution time depends only upon the amount of computation, and does not increase in a super-linear fashion as the dataset size or the number of tiles increases, and 3) As the datasets become more sparse, sequential performance degrades, but the parallel speedups are still quite good.
UR - http://www.scopus.com/inward/record.url?scp=84887987581&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887987581&partnerID=8YFLogxK
U2 - 10.1109/CCGRID.2002.1017115
DO - 10.1109/CCGRID.2002.1017115
M3 - Conference contribution
AN - SCOPUS:84887987581
SN - 0769515827
SN - 9780769515823
T3 - 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2002
BT - 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2002
Y2 - 21 May 2002 through 24 May 2002
ER -