TY - GEN
T1 - Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware
AU - Ramanathan, Venkatram
AU - Ma, Wenjing
AU - Ravi, Vignesh T.
AU - Liu, Tantan
AU - Agrawal, Gagan
PY - 2010
Y1 - 2010
AB - The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Datamining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.
UR - http://www.scopus.com/inward/record.url?scp=79951804787&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951804787&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2010.100
DO - 10.1109/ICDMW.2010.100
M3 - Conference contribution
AN - SCOPUS:79951804787
SN - 9780769542577
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 186
EP - 193
BT - Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
T2 - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Y2 - 14 December 2010 through 17 December 2010
ER -