Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware

Venkatram Ramanathan; Wenjing Ma; Vignesh T. Ravi; Tantan Liu; Gagan Agrawal

doi:10.1109/ICDMW.2010.100

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Datamining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.
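The abstract states that the row-clustering loop of co-clustering fits a generalized reduction structure. The following is a minimal sketch, in plain Python, of how that could look under some assumptions: the input is a joint distribution matrix `P` whose entries sum to 1, the row update is the usual KL-divergence reassignment against cluster prototypes, and the reduction object is the k × l matrix of co-cluster masses q(x̂, ŷ). The names `row_step_chunk`, `combine` and `parallel_row_step` are hypothetical and do not reproduce FREERIDE's actual interface. Each chunk of rows updates a private reduction object (the local reduction), and the partial objects are merged by element-wise addition (the global reduction). The chunks run serially here; in a parallel setting each would run on its own core or node.

```python
import math
from functools import reduce


def row_step_chunk(P, rows, row_of, col_of, q_hat, p_y, p_yhat):
    """Local reduction over one chunk of rows (hypothetical helper).

    Reassigns each row x in `rows` to the row cluster whose prototype
    q(Y | xh) is closest in KL divergence to p(Y | x), then accumulates
    p(x, yh) into a private k x l reduction object R.
    """
    k, l = len(q_hat), len(q_hat[0])
    q_xh = [sum(r) for r in q_hat]          # row-cluster marginals q(xh)
    R = [[0.0] * l for _ in range(k)]       # this chunk's reduction object
    assign = {}
    for x in rows:
        p_x = sum(P[x])
        if p_x == 0.0:
            continue                        # row carries no probability mass
        best, best_d = row_of[x], math.inf  # keep old cluster if all are infinite
        for xh in range(k):
            if q_xh[xh] == 0.0:
                continue                    # empty cluster has no prototype
            d = 0.0
            for y, pxy in enumerate(P[x]):
                if pxy == 0.0:
                    continue
                yh = col_of[y]
                # prototype: q(y | xh) = q(yh | xh) * p(y) / p(yh)
                q = (q_hat[xh][yh] / q_xh[xh]) * (p_y[y] / p_yhat[yh])
                if q == 0.0:
                    d = math.inf
                    break
                d += (pxy / p_x) * math.log((pxy / p_x) / q)
            if d < best_d:
                best, best_d = xh, d
        assign[x] = best
        for y, pxy in enumerate(P[x]):      # local update of R[xh][yh]
            R[best][col_of[y]] += pxy
    return assign, R


def combine(R1, R2):
    """Global reduction: merge two reduction objects by element-wise addition."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(R1, R2)]


def parallel_row_step(P, row_of, col_of, k, l, n_chunks):
    """One row-clustering pass expressed as local + global reductions."""
    m, n = len(P), len(P[0])
    p_y = [sum(P[x][y] for x in range(m)) for y in range(n)]
    p_yhat = [0.0] * l
    for y in range(n):
        p_yhat[col_of[y]] += p_y[y]
    q_hat = [[0.0] * l for _ in range(k)]   # current q(xh, yh)
    for x in range(m):
        for y in range(n):
            q_hat[row_of[x]][col_of[y]] += P[x][y]
    # Each chunk would run on its own core or node; here they run serially.
    chunks = [range(i, m, n_chunks) for i in range(n_chunks)]
    results = [row_step_chunk(P, c, row_of, col_of, q_hat, p_y, p_yhat)
               for c in chunks]
    new_row_of = list(row_of)
    for assign, _ in results:
        for x, xh in assign.items():
            new_row_of[x] = xh
    q_new = reduce(combine, [R for _, R in results])
    return new_row_of, q_new


# Usage: a 4 x 4 block-structured joint distribution, poorly initialized.
P = [[0.125, 0.125, 0.0, 0.0],
     [0.125, 0.125, 0.0, 0.0],
     [0.0, 0.0, 0.125, 0.125],
     [0.0, 0.0, 0.125, 0.125]]
rows, q_new = parallel_row_step(P, [0, 0, 0, 1], [0, 0, 1, 1], k=2, l=2, n_chunks=2)
print(rows)   # [0, 0, 1, 1]
print(q_new)  # [[0.5, 0.0], [0.0, 0.5]]
```

Because each row's new assignment depends only on the global q(x̂, ŷ) from the previous pass, rows can be processed independently, and the only shared state is the small k × l reduction object, which is cheap to replicate and merge. The column step could reuse the same code on the transposed matrix, with the roles of row and column clusters swapped.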

Original language	English (US)
Title of host publication	Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Pages	186-193
Number of pages	8
DOIs	https://doi.org/10.1109/ICDMW.2010.100
State	Published - 2010
Externally published	Yes
Event	10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 - Sydney, NSW, Australia; Duration: Dec 14 2010 → Dec 17 2010

Publication series

Name	Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)	1550-4786

Conference

Conference	10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Country/Territory	Australia
City	Sydney, NSW
Period	12/14/10 → 12/17/10

ASJC Scopus subject areas

General Engineering

Cite this

Ramanathan, V., Ma, W., Ravi, V. T., Liu, T., & Agrawal, G. (2010). Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware. In Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 (pp. 186-193). Article 5693299 (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDMW.2010.100

Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware. / Ramanathan, Venkatram; Ma, Wenjing; Ravi, Vignesh T. et al.
Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010. 2010. p. 186-193 5693299 (Proceedings - IEEE International Conference on Data Mining, ICDM).

Ramanathan, V, Ma, W, Ravi, VT, Liu, T & Agrawal, G 2010, Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware. in Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010., 5693299, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 186-193, 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010, Sydney, NSW, Australia, 12/14/10. https://doi.org/10.1109/ICDMW.2010.100

@inproceedings{a5e2ab82e40e4637b0eafc1849b629e7,

title = "Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware",

abstract = "The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Datamining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.",

author = "Venkatram Ramanathan and Wenjing Ma and Ravi, {Vignesh T.} and Tantan Liu and Gagan Agrawal",

year = "2010",

doi = "10.1109/ICDMW.2010.100",

language = "English (US)",

isbn = "9780769542577",

series = "Proceedings - IEEE International Conference on Data Mining, ICDM",

pages = "186--193",

booktitle = "Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010",

note = "10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 ; Conference date: 14-12-2010 Through 17-12-2010",

}

TY - GEN

T1 - Parallelizing an information theoretic Co-clustering algorithm using a cloud middleware

AU - Ramanathan, Venkatram

AU - Ma, Wenjing

AU - Ravi, Vignesh T.

AU - Liu, Tantan

AU - Agrawal, Gagan

PY - 2010

Y1 - 2010

N2 - The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Datamining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.

AB - The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Datamining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.

UR - http://www.scopus.com/inward/record.url?scp=79951804787&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79951804787&partnerID=8YFLogxK

U2 - 10.1109/ICDMW.2010.100

DO - 10.1109/ICDMW.2010.100

M3 - Conference contribution

AN - SCOPUS:79951804787

SN - 9780769542577

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 186

EP - 193

BT - Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010

T2 - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010

Y2 - 14 December 2010 through 17 December 2010

ER -
