Middleware for data mining applications on clusters and grids

Leonid Glimcher, Ruoming Jin, Gagan Agrawal

Research output: Contribution to journal, Article, peer-review

16 Scopus citations

Abstract

This paper gives an overview of two middleware systems that have been developed over the past six years to address the challenges involved in developing parallel and distributed implementations of data mining algorithms. FREERIDE (FRamework for Rapid Implementation of Data mining Engines) focuses on data mining in a cluster environment. FREERIDE is based on the observation that parallel versions of several well-known data mining techniques share a relatively similar structure, and can be parallelized by dividing the data instances (or records or transactions) among the nodes. The computation on each node involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. The reduction involves only commutative and associative operations, which means the result is independent of the order in which the data instances are processed. After the local reduction on each node, a global reduction is performed. The middleware system can exploit this structural similarity to execute data mining tasks efficiently in parallel, starting from a relatively high-level specification of the technique. To enable processing of data sets stored in remote data repositories, we have extended the FREERIDE middleware into FREERIDE-G (FRamework for Rapid Implementation of Data mining Engines in Grid). FREERIDE-G supports a high-level interface for developing data mining and scientific data processing applications that involve data stored in remote repositories. The added functionality in FREERIDE-G aims to hide the details of remote data retrieval, data movement, and caching from application developers.
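The FREERIDE processing structure described above (partition the data instances across nodes, run a local reduction on each node in arbitrary order, then combine the results with a global reduction) can be illustrated with a minimal, single-process Python sketch. It assumes k-means clustering as the example technique, chosen here purely for illustration. All names (`KMeansReduction`, `local_reduction`, `global_reduction`, `kmeans_step`) are hypothetical and do not reflect FREERIDE's actual interface, and "nodes" are simulated by slicing a list rather than running on a cluster.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


class KMeansReduction:
    """Hypothetical reduction object for one k-means pass.

    Holds per-cluster coordinate sums and point counts. Every update is an
    addition, so updates commute and associate.
    """

    def __init__(self, k: int) -> None:
        self.sums: List[List[float]] = [[0.0, 0.0] for _ in range(k)]
        self.counts: List[int] = [0] * k

    def accumulate(self, cluster: int, p: Point) -> None:
        # Fold one data instance into the reduction object.
        self.sums[cluster][0] += p[0]
        self.sums[cluster][1] += p[1]
        self.counts[cluster] += 1

    def merge(self, other: "KMeansReduction") -> None:
        # Combine another node's reduction object into this one.
        for c in range(len(self.counts)):
            self.sums[c][0] += other.sums[c][0]
            self.sums[c][1] += other.sums[c][1]
            self.counts[c] += other.counts[c]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
    )


def local_reduction(partition: Sequence[Point],
                    centroids: Sequence[Point]) -> KMeansReduction:
    """Per-node phase: process this node's instances in whatever order they arrive."""
    obj = KMeansReduction(len(centroids))
    for p in partition:
        obj.accumulate(nearest(p, centroids), p)
    return obj


def global_reduction(local_objs: Sequence[KMeansReduction], k: int) -> KMeansReduction:
    """Combine the per-node reduction objects into one result."""
    result = KMeansReduction(k)
    for obj in local_objs:
        result.merge(obj)
    return result


def kmeans_step(data: Sequence[Point], centroids: Sequence[Point],
                num_nodes: int) -> List[Point]:
    """One k-means iteration, with 'nodes' simulated by round-robin partitioning."""
    partitions = [data[i::num_nodes] for i in range(num_nodes)]
    local_objs = [local_reduction(part, centroids) for part in partitions]
    total = global_reduction(local_objs, len(centroids))
    # Recompute each centroid; keep the old one if its cluster is empty.
    return [
        (s[0] / n, s[1] / n) if n else centroids[c]
        for c, (s, n) in enumerate(zip(total.sums, total.counts))
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
            (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    # The result does not depend on how many nodes the data is split across.
    print(kmeans_step(data, centroids, num_nodes=1))
    print(kmeans_step(data, centroids, num_nodes=3))
```

Both calls print the same two centroids, approximately (0.333, 0.333) and (10.333, 10.333). The sample coordinates are small integers, so every sum is exact; with arbitrary floating-point data, addition is only approximately associative, and different partitionings may differ in the last few bits.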

Original language: English (US)
Pages (from-to): 37-53
Number of pages: 17
Journal: Journal of Parallel and Distributed Computing
Volume: 68
Issue number: 1
State: Published - Jan 2008
Externally published: Yes

Keywords

  • Clusters
  • Data mining
  • Grids
  • Middleware

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
