Design and evaluation of a high-level interface for data mining

Ruoming Jin, G. Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citations

Abstract

This paper presents a case study in developing an application-class-specific high-level interface for shared-memory parallel programming. The application class we focus on is data mining. With the availability of large datasets in areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, data mining tasks have become an important application class for high-performance computing. Our study of a number of common data mining algorithms has shown that the same set of parallelization techniques can be applied to all of them. To exploit this, we have developed a reduction-object-based interface for rapidly specifying a shared-memory parallel data mining algorithm. The set of parallelization techniques we target includes full replication, optimized full locking, and cache-sensitive locking. We show how our runtime system can apply any of these techniques starting from a common specification. We have evaluated our high-level interface and the parallelization techniques using the apriori association mining and k-means clustering algorithms. Our experimental results show that the overhead of the interface is within 10% and that our parallelization techniques scale well.
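The idea of one specification driving several parallelization techniques can be illustrated with a minimal sketch. It is not the paper's interface: the names `ReductionObject`, `run_full_replication`, `run_full_locking`, and `count_items` are hypothetical, and Python threads are used only to show the structure, not to measure performance. The user writes a single per-item reduction function, and the runtime chooses between private copies merged at the end (full replication) and one shared copy with a lock per element (a plain form of full locking). The optimized and cache-sensitive locking variants concern how locks are laid out in memory and are not modeled here.

```python
import threading
from typing import Callable, List, Sequence

# Hypothetical names throughout; this is an illustration, not the paper's API.
Accumulate = Callable[[int, int], None]
LocalReduce = Callable[[Sequence[int], Accumulate], None]


class ReductionObject:
    """A fixed-size array of accumulators, updated only through accumulate()."""

    def __init__(self, size: int) -> None:
        self.values: List[int] = [0] * size

    def accumulate(self, index: int, amount: int) -> None:
        self.values[index] += amount


def _split(data: Sequence, parts: int) -> List[Sequence]:
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    step = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + step] for i in range(0, len(data), step)]


def _run_threads(targets_and_args) -> None:
    threads = [threading.Thread(target=t, args=a) for t, a in targets_and_args]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_full_replication(data, size: int, local_reduce: LocalReduce,
                         num_threads: int = 4) -> List[int]:
    """Each thread updates a private copy; the copies are merged at the end."""
    chunks = _split(data, num_threads)
    copies = [ReductionObject(size) for _ in chunks]

    def worker(chunk, robj: ReductionObject) -> None:
        for item in chunk:
            local_reduce(item, robj.accumulate)

    _run_threads([(worker, (c, r)) for c, r in zip(chunks, copies)])

    merged = ReductionObject(size)
    for copy in copies:
        for i, v in enumerate(copy.values):
            merged.accumulate(i, v)
    return merged.values


def run_full_locking(data, size: int, local_reduce: LocalReduce,
                     num_threads: int = 4) -> List[int]:
    """All threads share one object; every element is guarded by its own lock."""
    shared = ReductionObject(size)
    locks = [threading.Lock() for _ in range(size)]

    def locked_accumulate(index: int, amount: int) -> None:
        with locks[index]:
            shared.accumulate(index, amount)

    def worker(chunk) -> None:
        for item in chunk:
            local_reduce(item, locked_accumulate)

    _run_threads([(worker, (c,)) for c in _split(data, num_threads)])
    return shared.values


def count_items(transaction: Sequence[int], accumulate: Accumulate) -> None:
    """The common specification: count how often each item id occurs."""
    for item in transaction:
        accumulate(item, 1)


transactions = [[0, 1], [1, 2], [0, 1, 2], [1]]
replicated = run_full_replication(transactions, 3, count_items)
locked = run_full_locking(transactions, 3, count_items)
```

Both calls receive the same `count_items` function and should produce the same counts; only the way updates to the reduction object are synchronized differs.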

Original language: English (US)
Title of host publication: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 106
Number of pages: 1
ISBN (Electronic): 0769515738, 9780769515731
State: Published - 2002
Event: 16th International Parallel and Distributed Processing Symposium, IPDPS 2002 - Ft. Lauderdale, United States
Duration: Apr 15 2002 to Apr 19 2002

Publication series

Name: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002

Conference

Conference: 16th International Parallel and Distributed Processing Symposium, IPDPS 2002
Country/Territory: United States
City: Ft. Lauderdale
Period: 4/15/02 to 4/19/02

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Modeling and Simulation
