TY - GEN
T1 - Design and evaluation of a high-level interface for data mining
AU - Jin, Ruoming
AU - Agrawal, G.
N1 - Funding Information:
This work was supported by NSF grant ACR-9982087, NSF CAREER award ACI-9733520, and NSF grant ACR-0130437.
Publisher Copyright:
© 2002 IEEE.
PY - 2002
Y1 - 2002
AB - This paper presents a case study in developing an application-class-specific high-level interface for shared-memory parallel programming. The application class we focus on is data mining. With the availability of large datasets in areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, data mining tasks have become an important application class for high-performance computing. Our study of a number of common data mining algorithms has shown that the same set of parallelization techniques can be applied to all of them. To exploit this, we have developed a reduction-object-based interface to rapidly specify a shared-memory parallel data mining algorithm. The set of parallelization techniques we target includes full replication, optimized full locking, and cache-sensitive locking. We show how our runtime system can apply any of these techniques starting from a common specification. We have evaluated our high-level interface and the parallelization techniques using apriori association mining and k-means clustering algorithms. Our experimental results show that the overhead of the interface is within 10% and that our parallelization techniques scale well.
UR - http://www.scopus.com/inward/record.url?scp=84966667411&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966667411&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2002.1016491
DO - 10.1109/IPDPS.2002.1016491
M3 - Conference contribution
AN - SCOPUS:84966667411
T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002
SP - 106
BT - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th International Parallel and Distributed Processing Symposium, IPDPS 2002
Y2 - 15 April 2002 through 19 April 2002
ER -