Comparing map-reduce and FREERIDE for data-intensive applications

Wei Jiang, Vignesh T. Ravi, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

16 Scopus citations

Abstract

Map-reduce has been a topic of much interest in the last 2-3 years. While it is well accepted that the map-reduce APIs enable significantly easier programming, the performance aspects of the use of map-reduce are less well understood. This paper focuses on comparing the map-reduce paradigm with a system that was developed earlier at Ohio State, FREERIDE (FRamework for Rapid Implementation of Datamining Engines). The API and the functionality offered by FREERIDE have many similarities with the map-reduce API, although there are some differences. Moreover, while FREERIDE was motivated by data mining computations, map-reduce was motivated by searching, sorting, and related applications in a data center. We compare the programming APIs and performance of the Hadoop implementation of map-reduce with FREERIDE. For our study, we have taken three data mining algorithms: k-means clustering, apriori association mining, and k-nearest neighbor search. We have also included a simple data scanning application, word-count. The main observations from our results are as follows. For the three data mining applications we have considered, FREERIDE outperformed Hadoop by a factor of 5 or more. For word-count, Hadoop is better by a factor of up to 2. With increasing dataset sizes, the relative performance of Hadoop becomes better. Overall, it seems that Hadoop has significant overheads related to initialization, I/O, and sorting of (key, value) pairs. Thus, despite an easy-to-program API, Hadoop's map-reduce does not appear very suitable for data mining computations on modest-sized datasets.
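To make the "sorting of (key, value) pairs" overhead concrete, the following is a minimal single-process sketch of word-count written in map-reduce style, next to a plain in-place accumulation for contrast. It is an illustration only: the function names are hypothetical, it uses neither the Hadoop API nor the FREERIDE API (the abstract does not describe either in detail), and it says nothing about the measured results.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit one (word, 1) pair per word, as a map function would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def reduce_phase(pairs):
    """Sort the intermediate pairs by key, group them, and sum per key."""
    # This sort stands in for the (key, value) sorting step named in the abstract.
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(ordered, key=itemgetter(0))
    }


def word_count_mapreduce(lines):
    """Word-count as map followed by sort/group/reduce."""
    return reduce_phase(map_phase(lines))


def word_count_in_place(lines):
    """Contrast (assumption, not FREERIDE's API): update one accumulator
    directly, without materializing or sorting intermediate pairs."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in line.split())
    return dict(counts)
```

Both functions return the same counts; the difference is that the first one builds and sorts a list of intermediate pairs before reducing, while the second never creates them.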

Original language: English (US)
Title of host publication: 2009 IEEE International Conference on Cluster Computing and Workshops, CLUSTER '09
DOIs
State: Published - 2009
Externally published: Yes
Event: 2009 IEEE International Conference on Cluster Computing and Workshops, CLUSTER '09 - New Orleans, LA, United States
Duration: Aug 31 2009 to Sep 4 2009

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 2009 IEEE International Conference on Cluster Computing and Workshops, CLUSTER '09
Country: United States
City: New Orleans, LA
Period: 8/31/09 to 9/4/09

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
