Smart-MLlib: A high-performance machine-learning library

David Siegal, Jia Guo, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations


As the popularity of big data analytics has continued to grow, so has the need for accessible and scalable machine-learning implementations. In recent years, Apache Spark's machine-learning library, MLlib, has been used to fulfill this need. Though Spark outperforms Hadoop, it is not clear whether it is the best-performing underlying middleware for machine-learning implementations. Building on a C++ and MPI-based middleware system, In-Situ MApReduce liTe (Smart), we present a machine-learning library prototype (Smart-MLlib). Like MLlib, Smart-MLlib allows machine-learning implementations to be invoked from a Scala program with a very similar API. To test our library's performance, we built four machine-learning applications that are also provided in Spark's MLlib: k-means clustering, linear regression, Gaussian mixture models, and support vector machines. On average, we outperformed Spark's MLlib by over 800%. Our library also scaled better than Spark's MLlib for every application tested. Thus, the new machine-learning library enables higher performance than Spark's MLlib without sacrificing the easy-to-use API.

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
ISBN (Electronic): 9781509036530
State: Published - Dec 6 2016
Externally published: Yes
Event: 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016 - Taipei, Taiwan, Province of China
Duration: Sep 13 2016 to Sep 15 2016

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244


Conference: 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
Country/Territory: Taiwan, Province of China

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing

