A framework for data-intensive computing with cloud bursting

Tekin Bicer, David Chiu, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

33 Scopus citations

Abstract

For many organizations, one attractive use of cloud resources is what is referred to as cloud bursting, or the hybrid cloud: an organization acquires and manages in-house resources to meet its base need, but can use additional resources from a cloud provider to maintain an acceptable response time during workload peaks. Cloud bursting has so far been discussed in the context of using additional computing resources from a cloud provider. However, as next-generation applications are expected to see orders-of-magnitude increases in data set sizes, cloud resources can also be used to store additional data once local resources are exhausted. In this paper, we consider the challenge of data analysis in a scenario where data is stored across a local cluster and cloud resources. We describe a software framework that enables data-intensive computing with cloud bursting, i.e., using a combination of compute resources from a local cluster and a cloud environment to perform Map-Reduce type processing on a geographically distributed data set. Our evaluation with three different applications shows that data-intensive computing with cloud bursting is feasible and scalable. In particular, compared to a setup where the data set is stored at one location and processed using resources at that site, the average slowdown of our system (with distributed resources but the same aggregate number of compute resources) is only 15.55%. Thus, the overheads due to global reduction, remote data retrieval, and potential load imbalance are quite manageable. Our system scales with an average speedup of 81% when the number of compute resources is doubled.
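
The processing model in the abstract (each site reduces the data it holds, a global reduction combines the per-site results, and remote data retrieval absorbs load imbalance) can be illustrated with a small Python sketch. This is not the authors' framework: the functions `process_chunk`, `global_reduce`, and `run`, the round-based scheduler, the `workers` parameter, and the word-count workload are all hypothetical, chosen only to show the pattern.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple

Chunk = List[str]  # hypothetical: a chunk is a list of word records


def process_chunk(chunk: Chunk, reduction: Counter) -> None:
    """Reduction step: fold each record directly into the site's reduction object."""
    for word in chunk:
        reduction[word] += 1


def global_reduce(partials: List[Counter]) -> Counter:
    """Merge the per-site reduction objects into the final result."""
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts
    return result


def run(sites: Dict[str, List[Chunk]], workers: Dict[str, int]) -> Tuple[Counter, int]:
    """Process chunks in rounds (hypothetical scheduler).

    Each site handles up to workers[site] chunks per round, preferring chunks
    stored at that site and fetching a chunk from another site only when its
    own queue is empty. Returns the global result and the number of remote fetches.
    """
    queues = {site: deque(chunks) for site, chunks in sites.items()}
    partials = {site: Counter() for site in sites}
    remote_fetches = 0
    while any(queues.values()):
        for site in sites:
            for _ in range(workers[site]):
                if queues[site]:
                    chunk = queues[site].popleft()  # local data
                else:
                    donor = next((s for s, q in queues.items() if q), None)
                    if donor is None:
                        break  # no work left anywhere
                    chunk = queues[donor].pop()  # remote retrieval from the tail
                    remote_fetches += 1
                process_chunk(chunk, partials[site])
    return global_reduce(list(partials.values())), remote_fetches


if __name__ == "__main__":
    data = {
        "local": [["a", "b"], ["a"], ["b", "c"]],
        "cloud": [["c"], ["a", "c"], ["b"], ["a"], ["c", "c"]],
    }
    result, fetched = run(data, workers={"local": 2, "cloud": 1})
    print(result, fetched)
```

Because each site folds records into its own reduction object, a real deployment following this pattern would only need to exchange the compact per-site objects during global reduction. Taking chunks from the tail of another site's queue is one simple way to absorb load imbalance; it is an assumption of this sketch, not a policy stated in the abstract.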

Original language: English (US)
Title of host publication: Proceedings - 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Pages: 169-177
Number of pages: 9
State: Published - 2011
Externally published: Yes
Event: 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011 - Austin, TX, United States
Duration: Sep 26 2011 to Sep 30 2011

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Country/Territory: United States
City: Austin, TX
Period: 9/26/11 to 9/30/11

Keywords

  • Cloud bursting
  • Data-intensive computing
  • Map-reduce

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
