Supporting load balancing for distributed data-intensive applications

Leonid Glimcher; Vignesh T. Ravi; Gagan Agrawal

doi:10.1109/HIPC.2009.5433204

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

In data-intensive computing, an important problem that has received relatively little attention is the transparent processing of data stored in remote data repositories. Interesting load balancing considerations arise in these scenarios. In particular, depending on where data is generated and how it is shared, a dataset of interest can be divided across multiple data repositories, which may be geographically distributed, and the data may be partitioned in a number of ways. This paper focuses on enabling such distributed processing of data from distributed resources. We have developed a load balancing algorithm that minimizes the total time spent on processing the data. We consider a weighted sum of two factors: a load balancing factor and a term that captures the amount of time spent by processing nodes waiting for the data. Our solutions have been implemented and evaluated in the context of FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have extensively evaluated our techniques using two data-intensive applications.
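
The abstract does not give the exact cost function, so the following is only an illustrative sketch of how a weighted sum of a load-imbalance term and a data-wait term might drive chunk placement. The weight `alpha`, the `speed` and `bandwidth` inputs, the specific definitions of the two terms, and the greedy largest-first heuristic are all assumptions made for this example; they are not taken from the paper or from FREERIDE-G.

```python
from typing import List, Tuple


def cost(loads: List[float], waits: List[float], alpha: float) -> float:
    """Weighted sum of two terms (both definitions are assumptions):
    imbalance = busiest node's compute time minus the mean compute time,
    wait      = total time nodes spend waiting for data transfers."""
    imbalance = max(loads) - sum(loads) / len(loads)
    return alpha * imbalance + (1.0 - alpha) * sum(waits)


def greedy_assign(
    chunks: List[Tuple[float, int]],   # (chunk size, index of hosting repository)
    speed: List[float],                # processing rate of each compute node
    bandwidth: List[List[float]],      # bandwidth[repo][node] transfer rate
    alpha: float = 0.5,                # hypothetical weight between the two terms
) -> Tuple[List[int], float]:
    """Place chunks one at a time, largest first, on the node that yields
    the lowest weighted cost so far. Returns (node per chunk, final cost)."""
    n = len(speed)
    loads = [0.0] * n
    waits = [0.0] * n
    assignment = [-1] * len(chunks)
    order = sorted(range(len(chunks)), key=lambda i: -chunks[i][0])
    for i in order:
        size, repo = chunks[i]
        best = None
        for node in range(n):
            # Evaluate the cost as if this chunk were placed on `node`.
            trial_loads = loads.copy()
            trial_waits = waits.copy()
            trial_loads[node] += size / speed[node]
            trial_waits[node] += size / bandwidth[repo][node]
            c = cost(trial_loads, trial_waits, alpha)
            if best is None or c < best[0]:
                best = (c, node, trial_loads, trial_waits)
        _, chosen, loads, waits = best
        assignment[i] = chosen
    return assignment, cost(loads, waits, alpha)


if __name__ == "__main__":
    chunks = [(4.0, 0), (4.0, 0)]
    # Only the imbalance term counts: chunks spread across both nodes.
    print(greedy_assign(chunks, [1.0, 1.0], [[1.0, 1.0]], alpha=1.0))
    # Only the wait term counts: both chunks go to the better-connected node.
    print(greedy_assign(chunks, [1.0, 1.0], [[1.0, 4.0]], alpha=0.0))
```

In this sketch, setting `alpha` close to 1 favors an even spread of compute time, while setting it close to 0 favors nodes with fast links to the repository holding each chunk.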

Original language	English (US)
Title of host publication	16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
Pages	235-244
Number of pages	10
DOIs	https://doi.org/10.1109/HIPC.2009.5433204
State	Published - 2009
Externally published	Yes
Event	16th International Conference on High Performance Computing, HiPC 2009 - Kochi, India Duration: Dec 16 2009 → Dec 19 2009

Publication series

Name	16th International Conference on High Performance Computing, HiPC 2009 - Proceedings

Conference

Conference	16th International Conference on High Performance Computing, HiPC 2009
Country/Territory	India
City	Kochi
Period	12/16/09 → 12/19/09

ASJC Scopus subject areas

Computational Theory and Mathematics
Theoretical Computer Science

Access to Document

10.1109/HIPC.2009.5433204

Cite this

Supporting load balancing for distributed data-intensive applications. / Glimcher, Leonid; Ravi, Vignesh T.; Agrawal, Gagan.
16th International Conference on High Performance Computing, HiPC 2009 - Proceedings. 2009. p. 235-244 5403204 (16th International Conference on High Performance Computing, HiPC 2009 - Proceedings).

Glimcher, L, Ravi, VT & Agrawal, G 2009, Supporting load balancing for distributed data-intensive applications. in 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings., 5403204, 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings, pp. 235-244, 16th International Conference on High Performance Computing, HiPC 2009, Kochi, India, 12/16/09. https://doi.org/10.1109/HIPC.2009.5433204

@inproceedings{a550daa592ee4848b0976394c154717c,

title = "Supporting load balancing for distributed data-intensive applications",

abstract = "In data-intensive computing, an important problem that has received relatively little attention is the transparent processing of data stored in remote data repositories. Interesting load balancing considerations arise in these scenarios. In particular, depending on where data is generated and how it is shared, a dataset of interest can be divided across multiple data repositories, which may be geographically distributed, and the data may be partitioned in a number of ways. This paper focuses on enabling such distributed processing of data from distributed resources. We have developed a load balancing algorithm that minimizes the total time spent on processing the data. We consider a weighted sum of two factors: a load balancing factor and a term that captures the amount of time spent by processing nodes waiting for the data. Our solutions have been implemented and evaluated in the context of FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have extensively evaluated our techniques using two data-intensive applications.",

author = "Leonid Glimcher and Ravi, {Vignesh T.} and Gagan Agrawal",

year = "2009",

doi = "10.1109/HIPC.2009.5433204",

language = "English (US)",

isbn = "9781424449224",

series = "16th International Conference on High Performance Computing, HiPC 2009 - Proceedings",

pages = "235--244",

booktitle = "16th International Conference on High Performance Computing, HiPC 2009 - Proceedings",

note = "16th International Conference on High Performance Computing, HiPC 2009 ; Conference date: 16-12-2009 Through 19-12-2009",

}

TY - GEN

T1 - Supporting load balancing for distributed data-intensive applications

AU - Glimcher, Leonid

AU - Ravi, Vignesh T.

AU - Agrawal, Gagan

PY - 2009

Y1 - 2009

N2 - In data-intensive computing, an important problem that has received relatively little attention is the transparent processing of data stored in remote data repositories. Interesting load balancing considerations arise in these scenarios. In particular, depending on where data is generated and how it is shared, a dataset of interest can be divided across multiple data repositories, which may be geographically distributed, and the data may be partitioned in a number of ways. This paper focuses on enabling such distributed processing of data from distributed resources. We have developed a load balancing algorithm that minimizes the total time spent on processing the data. We consider a weighted sum of two factors: a load balancing factor and a term that captures the amount of time spent by processing nodes waiting for the data. Our solutions have been implemented and evaluated in the context of FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have extensively evaluated our techniques using two data-intensive applications.

UR - http://www.scopus.com/inward/record.url?scp=77952227539&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952227539&partnerID=8YFLogxK

U2 - 10.1109/HIPC.2009.5433204

DO - 10.1109/HIPC.2009.5433204

M3 - Conference contribution

AN - SCOPUS:77952227539

SN - 9781424449224

T3 - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings

SP - 235

EP - 244

BT - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings

T2 - 16th International Conference on High Performance Computing, HiPC 2009

Y2 - 16 December 2009 through 19 December 2009

ER -
