Servicing range queries on multidimensional datasets with partial replicas

Li Weng; Umit Catalyurek; Tahsin Kurc; Gagan Agrawal; Joel Saltz

doi:10.1109/CCGRID.2005.1558635

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution time. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.
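
The replica-selection problem described in the abstract can be illustrated with a small sketch. This is not the paper's heuristic, which the abstract does not specify; it is a generic greedy cover under assumed conditions: the dataset is split into integer-indexed chunks, each replica is an axis-aligned box of chunks, and every replica has a hypothetical fixed startup cost plus a per-chunk read cost. The names `chunks_in_range` and `choose_replicas` and the cost model are illustrative assumptions.

```python
from itertools import product


def chunks_in_range(lo, hi):
    """Return the set of integer chunk coordinates in the box [lo, hi], inclusive."""
    return set(product(*(range(l, h + 1) for l, h in zip(lo, hi))))


def choose_replicas(query, replicas):
    """Greedy cover of a range query (illustrative, not the paper's algorithm).

    query    -- (lo, hi) corner tuples of the range query
    replicas -- dict: name -> (lo, hi, startup_cost, cost_per_chunk);
                the cost model is a hypothetical assumption
    Returns a list of (replica_name, number_of_chunks_read_from_it).
    """
    remaining = chunks_in_range(*query)
    plan = []
    while remaining:
        candidates = []
        for name, (lo, hi, startup, per_chunk) in replicas.items():
            gain = remaining & chunks_in_range(lo, hi)
            if gain:
                # Average cost per newly covered chunk, startup included.
                score = (startup + per_chunk * len(gain)) / len(gain)
                candidates.append((score, name, gain))
        if not candidates:
            raise ValueError("query is not fully covered by the available replicas")
        # Lowest score wins; ties are broken by name for determinism.
        _, name, gain = min(candidates, key=lambda c: (c[0], c[1]))
        plan.append((name, len(gain)))
        remaining -= gain
    return plan


# Hypothetical 2-D example: two partial replicas plus the full original dataset.
replicas = {
    "original": ((0, 0), (9, 9), 0.0, 4.0),
    "A": ((0, 0), (4, 4), 2.0, 1.0),
    "B": ((3, 3), (9, 9), 2.0, 1.5),
}
plan = choose_replicas(((2, 2), (6, 6)), replicas)
```

In this example the query is served partly from each partial replica, and the chunks that neither replica holds fall back to the original dataset, which mirrors the abstract's "and possibly the original dataset".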

Original language	English (US)
Title of host publication	2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Pages	726-733
Number of pages	8
DOIs	https://doi.org/10.1109/CCGRID.2005.1558635
State	Published - 2005
Externally published	Yes
Event	2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005 - Cardiff, Wales, United Kingdom; Duration: May 9 2005 → May 12 2005

Publication series

Name	2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Volume	2

Conference

Conference	2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005
Country/Territory	United Kingdom
City	Cardiff, Wales
Period	May 9 2005 → May 12 2005

ASJC Scopus subject areas

General Engineering

Cite this

Weng, L., Catalyurek, U., Kurc, T., Agrawal, G., & Saltz, J. (2005). Servicing range queries on multidimensional datasets with partial replicas. In 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005 (pp. 726-733). Article 1558635 (2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005; Vol. 2). https://doi.org/10.1109/CCGRID.2005.1558635

Servicing range queries on multidimensional datasets with partial replicas. / Weng, Li; Catalyurek, Umit; Kurc, Tahsin et al.
2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005. 2005. p. 726-733 1558635 (2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005; Vol. 2).

Weng, L, Catalyurek, U, Kurc, T, Agrawal, G & Saltz, J 2005, Servicing range queries on multidimensional datasets with partial replicas. in 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005., 1558635, 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005, vol. 2, pp. 726-733, 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005, Cardiff, Wales, United Kingdom, 5/9/05. https://doi.org/10.1109/CCGRID.2005.1558635

@inproceedings{6227edeb96714ae8a465444d532475f1,

title = "Servicing range queries on multidimensional datasets with partial replicas",

abstract = "Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.",

author = "Li Weng and Umit Catalyurek and Tahsin Kurc and Gagan Agrawal and Joel Saltz",

year = "2005",

doi = "10.1109/CCGRID.2005.1558635",

language = "English (US)",

isbn = "0780390741",

series = "2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005",

pages = "726--733",

booktitle = "2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005",

note = "2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005 ; Conference date: 09-05-2005 Through 12-05-2005",

}

TY - GEN

T1 - Servicing range queries on multidimensional datasets with partial replicas

AU - Weng, Li

AU - Catalyurek, Umit

AU - Kurc, Tahsin

AU - Agrawal, Gagan

AU - Saltz, Joel

PY - 2005

Y1 - 2005

N2 - Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.

AB - Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.

UR - http://www.scopus.com/inward/record.url?scp=33845346800&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33845346800&partnerID=8YFLogxK

U2 - 10.1109/CCGRID.2005.1558635

DO - 10.1109/CCGRID.2005.1558635

M3 - Conference contribution

AN - SCOPUS:33845346800

SN - 0780390741

SN - 9780780390744

T3 - 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005

SP - 726

EP - 733

BT - 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005

T2 - 2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005

Y2 - 9 May 2005 through 12 May 2005

ER -
