Compiler Support for Efficient Processing of XML Datasets

Xiaogang Li, Renato Ferreira, Gagan Agrawal

Research output: Contribution to conference › Paper › peer-review

4 Scopus citations


Declarative, high-level, and/or application-class-specific languages are often successful in easing application development. In this paper, we report our experiences in compiling a recently developed XML query language, XQuery, for applications that process scientific datasets. Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges: 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) parallelization of generalized reduction functions, which in particular requires the synthesis of global reduction functions, 4) application of data-centric transformations on the structure of XQuery, and 5) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level functionality. This paper describes our solutions to these problems. By implementing the techniques in a compiler and generating code for a runtime system called Active Data Repository (ADR), we are able to achieve efficient processing of disk-resident datasets and parallelization on a cluster of machines. Our experimental results show that 1) restructuring transformations, i.e., removing recursion and applying data-centric execution, result in a several-fold improvement in performance, and 2) parallel versions achieve good load balance and incur no significant overheads besides communication.
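
The abstract's first three challenges (recognizing a reduction inside a recursive function, replacing the recursion with a loop, and combining per-node partial results through a global reduction) can be illustrated with a minimal sketch. This is not the paper's compiler output and does not use ADR; the function names (`recursive_reduce`, `iterative_reduce`, `partition`, `global_reduce`) and the round-robin partitioning are assumptions chosen for illustration, and the "nodes" are simulated sequentially in one process.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def recursive_reduce(values: Sequence[T], op: Callable[[T, T], T], identity: T) -> T:
    """Reduction written recursively, as a functional query language would express it."""
    if not values:
        return identity
    return op(values[0], recursive_reduce(values[1:], op, identity))


def iterative_reduce(values: Sequence[T], op: Callable[[T, T], T], identity: T) -> T:
    """The same reduction as a loop with an accumulator.

    Equivalent to the recursive form when `op` is associative and
    commutative and `identity` is its identity element.
    """
    acc = identity
    for v in values:
        acc = op(acc, v)
    return acc


def partition(values: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Hypothetical round-robin split standing in for data spread over nodes."""
    return [values[i::parts] for i in range(parts)]


def global_reduce(values: Sequence[T], parts: int,
                  op: Callable[[T, T], T], identity: T) -> T:
    """Local reduction per partition, then a global reduction over the partial results.

    Round-robin partitioning reorders the elements, so the result matches
    the sequential one only because `op` is associative and commutative.
    """
    local_results = [iterative_reduce(chunk, op, identity)
                     for chunk in partition(values, parts)]
    return reduce(op, local_results, identity)
```

For example, `global_reduce(list(range(1, 11)), 3, lambda a, b: a + b, 0)` gives the same total as `recursive_reduce` over the same list. The requirement that the operation be associative and commutative is what allows the partitions to be processed in any order.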

Original language: English (US)
Number of pages: 11
State: Published - 2003
Externally published: Yes
Event: 2003 International Conference on Supercomputing - San Francisco, CA, United States
Duration: Jun 23 2003 → Jun 26 2003


Conference: 2003 International Conference on Supercomputing
Country/Territory: United States
City: San Francisco, CA


  • Data Intensive Computing
  • Restructuring Compilers
  • XML
  • XQuery

ASJC Scopus subject areas

  • Computer Science (all)

