TY - GEN

T1 - A deterministic algorithm for summarizing asynchronous streams over a sliding window

AU - Busch, Costas

AU - Tirthapura, Srikanta

PY - 2007

Y1 - 2007

N2 - We consider the problem of maintaining aggregates over recent elements of a massive data stream. Motivated by applications involving network data, we consider asynchronous data streams, where the observed order of data may be different from the order in which the data was generated. The set of recent elements is modeled as a sliding timestamp window of the stream, whose elements are changing continuously with time. We present the first deterministic algorithms for maintaining a small space summary of elements in a sliding timestamp window of an asynchronous data stream. The summary can return approximate answers for the following fundamental aggregates: basic count, the number of elements within the sliding window, and sum, the sum of all element values within the sliding window. For basic counting, the space taken by our summary is O(log W · log B · (log W + log B)/ε) bits, where B is an upper bound on the value of the basic count, W is an upper bound on the width of the timestamp window, and e is the desired relative error. Our algorithms are based on a novel data structure called splittable histogram. Prior to this work, randomized algorithms were known for this problem, which provide weaker guarantees than those provided by our deterministic algorithms.

AB - We consider the problem of maintaining aggregates over recent elements of a massive data stream. Motivated by applications involving network data, we consider asynchronous data streams, where the observed order of data may be different from the order in which the data was generated. The set of recent elements is modeled as a sliding timestamp window of the stream, whose elements are changing continuously with time. We present the first deterministic algorithms for maintaining a small space summary of elements in a sliding timestamp window of an asynchronous data stream. The summary can return approximate answers for the following fundamental aggregates: basic count, the number of elements within the sliding window, and sum, the sum of all element values within the sliding window. For basic counting, the space taken by our summary is O(log W · log B · (log W + log B)/ε) bits, where B is an upper bound on the value of the basic count, W is an upper bound on the width of the timestamp window, and e is the desired relative error. Our algorithms are based on a novel data structure called splittable histogram. Prior to this work, randomized algorithms were known for this problem, which provide weaker guarantees than those provided by our deterministic algorithms.

UR - http://www.scopus.com/inward/record.url?scp=38049169339&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38049169339&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-70918-3_40

DO - 10.1007/978-3-540-70918-3_40

M3 - Conference contribution

AN - SCOPUS:38049169339

SN - 9783540709176

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 465

EP - 476

BT - STACS 2007 - 24th Annual Symposium on Theoretical Aspects of Computer Science, Proceedings

PB - Springer Verlag

T2 - 24th Annual Symposium on Theoretical Aspects of Computer Science, STACS 2007

Y2 - 22 February 2007 through 24 February 2007

ER -