TY - GEN

T1 - Sketching asynchronous streams over a sliding window

AU - Tirthapura, Srikanta

AU - Xu, Bojian

AU - Busch, Costas

N1 - Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2006

Y1 - 2006

N2 - We study the problem of maintaining sketches of recent elements of a data stream. Motivated by applications involving network data, we consider streams that are asynchronous, in which the observed order of data is not the same as the time order in which the data was generated. The notion of recent elements of a stream is modeled by the sliding timestamp window, which is the set of elements with timestamps that are close to the current time. We design algorithms for maintaining sketches of all elements within the sliding timestamp window that can give provably accurate estimates of two basic aggregates, the sum and the median, of a stream of numbers. The space taken by the sketches, the time needed for querying the sketch, and the time for inserting new elements into the sketch are all polylog with respect to the maximum window size and the values of the data items in the window. Our sketches can be easily combined in a loss-less and compact way, making them useful for distributed computations over data streams. Previous works on sketching recent elements of a data stream have all considered the more restrictive scenario of synchronous streams, where the observed order of data is the same as the time order in which the data was generated. Our notion of recency of elements is more general than that studied in previous work, and thus our sketches are more robust to network delays and asynchrony.

AB - We study the problem of maintaining sketches of recent elements of a data stream. Motivated by applications involving network data, we consider streams that are asynchronous, in which the observed order of data is not the same as the time order in which the data was generated. The notion of recent elements of a stream is modeled by the sliding timestamp window, which is the set of elements with timestamps that are close to the current time. We design algorithms for maintaining sketches of all elements within the sliding timestamp window that can give provably accurate estimates of two basic aggregates, the sum and the median, of a stream of numbers. The space taken by the sketches, the time needed for querying the sketch, and the time for inserting new elements into the sketch are all polylog with respect to the maximum window size and the values of the data items in the window. Our sketches can be easily combined in a loss-less and compact way, making them useful for distributed computations over data streams. Previous works on sketching recent elements of a data stream have all considered the more restrictive scenario of synchronous streams, where the observed order of data is the same as the time order in which the data was generated. Our notion of recency of elements is more general than that studied in previous work, and thus our sketches are more robust to network delays and asynchrony.

KW - Aggregates

KW - Asynchronous streams

KW - Data stream processing

KW - Distributed streams

KW - Sketches of streams

KW - Sliding windows

KW - Union of streams

UR - http://www.scopus.com/inward/record.url?scp=33748712629&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748712629&partnerID=8YFLogxK

U2 - 10.1145/1146381.1146397

DO - 10.1145/1146381.1146397

M3 - Conference contribution

AN - SCOPUS:33748712629

SN - 1595933840

SN - 9781595933843

T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing

SP - 82

EP - 91

BT - Proceedings of the 25th Annual ACM Symposium on Principles of Distributed Computing 2006

PB - Association for Computing Machinery (ACM)

T2 - 25th Annual ACM Symposium on Principles of Distributed Computing 2006

Y2 - 23 July 2006 through 26 July 2006

ER -