Efficient decision tree construction on streaming data

Ruoming Jin, Gagan Agrawal

Research output: Contribution to conference, Paper, peer-review

126 Scopus citations

Abstract

Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining streaming data. Domingos and Hulten have presented a one-pass algorithm for decision tree construction. Their work uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. In this paper, we revisit this problem. We make the following two contributions: 1) We present a numerical interval pruning (NIP) approach for efficiently processing numerical attributes. Our results show an average reduction of 39% in execution time. 2) We exploit the properties of the gain functions entropy and Gini to reduce the sample size required for obtaining a given bound on the accuracy. Our experimental results show a 37% reduction in the number of data instances required.
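The baseline this work improves on is the Hoeffding-bound split test of Domingos and Hulten. The sketch below illustrates only that baseline test, not the paper's NIP method or its reduced-sample bound. It assumes information gain as the split criterion, so the gain has range R = log2(number of classes), and it uses the usual test "split when the best attribute's gain exceeds the runner-up's by more than epsilon, or when epsilon falls below a tie threshold". The function names and the defaults for `delta` and `tie` are hypothetical choices made for illustration.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    """Information gain of splitting a node with class counts `parent_counts`
    into children whose class-count vectors are listed in `child_counts`."""
    n = sum(parent_counts)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in child_counts)
    return entropy(parent_counts) - weighted

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` is within epsilon of its n-sample mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gains, n, num_classes, delta=1e-7, tie=0.05):
    """Hoeffding-style split test (hypothetical helper).

    gains: observed gain per attribute (needs at least two attributes),
    n: number of examples seen at the leaf so far.
    """
    best, second = sorted(gains.values(), reverse=True)[:2]
    eps = hoeffding_bound(math.log2(num_classes), delta, n)
    # Split if the best attribute is clearly ahead, or if the two are
    # so close that more data would not help (tie-breaking).
    return (best - second > eps) or (eps < tie)
```

For example, with two classes, `delta=1e-7`, and 1,000 examples, epsilon is about 0.09, so a gain difference of 0.10 triggers a split while a difference of 0.05 does not. After about a million examples epsilon drops below the tie threshold and the leaf splits regardless. The number of examples needed before epsilon becomes small enough is exactly the sample size that the paper's second contribution aims to reduce.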

Original language: English (US)
Pages: 571-576
Number of pages: 6
State: Published - 2003
Externally published: Yes
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: Aug 24 2003 to Aug 27 2003

Conference

Conference: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country/Territory: United States
City: Washington, DC
Period: 8/24/03 to 8/27/03

Keywords

  • Decision tree
  • Sampling
  • Streaming data

ASJC Scopus subject areas

  • Software
  • Information Systems
