Abstract
Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining streaming data. Domingos and Hulten have presented a one-pass algorithm for decision tree construction. Their work uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. In this paper, we revisit this problem. We make the following two contributions: 1) We present a numerical interval pruning (NIP) approach for efficiently processing numerical attributes. Our results show an average of 39% reduction in execution times. 2) We exploit the properties of the gain function entropy (and gini) to reduce the sample size required for obtaining a given bound on the accuracy. Our experimental results show a 37% reduction in the number of data instances required.
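For readers unfamiliar with the bound mentioned in the abstract, the following is a minimal sketch of the standard Hoeffding inequality as it is commonly applied to one-pass decision tree construction. It is not the tightened, gain-function-specific bound that this paper contributes, and the function names `hoeffding_epsilon` and `samples_needed` are hypothetical, chosen only for illustration. The sketch assumes a random variable with range `value_range` observed `n` times: with probability at least `1 - delta`, the observed mean is within `epsilon` of the true mean.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the observed quantity, delta is the
    allowed failure probability, and n is the number of observations.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Invert the bound: smallest n for which the bound is at most epsilon."""
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    n = samples_needed(value_range=1.0, delta=1e-6, epsilon=0.05)
    print(n, hoeffding_epsilon(1.0, 1e-6, n))
```

Because the required sample size grows with `1 / epsilon ** 2`, even a modest tightening of the bound can noticeably cut the number of instances a stream learner must read before committing to a split, which is the kind of saving the abstract reports.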
Original language | English (US) |
---|---|
Pages | 571-576 |
Number of pages | 6 |
State | Published - 2003 |
Externally published | Yes |
Event | 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States; Duration: Aug 24 2003 → Aug 27 2003 |
Conference
Conference | 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 |
---|---|
Country/Territory | United States |
City | Washington, DC |
Period | 8/24/03 → 8/27/03 |
Keywords
- Decision tree
- Sampling
- Streaming data
ASJC Scopus subject areas
- Software
- Information Systems