TY - GEN
T1 - Differential analysis on deep web data sources
AU - Liu, Tantan
AU - Wang, Fan
AU - Zhu, Jiedan
AU - Agrawal, Gagan
PY - 2010
Y1 - 2010
N2 - The growing use of Internet in everyday life has been creating new challenges and opportunities to use data mining techniques. A relatively new trend in the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources.We consider pattern of values with respect to the same entity and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning method to summarize the identified differential rules. For efficiency, a hash-table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related web-sites.
AB - The growing use of Internet in everyday life has been creating new challenges and opportunities to use data mining techniques. A relatively new trend in the Internet is the deep web. As a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources.We consider pattern of values with respect to the same entity and we formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning method to summarize the identified differential rules. For efficiency, a hash-table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related web-sites.
UR - http://www.scopus.com/inward/record.url?scp=79951760466&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951760466&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2010.22
DO - 10.1109/ICDMW.2010.22
M3 - Conference contribution
AN - SCOPUS:79951760466
SN - 9780769542577
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 33
EP - 40
BT - Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
T2 - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Y2 - 14 December 2010 through 17 December 2010
ER -