Active learning based frequent itemset mining over the deep web

Tantan Liu, Gagan Agrawal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

In recent years, the deep web has become an extremely popular mode of data dissemination. A key characteristic of deep web data sources is that their data can be accessed only through the limited query interfaces they support. This paper develops a methodology for mining the deep web. Because these data sources cannot be accessed directly, data mining must be performed on samples of the datasets. The samples, in turn, can be obtained only by querying the deep web databases with specific inputs. Unlike existing sampling-based methods, which are typically applied to relational databases or streaming data, the algorithm here is designed with sampling costs, rather than computation or memory costs, as the dominant consideration.

Original language: English (US)
Title of host publication: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Pages: 219-230
Number of pages: 12
DOIs
State: Published - 2011
Externally published: Yes
Event: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 - Hannover, Germany
Duration: Apr 11 2011 → Apr 16 2011

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Conference

Conference: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Country: Germany
City: Hannover
Period: 4/11/11 → 4/16/11

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
