TY - GEN
T1 - Instance discovery and schema matching with applications to biological deep web data integration
AU - Liu, Tantan
AU - Wang, Fan
AU - Agrawal, Gagan
PY - 2010
Y1 - 2010
N2 - We present data mining-based techniques for enabling data integration across deep web data sources. We target query processing across inter-dependent data sources. Thus, besides input-input and output-output matching of attributes, we also need to consider input-output matching. We develop data mining techniques for discovering the instances for querying deep web data sources from the information provided by the query interfaces themselves, as well as from the obtained output pages of the related data sources, by query probing using dynamically identified input instances. Then, using a hierarchical representation of schemas and by applying clustering techniques, we are able to generate schema matches. We show the effectiveness of our technique while integrating 24 query interfaces.
AB - We present data mining-based techniques for enabling data integration across deep web data sources. We target query processing across inter-dependent data sources. Thus, besides input-input and output-output matching of attributes, we also need to consider input-output matching. We develop data mining techniques for discovering the instances for querying deep web data sources from the information provided by the query interfaces themselves, as well as from the obtained output pages of the related data sources, by query probing using dynamically identified input instances. Then, using a hierarchical representation of schemas and by applying clustering techniques, we are able to generate schema matches. We show the effectiveness of our technique while integrating 24 query interfaces.
UR - http://www.scopus.com/inward/record.url?scp=77956141523&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956141523&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15120-0_12
DO - 10.1007/978-3-642-15120-0_12
M3 - Conference contribution
SN - 9780769540832
VL - 6254
T3 - 10th IEEE International Conference on Bioinformatics and Bioengineering 2010, BIBE 2010
SP - 304
EP - 305
BT - 10th IEEE International Conference on Bioinformatics and Bioengineering 2010, BIBE 2010
T2 - 10th IEEE International Conference on Bioinformatics and Bioengineering, BIBE-2010
Y2 - 31 May 2010 through 3 June 2010
ER -