TY - GEN
T1 - Graph and topological structure mining on scientific articles
AU - Wang, Fan
AU - Jin, Ruoming
AU - Agrawal, Gagan
AU - Piontkivska, Helen
PY - 2007
Y1 - 2007
N2 - In this paper, we investigate a new approach for literature mining. We use frequent subgraph mining, and its generalization, topological structure mining, for finding interesting relationships between gene names and other key biological terms from the text of scientific articles. We show how we can find keywords of interest and represent them as nodes of the graphs. We also propose several methods for inserting edges between these nodes. Our study initially focused on comparing: 1) different methods for constructing edges, and 2) patterns found from subgraph mining and topological structure mining. Subsequently, we analyzed several frequent topological minors reported by our experiments, and explained their scientific significance. Overall, our study shows the following. First, a simple method of constructing edges, which is based on sliding windows, seems to provide the best results. Second, we are able to find a much larger number of well-known and meaningful topological patterns with high support values, as compared to subgraphs. Overall, the frequent topological minors our algorithm found correspond well to known relationships between genes and biological terms. Thus, we believe that topological structure mining can be a very valuable tool for researchers who are not deeply familiar with the existing literature and want to obtain a quick summary of known relationships among key scientific names or terms.
AB - In this paper, we investigate a new approach for literature mining. We use frequent subgraph mining, and its generalization, topological structure mining, for finding interesting relationships between gene names and other key biological terms from the text of scientific articles. We show how we can find keywords of interest and represent them as nodes of the graphs. We also propose several methods for inserting edges between these nodes. Our study initially focused on comparing: 1) different methods for constructing edges, and 2) patterns found from subgraph mining and topological structure mining. Subsequently, we analyzed several frequent topological minors reported by our experiments, and explained their scientific significance. Overall, our study shows the following. First, a simple method of constructing edges, which is based on sliding windows, seems to provide the best results. Second, we are able to find a much larger number of well-known and meaningful topological patterns with high support values, as compared to subgraphs. Overall, the frequent topological minors our algorithm found correspond well to known relationships between genes and biological terms. Thus, we believe that topological structure mining can be a very valuable tool for researchers who are not deeply familiar with the existing literature and want to obtain a quick summary of known relationships among key scientific names or terms.
UR - http://www.scopus.com/inward/record.url?scp=47649127668&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47649127668&partnerID=8YFLogxK
U2 - 10.1109/BIBE.2007.4375739
DO - 10.1109/BIBE.2007.4375739
M3 - Conference contribution
AN - SCOPUS:47649127668
SN - 1424415098
SN - 9781424415090
T3 - Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
SP - 1318
EP - 1322
BT - Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
T2 - 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
Y2 - 14 January 2007 through 17 January 2007
ER -