HTML Table Interpretation by Sibling Page Comparison in the Molecular Biology Domain

By Cui Tao and David W. Embley

Abstract

A large and growing amount of biological data resides in different online repositories. Many of these repositories present their data in tables. To understand these online pages automatically, a system that can interpret tables is needed. However, the longstanding problem of automatic table interpretation still eludes us [12]. We offer a solution for the common special case in which so-called sibling pages are available. Sibling pages, which are pages commonly generated by underlying web databases, are compared to identify and connect nonvarying components (category labels) and varying components (data values). We tested our solution on 862 HTML tables. Experimental results show that the system can successfully identify sibling tables, generate structure patterns, interpret different tables using the generated patterns, and automatically adjust the structure patterns as needed.
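
The core idea of sibling comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `classify_cells` and `interpret`, the sample tables, and the assumptions that sibling tables have identical shape and that each value belongs to the nearest label to its left are all hypothetical simplifications. Cells that match across two sibling tables are treated as category labels, and cells that differ are treated as data values.

```python
from typing import Dict, List

Table = List[List[str]]  # a table as rows of cell strings (assumed representation)


def classify_cells(a: Table, b: Table) -> List[List[str]]:
    """Mark each cell 'label' if identical in both sibling tables, else 'value'."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("this sketch assumes sibling tables of identical shape")
    return [
        ["label" if ca == cb else "value" for ca, cb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def interpret(table: Table, pattern: List[List[str]]) -> Dict[str, str]:
    """Pair each value cell with the nearest label to its left in the same row."""
    result: Dict[str, str] = {}
    for row, kinds in zip(table, pattern):
        current_label = None
        for cell, kind in zip(row, kinds):
            if kind == "label":
                current_label = cell
            elif current_label is not None:
                result[current_label] = cell
    return result


# Two made-up sibling pages that share labels but differ in values.
page_a = [["Gene", "unc-22"], ["Chromosome", "IV"]]
page_b = [["Gene", "lin-12"], ["Chromosome", "III"]]

pattern = classify_cells(page_a, page_b)
record = interpret(page_b, pattern)
```

A value that happens to be identical on both pages would be misclassified as a label in this sketch, which is one reason a practical system would compare more than two siblings and adjust its patterns over time.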

Topics: Bioinformatics, table interpretation
Year: 2014
OAI identifier: oai:CiteSeerX.psu:10.1.1.417.8085
Provided by: CiteSeerX