Interpreting tables in text using probabilistic two-dimensional context-free grammars

Abstract

Table interpretation is the process of extracting meaningful information from textual tables. More precisely, interpreting tables entails producing a semantic analysis in the form of a logical representation that is suitable for the kinds of inferences one would need to perform upon the information contained in the table. To a large extent, the quality of a table interpretation model depends on how accurately the model disambiguates the types of the table and its subparts. This in turn depends on three major factors that we address in this thesis: (1) the expressiveness and suitability of the logical representation for semantic interpretation, (2) the adequacy of the inventory of table types and subparts, and (3) the power of the disambiguation algorithms. We present a new, elegant, and extensible table analysis model that is capable of interpreting an unusually wide range of textual tables in documents. Unlike the few existing table analysis models, which largely rely on relatively ad hoc heuristics that produce semantically vague interpretations of tables, our linguistically oriented approach is systematic and grammar based, which allows our analysis model to be concise and yet recognize a wider range of table types than others. Specifically, our table analysis model introduces the use of Viterbi parsing under probabilistic two-dimensional CFGs. This cleaner grammatical approach facilitates not only greater coverage, but also grammar extension and maintenance, as well as a more direct and declarative link to semantic interpretation, for which we also introduce a cleaner underlying semantic model that exploits database theory to address the semantic vagueness problem of existing table analysis models. In semantic interpretation experiments, our model classified unseen web tables from different domains into a more comprehensive set of semantic types than previous work provides, with 65% precision and 77% recall.
