A Data-Oriented Approach to Semantic Interpretation
In Data-Oriented Parsing (DOP), an annotated language corpus is used as a
stochastic grammar. The most probable analysis of a new input sentence is
constructed by combining sub-analyses from the corpus in the most probable way.
This approach has been successfully used for syntactic analysis, using corpora
with syntactic annotations such as the Penn Treebank. If a corpus with
semantically annotated sentences is used, the same approach can also generate
the most probable semantic interpretation of an input sentence. The present
paper explains this semantic interpretation method, and summarizes the results
of a preliminary experiment. Semantic annotations were added to the syntactic
annotations of most of the sentences of the ATIS corpus. A data-oriented
semantic interpretation algorithm was successfully tested on this semantically
enriched corpus.
Comment: 10 pages, Postscript; to appear in Proceedings Workshop on
Corpus-Oriented Semantic Analysis, ECAI-96, Budapest
LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible
The Linguistic Annotation Framework (LAF) provides a general, extensible
stand-off markup system for corpora. This paper discusses LAF-Fabric, a new
tool to analyse LAF resources in general with an extension to process the
Hebrew Bible in particular. We first walk through the history of the Hebrew
Bible as a text database, decade by decade. Then we describe how LAF-Fabric
may serve as an analysis tool for this corpus. Finally, we describe three
analytic projects/workflows that benefit from the new LAF representation:
1) the study of linguistic variation: extract cooccurrence data of common
nouns between the books of the Bible (Martijn Naaijer); 2) the study of the
grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman);
3) construction of a parser of classical Hebrew by Data-Oriented Parsing:
generate tree structures from the database (Andreas van Cranenburgh).
Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System
The NWO Priority Programme Language and Speech Technology is a 5-year
research programme aiming at the development of spoken language information
systems. In the Programme, two alternative natural language processing (NLP)
modules are developed in parallel: a grammar-based (conventional, rule-based)
module and a data-oriented (memory-based, stochastic, DOP) module. In order to
compare the NLP modules, a formal evaluation has been carried out three years
after the start of the Programme. This paper describes the evaluation procedure
and the evaluation results. The grammar-based component performs much better
than the data-oriented one in this comparison.
Comment: Proceedings of CLIN 9
Data-oriented parsing and the Penn Chinese treebank
We present an investigation into parsing the Penn Chinese Treebank using a Data-Oriented Parsing (DOP) approach. DOP comprises an experience-based approach to natural language parsing. Most published research in the DOP framework uses PS-trees as its representation schema. Drawbacks of the DOP approach centre around issues of efficiency. We incorporate recent advances in DOP parsing techniques into a novel DOP parser which generates a compact representation of all subtrees which can be derived from any full parse tree.
We compare our work to previous work on parsing the Penn Chinese Treebank, and provide both a quantitative and qualitative evaluation. While our results in terms of Precision and Recall are slightly below those published in related research, our approach requires no manual encoding of head rules, nor is a development phase per se necessary.
We also note that certain constructions which were problematic in this previous work can be handled correctly by our DOP parser. Finally, we observe that the “DOP Hypothesis” is confirmed for parsing the Penn Chinese Treebank.