2,084 research outputs found
Treebank-based acquisition of LFG parsing resources for French
Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in automatically obtained wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen the growth in interest in automatically obtained deep resources that can represent information absent from simple CFG-type structured treebanks
and which are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English has provided the focus of first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based language acquisition to the case of French. We show that with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%
A study of systems implementation languages for the POCCNET system
The results are presented of a study of systems implementation languages for the Payload Operations Control Center Network (POCCNET). Criteria are developed for evaluating the languages, and fifteen existing languages are evaluated on the basis of these criteria
DCU 250 Arabic dependency bank: an LFG gold standard resource for the Arabic Penn treebank
This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is necessary to evaluate against established gold standard resources. Gold standards for various languages have been developed, but to our knowledge, such a resource has not yet been constructed for Arabic. The construction of the DCU 250 marks the first step
towards the creation of an automatic LFG f-structure annotation algorithm for the ATB,
and for the extraction of Arabic grammatical and lexical resources
Amalia -- A Unified Platform for Parsing and Generation
Contemporary linguistic theories (in particular, HPSG) are declarative in
nature: they specify constraints on permissible structures, not how such
structures are to be computed. Grammars designed under such theories are,
therefore, suitable for both parsing and generation. However, practical
implementations of such theories don't usually support bidirectional processing
of grammars. We present a grammar development system that includes a compiler
of grammars (for parsing and generation) to abstract machine instructions, and
an interpreter for the abstract machine language. The generation compiler
inverts input grammars (designed for parsing) to a form more suitable for
generation. The compiled grammars are then executed by the interpreter using
one control strategy, regardless of whether the grammar is the original or the
inverted version. We thus obtain a unified, efficient platform for developing
reversible grammars.Comment: 8 pages postscrip
Automatic acquisition of LFG resources for German - as good as it gets
We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising fromthe data structures determined
by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG
resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar
Automatic treebank-based acquisition of Arabic LFG dependency structures
A number of papers have reported on methods for the automatic acquisition of large-scale, probabilistic LFG-based grammatical resources from treebanks for English (Cahill and al., 2002), (Cahill and al., 2004), German (Cahill and al., 2003), Chinese (Burke, 2004), (Guo and al.,
2007), Spanish (O’Donovan, 2004), (Chrupala and van Genabith, 2006) and French (Schluter and van Genabith, 2008). Here, we extend the LFG grammar acquisition approach to Arabic and the Penn Arabic Treebank (ATB) (Maamouri and
Bies, 2004), adapting and extending the methodology
of (Cahill and al., 2004) originally developed for English. Arabic is challenging because of its morphological richness and syntactic complexity.
Currently 98% of ATB trees (without FRAG and X) produce a covering and connected f-structure.
We conduct a qualitative evaluation of our annotation
against a gold standard and achieve an f-score of 95%
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
A growing interest in tasks involving language understanding by the NLP
community has led to the need for effective semantic parsing and inference.
Modern NLP systems use semantic representations that do not quite fulfill the
nuanced needs for language understanding: adequately modeling language
semantics, enabling general inferences, and being accurately recoverable. This
document describes underspecified logical forms (ULF) for Episodic Logic (EL),
which is an initial form for a semantic representation that balances these
needs. ULFs fully resolve the semantic type structure while leaving issues such
as quantifier scope, word sense, and anaphora unresolved; they provide a
starting point for further resolution into EL, and enable certain structural
inferences without further resolution. This document also presents preliminary
results of creating a hand-annotated corpus of ULFs for the purpose of training
a precise ULF parser, showing a three-person pairwise interannotator agreement
of 0.88 on confident annotations. We hypothesize that a divide-and-conquer
approach to semantic parsing starting with derivation of ULFs will lead to
semantic analyses that do justice to subtle aspects of linguistic meaning, and
will enable construction of more accurate semantic parsers.Comment: Accepted for publication at The 13th International Conference on
Computational Semantics (IWCS 2019
- …