1,558 research outputs found
Partial dependency parsing for Irish
In this paper we present a partial dependency parser for Irish, in which Constraint Grammar (CG) rules are used to annotate dependency relations and grammatical functions in unrestricted Irish text. Chunking is performed using a regular-expression grammar which operates on the dependency tagged sentences. As this is the first implementation of a parser for unrestricted Irish text (to our knowledge), there were no guidelines or precedents available. Therefore deciding what constitutes a syntactic unit, and how it should be annotated, accounts for a major part of the early development effort. Currently, all tokens in a sentence are tagged for grammatical function and local dependency. Long-distance dependencies, prepositional attachments or coordination are not handled, resulting in a partial dependency analysis. Evaluations show that the partial dependency analysis achieves an f-score of 93.60% on development data and 94.28% on unseen test data, while the chunker achieves an f-score of 97.20% on development data and 93.50% on unseen test data
Handling non-compositionality in multilingual CNLs
In this paper, we describe methods for handling multilingual
non-compositional constructions in the framework of GF. We specifically look at
methods to detect and extract non-compositional phrases from parallel texts and
propose methods to handle such constructions in GF grammars. We expect that the
methods to handle non-compositional constructions will enrich CNLs by providing
more flexibility in the design of controlled languages. We look at two specific
use cases of non-compositional constructions: a general-purpose method to
detect and extract multilingual multiword expressions and a procedure to
identify nominal compounds in German. We evaluate our procedure for multiword
expressions by performing a qualitative analysis of the results. For the
experiments on nominal compounds, we incorporate the detected compounds in a
full SMT pipeline and evaluate the impact of our method in machine translation
process.Comment: CNL workshop in COLING 201
Seeing the wood for the trees: data-oriented translation
Data-Oriented Translation (DOT), which is based on Data-Oriented Parsing (DOP), comprises an experience-based approach to translation, where new translations are derived with reference to grammatical analyses of previous translations. Previous DOT experiments [Poutsma, 1998, Poutsma, 2000a, Poutsma, 2000b] were small in scale because important advances in DOP technology were not incorporated
into the translation model. Despite this, related work [Way, 1999, Way, 2003a, Way, 2003b] reports that DOT models are viable in that solutions to âhardâ translation cases are readily available. However, it has not been shown to date that DOT models scale to larger datasets. In this work, we describe a novel DOT system, inspired by recent advances in DOP parsing technology. We test our system on larger, more complex corpora than have been used heretofore, and present both automatic and human evaluations which show that high quality translations can be achieved at reasonable speeds
Evaluating syntax-driven approaches to phrase extraction for MT
In this paper, we examine a number of different phrase segmentation approaches for Machine Translation and how they perform when used to supplement the translation model of a phrase-based SMT system. This work represents a summary of a number of years of research carried out at Dublin City University in which it has been found that improvements can be made using hybrid translation
models. However, the level of improvement achieved is dependent on the amount of training data used. We describe the various approaches to phrase segmentation and combination explored, and outline a series of experiments investigating the relative merits of each method
Automatic Extraction of Subcategorization from Corpora
We describe a novel technique and implemented system for constructing a
subcategorization dictionary from textual corpora. Each dictionary entry
encodes the relative frequency of occurrence of a comprehensive set of
subcategorization classes for English. An initial experiment, on a sample of 14
verbs which exhibit multiple complementation patterns, demonstrates that the
technique achieves accuracy comparable to previous approaches, which are all
limited to a highly restricted set of subcategorization classes. We also
demonstrate that a subcategorization dictionary built with the system improves
the accuracy of a parser by an appreciable amount.Comment: 8 pages; requires aclap.sty. To appear in ANLP-9
Lexicalization and Grammar Development
In this paper we present a fully lexicalized grammar formalism as a
particularly attractive framework for the specification of natural language
grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining
Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We
illustrate the advantages of lexicalized grammars in various contexts of
natural language processing, ranging from wide-coverage grammar development to
parsing and machine translation. We also present a method for compact and
efficient representation of lexicalized trees.Comment: ps file. English w/ German abstract. 10 page
- âŚ