62 research outputs found
CCG-augmented hierarchical phrase-based statistical machine translation
Augmenting Statistical Machine Translation (SMT) systems with syntactic information aims at improving translation quality. Hierarchical Phrase-Based (HPB) SMT takes a step toward incorporating syntax in Phrase-Based (PB) SMT by modelling one aspect of language syntax, namely the hierarchical structure of phrases. Syntax Augmented Machine Translation (SAMT) further incorporates syntactic information extracted using context free phrase structure grammar (CF-PSG) in the HPB SMT model. One of the main challenges facing CF-PSG-based augmentation approaches for SMT systems emerges from the difference in the definition of the constituent in CF-PSG and the ‘phrase’ in SMT systems, which hinders the ability of CF-PSG to express the syntactic function of many SMT phrases. Although the SAMT approach to solving this problem using ‘CCG-like’ operators to combine constituent labels improves syntactic constraint coverage, it significantly increases their sparsity, which restricts translation and negatively affects its quality.
In this thesis, we address the problems of sparsity and limited coverage of syntactic constraints facing CF-PSG-based syntax augmentation approaches for HPB SMT using Combinatory Categorial Grammar (CCG). We demonstrate that CCG’s flexible structures and rich syntactic descriptors help to extract richer, more expressive and less sparse syntactic constraints with better coverage than CF-PSG, which enables our CCG-augmented HPB system to outperform the SAMT system. We also try to soften the syntactic constraints imposed by CCG category nonterminal labels by extracting less fine-grained CCG-based labels. We demonstrate that CCG label simplification helps to significantly improve the performance of our CCG category HPB system. Finally, we identify the factors which limit the coverage of the syntactic constraints in our CCG-augmented HPB model. We then try to tackle these factors by extending the definition of the nonterminal label to be composed of a sequence of CCG categories and augmenting the glue grammar with CCG combinatory rules. We demonstrate that our extension approaches help to significantly increase the scope of the syntactic constraints applied in our CCG-augmented HPB model and achieve significant improvements over the HPB SMT baseline.
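To make the idea of "less fine-grained" CCG labels concrete, here is a minimal sketch that strips bracketed feature annotations from CCG category strings, so that several fine-grained labels collapse into one coarser label. The helper name `strip_features` and the choice of feature stripping as the simplification are assumptions made for illustration; the abstract does not say how the thesis simplifies its labels.

```python
import re

def strip_features(category: str) -> str:
    # Remove bracketed feature annotations such as [dcl] or [pss].
    # Illustrative simplification only, not the thesis's actual scheme.
    return re.sub(r"\[[^\]]*\]", "", category)

# Hypothetical fine-grained nonterminal labels.
labels = ["S[dcl]\\NP", "(S[b]\\NP)/NP", "S[pss]\\NP", "NP[nb]/N"]

# Distinct labels that remain after simplification.
simplified = sorted({strip_features(label) for label in labels})

print(len(set(labels)), "labels before,", len(simplified), "after")
```

In this toy label set, two labels that differ only in the feature on S share a single label after stripping, which is the kind of sparsity reduction the abstract refers to.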
Extraction and Coordination in Phrase Structure Grammar and Categorial Grammar
A large proportion of computationally-oriented theories of grammar operate within the confines
of monostratality (i.e. there is only one level of syntactic analysis),
compositionality (i.e. the meaning of an expression is determined by the meanings of its
syntactic parts, plus their manner of combination), and adjacency (i.e. the only operation on
terminal strings is concatenation). This thesis looks at two major approaches falling within
these bounds: that based on phrase structure grammar (e.g. Gazdar), and that based on
categorial grammar (e.g. Steedman).
The theories are examined with reference to extraction and coordination constructions;
crucially a range of 'compound' extraction and coordination phenomena are brought to
bear. It is argued that the early phrase structure grammar metarules can characterise operations
generating compound phenomena, but in so doing require a categorial-like category
system. It is also argued that while categorial grammar contains an adequate category
apparatus, Steedman's primitives such as composition do not extend to cover the full range
of data. A theory is therefore presented integrating the approaches of Gazdar and Steedman.
The central issue as regards processing is derivational equivalence: the grammars under
consideration typically generate many semantically equivalent derivations of an expression.
This problem is addressed by showing how to axiomatise derivational equivalence, and a
parser is presented which employs the axiomatisation to avoid following equivalent paths.
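The derivational equivalence problem can be illustrated with a small sketch. It counts derivations of a category sequence with and without forward composition. The category encoding and the function names are assumptions for illustration; the code shows the problem only and is not the axiomatisation or the parser described in the thesis.

```python
from collections import Counter
from functools import lru_cache

# A category is an atom (str) or a triple (result, slash, argument).

def combine(left, right, use_composition):
    """Return the categories derivable from two adjacent categories."""
    out = []
    if isinstance(left, tuple) and left[1] == "/":
        # Forward application: X/Y  Y  =>  X
        if left[2] == right:
            out.append(left[0])
        # Forward composition: X/Y  Y/Z  =>  X/Z
        if (use_composition and isinstance(right, tuple)
                and right[1] == "/" and left[2] == right[0]):
            out.append((left[0], "/", right[2]))
    if isinstance(right, tuple) and right[1] == "\\":
        # Backward application: Y  X\Y  =>  X
        if right[2] == left:
            out.append(right[0])
    return out

def count_derivations(cats, use_composition):
    """Count derivations per resulting category for the whole sequence."""
    cats = tuple(cats)

    @lru_cache(maxsize=None)
    def span(i, j):
        if j - i == 1:
            return Counter({cats[i]: 1})
        total = Counter()
        for k in range(i + 1, j):  # every split point of the span
            for lcat, ln in span(i, k).items():
                for rcat, rn in span(k, j).items():
                    for res in combine(lcat, rcat, use_composition):
                        total[res] += ln * rn
        return total

    return span(0, len(cats))

# The sequence A/B  B/C  C
seq = [("A", "/", "B"), ("B", "/", "C"), "C"]
application_only = count_derivations(seq, use_composition=False)["A"]
with_composition = count_derivations(seq, use_composition=True)["A"]
```

With application alone there is one way to derive A. Adding composition gives a second derivation of the same category, and a parser that does not recognise the two as equivalent explores both.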
Using pattern-action rules for the generation of GPSG structures from separate semantic representations
In many tactical NL generators the semantic input structure is taken for granted. In this paper, a new approach to multilingual, tactical generation is presented that keeps the syntax separate from the semantics. This allows the system to be adapted directly to application-dependent representations. In the case at hand, the semantics is specifically designed for sentence-semantic transfer in a machine translation system. The syntax formalism used is Generalized Phrase Structure Grammar (GPSG). The mapping from semantic onto syntactic structures is performed by a set of pattern-action rules. Each rule matches a piece of the input structure and guides the GPSG structure-building process by telling it which syntax rule(s) to apply. The scope of each pattern-action rule is strictly local, the actions are primitive, and rules cannot call each other. These restrictions render the production rule approach both highly modular and transparent.
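The pattern-action mapping can be sketched roughly as follows. Each rule pairs a local pattern over one node of the semantic input with the names of the syntax rules to apply, and rules never invoke one another. The input structure, the rule names, and the "first matching rule wins" policy are all assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical semantic input: nested dicts with a "type" key.
sem = {
    "type": "event",
    "pred": "sleep",
    "agent": {"type": "entity", "pred": "dog", "definite": True},
}

# Each rule is (pattern, syntax rules to apply). The pattern inspects one
# node only, and the action is just a list of (invented) syntax rule names.
RULES = [
    (lambda n: n.get("type") == "event" and "agent" in n, ["S -> NP VP"]),
    (lambda n: n.get("type") == "entity" and n.get("definite"), ["NP -> Det N"]),
    (lambda n: n.get("type") == "entity" and not n.get("definite"), ["NP -> N"]),
]

def generate_plan(node):
    """Walk the input; at each node apply the first rule whose pattern matches."""
    plan = []
    for pattern, syntax_rules in RULES:
        if pattern(node):
            plan.extend(syntax_rules)
            break  # assumed policy: only the first matching rule fires
    # The driver, not the rules, moves on to embedded structures.
    for value in node.values():
        if isinstance(value, dict):
            plan.extend(generate_plan(value))
    return plan

plan = generate_plan(sem)
```

Because every rule sees only one node and carries no control flow of its own, a rule can be added or removed without touching the others, which is the modularity the abstract claims for the approach.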
Generierung natürlicher Sprache
This report describes the interdisciplinary research field of natural language generation and gives an overview of the current state of the art. It covers psycholinguistic approaches, planning and decision-making methods from AI-based language processing, and generators based on modern grammar formalisms. For each, the research goals and methods are described.
Generalisierte Phrasenstruktur-Grammatiken und ihre Verwendung zur maschinellen Sprachverarbeitung
This article examines the syntax theory of Generalized Phrase Structure Grammar (GPSG), gives a new formal definition of the current formalism, and points out the problems connected with it. It also explains why the formalism cannot be implemented efficiently. A constructive version of GPSG is proposed that is suitable for machine language processing (parsing and generation). The article may also serve as a basis for courses on GPSG.
Classification-based phrase structure grammar: an extended revised version of HPSG
This thesis is concerned with a presentation of Classification-based Phrase Structure
Grammar (or CPSG), a grammatical theory that has grown out of extensive revisions
of, and extensions to, HPSG. The fundamental difference between this theory and HPSG
concerns the central role that classification plays in the grammar: the grammar classifies
strings, according to their feature structure descriptions, as being of various types.
Apart from the role of classification, the theory bears a close resemblance to HPSG,
though it is by no means a direct translation, including numerous revisions and extensions.
A central goal in the development of the theory has been its computational
implementation, which is included in the thesis.
The presentation may be divided into four parts. In the first, chapters 1 and 2, we
present the grammatical formalism within which the theory is stated. This consists of a
development of the notion of a classificatory system (chapter 1), and the incorporation
of hierarchality into that notion (chapter 2).
The second part concerns syntactic issues. Chapter 3 revises the HPSG treatment of
specifiers, complements and adjuncts, incorporating ideas that specifiers and complements
should be distinguished and presenting a treatment of adjuncts whereby the
head is selected for by the adjunct. Chapter 4 presents several options for an account of
unbounded dependencies. The accounts are based loosely on that of GPSG, and a reconstruction
of GPSG's Foot Feature Principle is presented which does not involve a notion
of default. Chapter 5 discusses coordination, employing an extension of Rounds-Kasper
logic to allow a treatment of cross-categorial coordination.
In the third part, chapters 6, 7 and 8, we turn to semantic issues. We begin (Chapter 6)
with a discussion of Situation Theory, the background semantic theory, attempting to
establish a precise and coherent version of the theory within which to work. Chapter 7
presents the bulk of the treatment of semantics, and can be seen as an extensive revision
of the HPSG treatment of semantics. The aim is to provide a semantic treatment which
is faithful to the version of Situation Theory presented in Chapter 6. Chapter 8 deals
with quantification, discussing the nature of quantification in Situation Theory before
presenting a treatment of quantification in CPSG. Some residual questions about the
semantics of coordinated noun phrases are also addressed in this chapter.
The final part, Chapter 9, concerns the actual computational implementation of the
theory. A parsing algorithm based on hierarchical classification is presented, along with
four strategies that might be adopted given that algorithm. Also discussed are some
implementation details. A concluding chapter summarises the arguments of the thesis
and outlines some avenues for future research.
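As a loose illustration of classifying an item by its feature structure description within a type hierarchy, the sketch below checks subsumption between nested feature structures. It only visits the subtypes of a type once that type's description has matched. The type names, features, and traversal strategy are hypothetical and are not drawn from the thesis, whose formalism and parsing algorithm are not detailed in the abstract.

```python
def subsumes(description, fs):
    """True if every constraint in `description` is satisfied by `fs`."""
    for feature, value in description.items():
        if feature not in fs:
            return False
        if isinstance(value, dict):
            # Nested description: the value in fs must be a matching structure.
            if not isinstance(fs[feature], dict) or not subsumes(value, fs[feature]):
                return False
        elif fs[feature] != value:
            return False
    return True

# Hypothetical hierarchy: type name -> (parent type, description).
TYPES = {
    "sign": (None, {}),
    "phrase": ("sign", {"lex": False}),
    "noun_phrase": ("phrase", {"head": {"cat": "n"}}),
    "verb_phrase": ("phrase", {"head": {"cat": "v"}}),
}

def classify(fs, parent=None):
    """Descend the hierarchy, visiting a type's children only if it matches."""
    found = []
    for name, (par, description) in TYPES.items():
        if par == parent and subsumes(description, fs):
            found.append(name)
            found.extend(classify(fs, name))
    return found

# A feature structure for some string, with an extra feature ("num").
item = {"lex": False, "head": {"cat": "n", "num": "sg"}}
types_of_item = classify(item)
```

Pruning at each level means a failed match on a general type rules out all of its subtypes without testing them, which is one reason a hierarchical arrangement of types can be attractive for processing.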
- …