62 research outputs found

    CCG-augmented hierarchical phrase-based statistical machine translation

    Augmenting Statistical Machine Translation (SMT) systems with syntactic information aims at improving translation quality. Hierarchical Phrase-Based (HPB) SMT takes a step toward incorporating syntax in Phrase-Based (PB) SMT by modelling one aspect of language syntax, namely the hierarchical structure of phrases. Syntax Augmented Machine Translation (SAMT) further incorporates syntactic information extracted using context-free phrase structure grammar (CF-PSG) in the HPB SMT model. One of the main challenges facing CF-PSG-based augmentation approaches for SMT systems emerges from the difference between the definition of the constituent in CF-PSG and the ‘phrase’ in SMT systems, which hinders the ability of CF-PSG to express the syntactic function of many SMT phrases. Although the SAMT approach to solving this problem, which uses ‘CCG-like’ operators to combine constituent labels, improves syntactic constraint coverage, it significantly increases the sparsity of the constraints, which restricts translation and negatively affects its quality. In this thesis, we address the problems of sparsity and limited coverage of syntactic constraints facing the CF-PSG-based syntax augmentation approaches for HPB SMT using Combinatory Categorial Grammar (CCG). We demonstrate that CCG’s flexible structures and rich syntactic descriptors help to extract richer, more expressive and less sparse syntactic constraints with better coverage than CF-PSG, which enables our CCG-augmented HPB system to outperform the SAMT system. We also try to soften the syntactic constraints imposed by CCG category nonterminal labels by extracting less fine-grained CCG-based labels. We demonstrate that CCG label simplification helps to significantly improve the performance of our CCG category HPB system. Finally, we identify the factors which limit the coverage of the syntactic constraints in our CCG-augmented HPB model. We then try to tackle these factors by extending the definition of the nonterminal label to be composed of a sequence of CCG categories and augmenting the glue grammar with CCG combinatory rules. We demonstrate that our extension approaches help to significantly increase the scope of the syntactic constraints applied in our CCG-augmented HPB model and achieve significant improvements over the HPB SMT baseline.

    Extraction and Coordination in Phrase Structure Grammar and Categorial Grammar

    A large proportion of computationally oriented theories of grammar operate within the confines of monostratality (i.e. there is only one level of syntactic analysis), compositionality (i.e. the meaning of an expression is determined by the meanings of its syntactic parts, plus their manner of combination), and adjacency (i.e. the only operation on terminal strings is concatenation). This thesis looks at two major approaches falling within these bounds: that based on phrase structure grammar (e.g. Gazdar), and that based on categorial grammar (e.g. Steedman). The theories are examined with reference to extraction and coordination constructions; crucially, a range of 'compound' extraction and coordination phenomena are brought to bear. It is argued that the early phrase structure grammar metarules can characterise operations generating compound phenomena, but in so doing require a categorial-like category system. It is also argued that while categorial grammar contains an adequate category apparatus, Steedman's primitives such as composition do not extend to cover the full range of data. A theory is therefore presented integrating the approaches of Gazdar and Steedman. The central issue as regards processing is derivational equivalence: the grammars under consideration typically generate many semantically equivalent derivations of an expression. This problem is addressed by showing how to axiomatise derivational equivalence, and a parser is presented which employs the axiomatisation to avoid following equivalent paths.
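To make the notion of derivational equivalence concrete, the following is a minimal sketch and not the thesis's own formalisation. It assumes the standard textbook forms of forward application (X/Y Y => X) and forward composition (X/Y Y/Z => X/Z); the class `Slash`, the helper names, and the toy categories for "might" and "eat" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """A functor category: result/arg (seeks arg to the right) or result\\arg (to the left)."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Slash]  # atomic categories are plain strings, e.g. "NP"


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward composition: X/Y  Y/Z  =>  X/Z."""
    if (isinstance(left, Slash) and isinstance(right, Slash)
            and left.direction == "/" and right.direction == "/"
            and left.arg == right.result):
        return Slash(left.result, "/", right.arg)
    return None


# Toy lexicon (illustrative assumption): "might" := (S\NP)/VP, "eat" := VP/NP
S_BACK_NP = Slash("S", "\\", "NP")
might = Slash(S_BACK_NP, "/", "VP")
eat = Slash("VP", "/", "NP")

# Derivation 1: apply "eat" to its object first, then apply "might".
d1 = forward_apply(might, forward_apply(eat, "NP"))
# Derivation 2: compose "might" with "eat" first, then apply to the object.
d2 = forward_apply(forward_compose(might, eat), "NP")

print(d1 == d2)  # both derivations yield the category S\NP
```

The two derivations reach the same category by different routes, which is the kind of redundancy a parser would need to avoid exploring twice.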

    Using pattern-action rules for the generation of GPSG structures from separate semantic representations

    In many tactical NL generators the semantic input structure is taken for granted. In this paper, a new approach to multilingual, tactical generation is presented that keeps the syntax separate from the semantics. This allows the system to be adapted directly to application-dependent representations. In the case at hand, the semantics is specifically designed for sentence-semantic transfer in a machine translation system. The syntax formalism used is Generalized Phrase Structure Grammar (GPSG). The mapping from semantic onto syntactic structures is performed by a set of pattern-action rules. Each rule matches a piece of the input structure and guides the GPSG structure-building process by telling it which syntax rule(s) to apply. The scope of each pattern-action rule is strictly local, the actions are primitive, and rules cannot call each other. These restrictions render the production rule approach both highly modular and transparent.
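The pattern-action idea described above can be sketched roughly as follows. This is an illustrative assumption and not the paper's system: the semantic input is modelled as a nested dictionary, the rule names and the syntax-rule labels ("S -> NP VP", "NP -> Det N") are hypothetical, and the "action" is reduced to naming which syntax rule(s) to apply. Each pattern inspects a single node only, and no rule invokes another, mirroring the locality restrictions stated in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PatternActionRule:
    name: str
    pattern: Callable[[dict], bool]  # strictly local test on one semantic node
    syntax_rules: Tuple[str, ...]    # primitive action: syntax rule(s) to apply


# Hypothetical rule set; the labels stand in for real GPSG rules.
RULES = [
    PatternActionRule("predication", lambda n: n.get("type") == "event", ("S -> NP VP",)),
    PatternActionRule("entity", lambda n: n.get("type") == "entity", ("NP -> Det N",)),
]


def select_syntax_rules(node: dict, rules: List[PatternActionRule] = RULES) -> List[str]:
    """Traverse the semantic structure top-down and collect the syntax rules
    chosen by the first pattern-action rule that matches each node."""
    selected: List[str] = []
    for rule in rules:
        if rule.pattern(node):
            selected.extend(rule.syntax_rules)
            break  # at most one rule fires per node; rules never call each other
    for child in node.get("args", []):
        selected.extend(select_syntax_rules(child, rules))
    return selected


semantics = {"type": "event", "pred": "sleep",
             "args": [{"type": "entity", "pred": "dog"}]}
print(select_syntax_rules(semantics))
```

Because the traversal is kept outside the rules, a rule can be added or removed without touching any other rule, which is one way to read the modularity claim.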

    Théorie syntaxique et théorie du parsage : quelques réflexions


    Generierung natürlicher Sprache

    This report describes the interdisciplinary research field of natural language generation and gives an overview of the current state of the art. It covers approaches from psycholinguistics, planning and decision-making methods from AI-based language processing, and methods based on modern grammar formalisms. For each, the research goals and methods are described.

    Generalisierte Phasenstruktur-Grammatiken und ihre Verwendung zur maschinellen Sprachverarbeitung

    This article examines the syntax theory of Generalized Phrase Structure Grammar (GPSG), gives a new formal definition of the current formalism, and points out the problems connected with it. Moreover, it is explained why the formalism cannot be implemented efficiently. A constructive version of GPSG is proposed that is suitable for machine language processing (parsing and generation). The article may also serve as a basis for courses on GPSG.

    Classification-based phrase structure grammar: an extended revised version of HPSG

    This thesis is concerned with a presentation of Classification-based Phrase Structure Grammar (or cPSG), a grammatical theory that has grown out of extensive revisions of, and extensions to, HPSG. The fundamental difference between this theory and HPSG concerns the central role that classification plays in the grammar: the grammar classifies strings, according to their feature structure descriptions, as being of various types. Apart from the role of classification, the theory bears a close resemblance to HPSG, though it is by no means a direct translation, including numerous revisions and extensions. A central goal in the development of the theory has been its computational implementation, which is included in the thesis. The presentation may be divided into four parts. In the first, chapters 1 and 2, we present the grammatical formalism within which the theory is stated. This consists of a development of the notion of a classificatory system (chapter 1), and the incorporation of hierarchality into that notion (chapter 2). The second part concerns syntactic issues. Chapter 3 revises the HPSG treatment of specifiers, complements and adjuncts, incorporating ideas that specifiers and complements should be distinguished and presenting a treatment of adjuncts whereby the head is selected for by the adjunct. Chapter 4 presents several options for an account of unbounded dependencies. The accounts are based loosely on that of GPSG, and a reconstruction of GPSG's Foot Feature Principle is presented which does not involve a notion of default. Chapter 5 discusses coordination, employing an extension of Rounds-Kasper logic to allow a treatment of cross-categorial coordination. In the third part, chapters 6, 7 and 8, we turn to semantic issues. We begin (Chapter 6) with a discussion of Situation Theory, the background semantic theory, attempting to establish a precise and coherent version of the theory within which to work. Chapter 7 presents the bulk of the treatment of semantics, and can be seen as an extensive revision of the HPSG treatment of semantics. The aim is to provide a semantic treatment which is faithful to the version of Situation Theory presented in Chapter 6. Chapter 8 deals with quantification, discussing the nature of quantification in Situation Theory before presenting a treatment of quantification in cPSG. Some residual questions about the semantics of coordinated noun phrases are also addressed in this chapter. The final part, Chapter 9, concerns the actual computational implementation of the theory. A parsing algorithm based on hierarchical classification is presented, along with four strategies that might be adopted given that algorithm. Also discussed are some implementation details. A concluding chapter summarises the arguments of the thesis and outlines some avenues for future research.