54 research outputs found
Using supertags as source language context in SMT
Recent research has shown that Phrase-Based Statistical Machine Translation (PB-SMT) systems can benefit from two
enhancements: (i) using words and POS tags as context-informed features on the source side; and (ii) incorporating lexical syntactic descriptions in the form of supertags on the target side. In this work we
present a novel PB-SMT model that combines these two aspects by using supertags as source language context-informed features. These features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. In our experiments two
kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and Combinatory Categorial Grammar.
We use a memory-based classification framework that enables the estimation of these features while avoiding
problems of sparseness. Despite the differences between these two approaches, the supertaggers give similar improvements. We evaluate the performance of our approach on an English-to-Chinese translation task using a state-of-the-art phrase-based SMT system, and report an
improvement of 7.88% BLEU score in translation quality when adding supertags as context-informed features.
CCG contextual labels in hierarchical phrase-based SMT
In this paper, we present a method to employ target-side syntactic contextual information in a Hierarchical Phrase-Based system. Our method uses Combinatory Categorial Grammar (CCG) to annotate training data with labels that represent the left and right syntactic context of target-side phrases. These labels are then used to assign labels to nonterminals in hierarchical rules. CCG-based contextual labels help
to produce more grammatical translations by forcing phrases which replace nonterminals during translations to comply with the contextual constraints imposed by the labels. We present experiments which examine the performance of CCG contextual labels on Chinese–English and Arabic–English translation in the news and speech expressions domains using different data sizes and CCG-labeling settings. Our experiments show that our CCG contextual labels-based system achieved a 2.42% relative BLEU improvement over a Phrase-Based baseline on Arabic–English translation and a 1% relative BLEU improvement over a Hierarchical Phrase-Based system baseline on Chinese–English translation.
Integrating source-language context into log-linear models of statistical machine translation
The translation features typically used in state-of-the-art statistical machine translation (SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear phrase-based SMT (PB-SMT) and hierarchical PB-SMT (HPB-SMT) can positively
influence the weighting and selection of target phrases, and thus improve translation quality. In this thesis we present novel approaches to incorporate source-language contextual modelling into state-of-the-art SMT models in order to enhance the quality of lexical selection. We investigate the effectiveness of a range of contextual features, including lexical features of neighbouring words, part-of-speech tags, supertags, sentence-similarity features, dependency information, and semantic roles. We explored a series of language pairs featuring typologically different languages, and examined the scalability of our research to larger amounts of training data.
While our results are mixed across feature selections, language pairs, and learning curves, we observe that including contextual features of the source sentence
in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in
combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, supertag features in English-to-Chinese translation, or the combination of supertag and lexical features in English-to-Dutch subtitle
translation. Furthermore, we investigate the applicability of our lexical contextual model in another closely related NLP problem, namely machine transliteration.
Syntactic and semantic features for statistical and neural machine translation
Machine Translation (MT) for language pairs with long distance dependencies and
word reordering, such as German–English, is prone to producing output that is lexically
or syntactically incoherent. Statistical MT (SMT) models used explicit or latent
syntax to improve reordering, but failed at capturing other long distance dependencies.
This thesis explores how explicit sentence-level syntactic information can improve
translation for such complex linguistic phenomena. In particular, we work at the
level of the syntactic-semantic interface with representations conveying the predicate-argument
structures. These are essential to preserving semantics in translation and
SMT systems have long struggled to model them.
String-to-tree SMT systems use explicit target syntax to handle long-distance reordering,
but make strong independence assumptions which lead to inconsistent lexical
choices. To address this, we propose a Selectional Preferences feature which models
the semantic affinities between target predicates and their argument fillers using the
target dependency relations available in the decoder. We found that our feature is not
effective in a string-to-tree system for German–English and that often the conditioning
context is wrong because of mistranslated verbs.
To improve verb translation, we proposed a Neural Verb Lexicon Model (NVLM)
incorporating sentence-level syntactic context from the source which carries relevant
semantic information for verb disambiguation. When used as an extra feature for re-ranking
the output of a German–English string-to-tree system, the NVLM improved
verb translation precision by up to 2.7% and recall by up to 7.4%.
While the NVLM improved some aspects of translation, other syntactic and lexical
inconsistencies are not being addressed by a linear combination of independent models.
In contrast to SMT, neural machine translation (NMT) avoids strong independence
assumptions thus generating more fluent translations and capturing some long-distance
dependencies. Still, incorporating additional linguistic information can improve translation
quality.
We proposed a method for tightly coupling target words and syntax in the NMT
decoder. To represent syntax explicitly, we used CCG supertags, which encode subcategorization
information, capturing long distance dependencies and attachments. Our
method improved translation quality on several difficult linguistic constructs, including
prepositional phrases which are the most frequent type of predicate arguments. These
improvements over a strong baseline NMT system were consistent across two language
pairs: 0.9 BLEU for German–English and 1.2 BLEU for Romanian–English.
Structural generalization in COGS: Supertagging is (almost) all you need
In many Natural Language Processing applications, neural networks have been
found to fail to generalize on out-of-distribution examples. In particular,
several recent semantic parsing datasets have put forward important limitations
of neural networks in cases where compositional generalization is required. In
this work, we extend a neural graph-based semantic parsing framework in several
ways to alleviate this issue. Notably, we propose: (1) the introduction of a
supertagging step with valency constraints, expressed as an integer linear
program; (2) a reduction of the graph prediction problem to the maximum
matching problem; (3) the design of an incremental early-stopping training
strategy to prevent overfitting. Experimentally, our approach significantly
improves results on examples that require structural generalization in the COGS
dataset, a known challenging benchmark for compositional generalization.
Overall, our results confirm that structural constraints are important for
generalization in semantic parsing. Comment: accepted at EMNLP 202
Complexity of Lexical Descriptions and its Relevance to Partial Parsing
In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that the lightweight dependency analysis on the output of the supertagger identifies 83% of the dependency links accurately. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a Finite-State Transduction. We have implemented such a system for the ATIS domain which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags.
In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
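As a reading aid for the trigram-based supertagger mentioned in the abstract above: a trigram tagger is commonly formulated as choosing the supertag sequence that maximizes the product of emission and contextual probabilities. The abstract does not spell out the exact model or its smoothing, so the formula below is the textbook formulation, given here as an assumption; $w_i$ denotes the $i$-th word and $t_i$ its supertag.

```latex
\hat{t}_{1:n}
  = \arg\max_{t_{1:n}} \prod_{i=1}^{n}
    P(w_i \mid t_i)\, P(t_i \mid t_{i-2},\, t_{i-1})
```

The contextual term $P(t_i \mid t_{i-2}, t_{i-1})$ is where supertag co-occurrence statistics collected from parsed corpora would enter.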
Inducing grammars from linguistic universals and realistic amounts of supervision
The best performing NLP models to date are learned from large volumes of manually-annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input, even that which can be collected in just a few hours, can provide enormous advantages if we have learning algorithms that can appropriately exploit it. This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation. Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence.
This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.
Computer Science
Integrated supertagging and parsing
EuroMatrixPlus project funded by the European Commission, 7th Framework Programme.
Parsing is the task of assigning syntactic or semantic structure to a natural language
sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar
(CCG; Steedman 2000). CCG allows incremental processing, which is essential
for speech recognition and some machine translation models, and it can build semantic
structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing
task by assigning lexical types to words in a sentence using a sequence model. It has
emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran,
2007) by reducing the parser's search space. This has been very successful and it is the
central theme of this thesis.
We begin with an analysis of how efficiency is being traded for accuracy in supertagging.
Pruning the search space by supertagging is inherently approximate and to contrast
this we include A* in our analysis, a classic exact search technique. Interestingly,
we find that combining the two methods improves efficiency but we also demonstrate
that excessive pruning by a supertagger significantly lowers the upper bound on accuracy
of a CCG parser.
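To make the efficiency/accuracy trade-off in the paragraph above concrete, here is a minimal Python sketch of supertagger pruning. It assumes a relative probability threshold (`beta`, a hypothetical parameter name): a category is kept only if its probability is at least `beta` times that of the most probable category for the word. The abstract does not say which pruning rule the thesis uses, and the probabilities below are invented for illustration.

```python
from typing import Dict, List


def prune_supertags(dist: Dict[str, float], beta: float) -> List[str]:
    """Keep categories whose probability is at least beta times the best one.

    Assumed pruning rule for illustration only; not taken from the thesis.
    """
    if not dist:
        return []
    best = max(dist.values())
    kept = [cat for cat, p in dist.items() if p >= beta * best]
    # Most probable category first.
    return sorted(kept, key=lambda cat: -dist[cat])


# Hypothetical supertagger output for one word (made-up probabilities).
saw = {"(S\\NP)/NP": 0.70, "N": 0.25, "(S\\NP)/PP": 0.04, "N/N": 0.01}

for beta in (0.5, 0.1, 0.01):
    print(beta, prune_supertags(saw, beta))
```

A larger `beta` leaves fewer categories for the parser to consider, which is faster, but if the correct category is among those removed the parser can no longer recover it. That is the sense in which heavy pruning lowers the upper bound on parser accuracy.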
Inspired by this analysis, we design a single integrated model with both supertagging
and parsing features, rather than separating them into distinct models chained
together in a pipeline. To overcome the resulting complexity, we experiment with both
loopy belief propagation and dual decomposition approaches to inference, the first empirical
comparison of these algorithms that we are aware of on a structured natural
language processing problem.
Finally, we address training the integrated model. We adopt the idea of optimising
directly for a task-specific metric such as is common in other areas like statistical
machine translation. We demonstrate how a novel dynamic programming algorithm
enables us to optimise for F-measure, our task-specific evaluation metric, and experiment
with approximations, which prove to be excellent substitutions.
Each of the presented methods improves over the state-of-the-art in CCG parsing.
Moreover, the improvements are additive, achieving a labelled/unlabelled dependency
F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and
87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task
to date. Our techniques are general and we expect them to apply to other parsing problems,
including lexicalised tree adjoining grammar and context-free grammar parsing.
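For readers unfamiliar with the metric reported above, dependency F-measure is usually the harmonic mean of precision and recall over predicted dependencies. The balanced form below is the standard definition and is assumed here; the abstract does not state the weighting. "Labelled" scoring counts a dependency as correct only if its label also matches, while "unlabelled" scoring ignores the label.

```latex
P = \frac{\#\,\text{correct dependencies}}{\#\,\text{predicted dependencies}},
\qquad
R = \frac{\#\,\text{correct dependencies}}{\#\,\text{gold dependencies}},
\qquad
F = \frac{2PR}{P + R}
```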