Strategies for Contiguous Multiword Expression Analysis and Dependency Parsing
In this paper, we investigate various strategies for jointly predicting syntactic dependencies and recognizing contiguous multiword expressions (MWEs), testing them on the dependency version of the French Treebank \cite{abeille:04}, as instantiated in the SPMRL Shared Task \cite{spmrl:st:2013}. Our work focuses on an alternative representation of syntactically regular MWEs that captures their internal syntactic structure. We obtain a system whose performance is comparable to that of previous work on this dataset, but which predicts both syntactic dependencies and the internal structure of MWEs. This can be useful for capturing the various degrees of semantic compositionality of MWEs.
CCG Parsing and Multiword Expressions
This thesis presents a study of integrating information about
Multiword Expressions (MWEs) into parsing with Combinatory Categorial Grammar
(CCG). We build on previous work showing that adding information about MWEs
benefits syntactic parsing, and implement a similar pipeline
with CCG parsing. More specifically, we collapse MWEs into a single token in the training
and test data of CCGbank, a corpus of sentences annotated with CCG
derivations. One limitation of our approach is that our collapsing algorithm can only
handle MWEs that form a constituent in the data.
We study the effect of collapsing training and test data. A parsing effect
can be obtained if collapsed data help the parser in its decisions and a
training effect can be obtained if training on the collapsed data improves
results. We also collapse the gold standard and show that our model
significantly outperforms the baseline model on our gold standard, which
indicates that there is a training effect. We show that the baseline model
performs significantly better on our gold standard when the data are collapsed
before parsing than when they are collapsed after parsing, which indicates
that there is a parsing effect. We show that these results can lead to improved
performance on the standard (non-collapsed) benchmark, although we could not show
that the improvement is significant. We conclude that despite the limited settings,
there are noticeable improvements from using MWEs in parsing. We discuss ways
in which the incorporation of MWEs into parsing can be improved and hypothesize
that this will lead to more substantial results.
Finally, we show that treating the MWE recognition component of the pipeline as an
experimental variable is worthwhile, since different recognizers yield different results.
Comment: MSc thesis, The University of Edinburgh, 2014, School of Informatics,
MSc Artificial Intelligence
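The collapsing step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the helper `collapse_mwes`, the half-open `(start, end)` span format, the set of constituent spans passed in as plain Python data (rather than read from CCGbank derivations) and the underscore-joining convention are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical span format: half-open token interval [start, end).
Span = Tuple[int, int]


def collapse_mwes(tokens: List[str], mwe_spans: List[Span],
                  constituent_spans: Set[Span]) -> List[str]:
    """Merge each MWE that is also a constituent into one token.

    MWE spans that do not match a constituent are left uncollapsed,
    and a span overlapping an already collapsed span is skipped.
    """
    keep = sorted(s for s in mwe_spans if s in constituent_spans)
    out: List[str] = []
    pos = 0  # index of the first token not yet copied to the output
    for start, end in keep:
        if start < pos:
            continue  # overlaps a span that was already collapsed
        out.extend(tokens[pos:start])
        out.append("_".join(tokens[start:end]))
        pos = end
    out.extend(tokens[pos:])
    return out


if __name__ == "__main__":
    sent = ["He", "kicked", "the", "bucket", "in", "New", "York"]
    mwes = [(1, 4), (5, 7)]
    # Only "New York" is marked as a constituent in this toy example.
    print(collapse_mwes(sent, mwes, {(5, 7)}))
    # ['He', 'kicked', 'the', 'bucket', 'in', 'New_York']
```

In the toy run, "kicked the bucket" stays as three tokens because its span is not in the constituent set, which mirrors the constituency limitation the abstract mentions.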
Multiword expression processing: A survey
Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on a clear distinction between words and phrases has to be rethought to accommodate MWEs. MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by "MWE processing," distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to the underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and each of the two use cases, we conclude with open issues and research perspectives.
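To make the distinction between the two subtasks concrete, here is a minimal sketch under stated assumptions. Discovery builds an MWE lexicon from a corpus, and identification marks occurrences of lexicon entries in running text. The choice of bigram pointwise mutual information (PMI) for discovery and greedy lookup for identification is ours, not the survey's, and the functions `discover_bigrams` and `identify`, their thresholds and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple


def discover_bigrams(corpus: List[List[str]], min_count: int = 2,
                     min_pmi: float = 1.0) -> Set[Tuple[str, str]]:
    """Discovery (sketch): keep frequent bigrams whose PMI passes a threshold."""
    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    lexicon: Set[Tuple[str, str]] = set()
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        if math.log2(p_xy / (p_x * p_y)) >= min_pmi:
            lexicon.add((w1, w2))
    return lexicon


def identify(sentence: List[str],
             lexicon: Set[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Identification (sketch): greedy left-to-right lookup of lexicon bigrams.

    Returns half-open (start, end) token spans.
    """
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(sentence) - 1:
        if (sentence[i], sentence[i + 1]) in lexicon:
            spans.append((i, i + 2))
            i += 2  # do not let identified MWEs overlap
        else:
            i += 1
    return spans


if __name__ == "__main__":
    corpus = [["new", "york", "is", "big"],
              ["i", "love", "new", "york"],
              ["the", "city", "is", "big"]]
    lexicon = discover_bigrams(corpus)
    print(sorted(lexicon))  # [('is', 'big'), ('new', 'york')]
    print(identify(["i", "saw", "new", "york"], lexicon))  # [(2, 4)]
```

The spurious entry ("is", "big") illustrates why association scores alone are a noisy discovery signal, and why the survey keeps discovery and identification as separate subtasks with their own evaluation.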
Discriminative lexical semantic segmentation with gaps: running the MWE gamut
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.
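As a rough illustration of how a chunking representation can encode gappy MWEs, the sketch below decodes a tag sequence into groups of token indices. It assumes a simplified six-tag scheme in the spirit of the paper: uppercase `O`/`B`/`I` outside gaps and lowercase `o`/`b`/`i` inside the gap of an open MWE. The exact tag inventory of the paper, including its special lowercase "i" and any strong/weak distinctions, is not reproduced here, and the function `decode_gappy_tags` is hypothetical.

```python
from typing import List, Optional


def decode_gappy_tags(tags: List[str]) -> List[List[int]]:
    """Group token indices into MWEs from a simplified gappy tag sequence.

    Assumed scheme: O/B/I mark top-level tokens; o/b/i mark tokens inside
    the gap of a currently open top-level MWE.
    """
    groups: List[List[int]] = []
    top: Optional[List[int]] = None    # top-level MWE that may still continue
    inner: Optional[List[int]] = None  # MWE nested inside the current gap
    for idx, tag in enumerate(tags):
        if tag == "O":
            top, inner = None, None
        elif tag == "B":
            top, inner = [idx], None
            groups.append(top)
        elif tag == "I":
            if top is None:
                raise ValueError(f"'I' at {idx} without an open MWE")
            top.append(idx)
            inner = None  # any gap is now closed
        elif tag in ("o", "b", "i"):
            if top is None:
                raise ValueError(f"gap tag {tag!r} at {idx} outside an MWE")
            if tag == "o":
                inner = None  # single-word token inside the gap
            elif tag == "b":
                inner = [idx]
                groups.append(inner)
            else:
                if inner is None:
                    raise ValueError(f"'i' at {idx} without an open nested MWE")
                inner.append(idx)
        else:
            raise ValueError(f"unknown tag {tag!r}")
    # A B (or b) never continued by I (or i) does not form an MWE.
    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    # "He looked the word up": "looked ... up" is a gappy MWE.
    print(decode_gappy_tags(["O", "B", "o", "o", "I"]))  # [[1, 4]]
```

Because every token receives exactly one tag, standard sequence-tagging decoders can be reused unchanged, which is what makes the gappy encoding efficient.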
Current trends
Deep parsing is the fundamental process of representing the syntactic
structure of phrases and sentences. In the traditional methodology, this process
relies on lexicons and grammars that roughly represent the properties of words
and the interactions between words and structures in sentences. Several linguistic frameworks, such as Head-driven
Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining
Grammar (TAG) and Combinatory Categorial Grammar (CCG), offer different
structures and combining operations for building grammar rules. These frameworks already contain
mechanisms for expressing properties of Multiword Expressions (MWEs), which, however,
need to better account for the idiosyncrasies of MWEs on the one
hand and their similarities to regular structures on the other. This collaborative
book surveys various attempts at representing and parsing MWEs in the
context of linguistic theories and applications.
Representation and parsing of multiword expressions
This book consists of contributions related to the definition, representation and parsing of MWEs, reflecting current trends in the representation and processing of MWEs. They cover various categories of MWEs (verbal, adverbial and nominal), various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian) and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.
Promoting multiword expressions in A* TAG parsing
Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types, idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars that benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space for true positive MWE occurrences, while avoiding parsing failures for false positives.
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has been revitalized over the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the joint efforts of the MOSES international consortium. Around the same time, Chinese researchers introduced hierarchical methods, which allowed syntactic information to be used in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, gave MT an additional boost. Yet there is still more progress to be made. For example, MT will be greatly enhanced once both syntax and semantics are on board; this remains a major challenge, though many advanced research groups are currently pursuing ways to meet it head-on. The next generation of MT will consist of a collection of hybrid systems. The outlook is also promising for the mobile environment, as we look forward to more advanced technologies that enable speech-to-speech machine translation, combining speech recognition and speech synthesis, on hand-held devices. We review all of these developments and, in the final section, point out some of the most promising research avenues for the future of MT.