456 research outputs found
Parsing as Reduction
We reduce phrase-representation parsing to dependency parsing. Our reduction
is grounded on a new intermediate representation, "head-ordered dependency
trees", shown to be isomorphic to constituent trees. By encoding order
information in the dependency labels, we show that any off-the-shelf, trainable
dependency parser can be used to produce constituents. When this parser is
non-projective, we can perform discontinuous parsing in a very natural manner.
Despite the simplicity of our approach, experiments show that the resulting
parsers are on par with strong baselines, such as the Berkeley parser for
English and the best single system in the SPMRL-2014 shared task. Results are
particularly striking for discontinuous parsing of German, where we surpass the
current state of the art by a wide margin
Neural Combinatory Constituency Parsing
東京都立大学Tokyo Metropolitan University博士(情報科学)doctoral thesi
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Discontinuous Combinatory Constituency Parsing
AbstractWe extend a pair of continuous combinator-based constituency parsers (one binary and one multi-branching) into a discontinuous pair. Our parsers iteratively compose constituent vectors from word embeddings without any grammar constraints. Their empirical complexities are subquadratic. Our extension includes 1) a swap action for the orientation-based binary model and 2) biaffine attention for the chunker-based multi-branching model. In tests conducted with the Discontinuous Penn Treebank and TIGER Treebank, we achieved state-of-the-art discontinuous accuracy with a significant speed advantage
On the Use of Parsing for Named Entity Recognition
[Abstract] Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.Xunta de Galicia; ED431C 2020/11Xunta de Galicia; ED431G 2019/01This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R); and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER) with 80%, the Galicia ERDF 2014-20 Operational Programme, and the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150)
Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle
We introduce a novel transition system for discontinuous constituency
parsing. Instead of storing subtrees in a stack --i.e. a data structure with
linear-time sequential access-- the proposed system uses a set of parsing
items, with constant-time random access. This change makes it possible to
construct any discontinuous constituency tree in exactly transitions
for a sentence of length . At each parsing step, the parser considers every
item in the set to be combined with a focus item and to construct a new
constituent in a bottom-up fashion. The parsing strategy is based on the
assumption that most syntactic structures can be parsed incrementally and that
the set --the memory of the parser-- remains reasonably small on average.
Moreover, we introduce a provably correct dynamic oracle for the new transition
system, and present the first experiments in discontinuous constituency parsing
using a dynamic oracle. Our parser obtains state-of-the-art results on three
English and German discontinuous treebanks.Comment: Accepted for publication at NAACL 2019; 14 page
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT
- …