Search CORE

4,550 research outputs found

Constituent Parsing as Sequence Labeling

Author: Gómez-Rodríguez Carlos
Vilares David
Publication venue
Publication date: 01/01/2018
Field of study

We introduce a method to reduce constituent parsing to sequence labeling. For each word w_t, it generates a label that encodes: (1) the number of ancestors in the tree that the words w_t and w_{t+1} have in common, and (2) the nonterminal symbol at the lowest common ancestor. We first prove that the proposed encoding function is injective for any tree without unary branches. In practice, the approach is made extensible to all constituency trees by collapsing unary branches. We then use the PTB and CTB treebanks as testbeds and propose a set of fast baselines. We achieve 90.7% F-score on the PTB test set, outperforming the Vinyals et al. (2015) sequence-to-sequence parser. In addition, sacrificing some accuracy, our approach achieves the fastest constituent parsing speeds reported to date on PTB by a wide margin.Comment: EMNLP 2018 (Long Papers). Revised version with improved results after fixing evaluation bu

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

Crossref

Evolving a DSL implementation

Author: E. Kohlbecker
K. Czarnecki
M. Bravenboer
M.G.J. Brand van den
S. Seefried
T. Sheard
T. Sheard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Domain Specific Languages (DSLs) are small languages designed for use in a specific domain. DSLs typically evolve quite radically throughout their lifetime, but current DSL implementation approaches are often clumsy in the face of such evolution. In this paper I present a case study of an DSL evolving in its syntax, semantics, and robustness, implemented in the Converge language. This shows how real-world DSL implementations can evolve along with changing requirements

Crossref

Middlesex University Research Repository

Bournemouth University Research Online

King's Research Portal

Generalizing input-driven languages: theoretical and practical benefits

Author: Mandrioli Dino
Pradella Matteo
Publication venue
Publication date: 02/05/2017
Field of study

Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

DCU-Paris13 systems for the SANCL 2012 shared task

Author: Anton Bryl
Jennifer Foster
Joachim Wagner
Joseph Le Roux
Rasul Samad
Zadeh Kaljahi
Publication venue
Publication date: 07/06/2012
Field of study

The DCU-Paris13 team submitted three systems to the SANCL 2012 shared task on parsing English web text. The first submission, the highest ranked constituency parsing system, uses a combination of PCFG-LA product grammar parsing and self-training. In the second submission, also a constituency parsing system, the n-best lists of various parsing models are combined using an approximate sentence-level product model. The third system, the highest ranked system in the dependency parsing track, uses voting over dependency arcs to combine the output of three constituency parsing systems which have been converted to dependency trees. All systems make use of a data-normalisation component, a parser accuracy predictor and a genre classifier

CiteSeerX

Irish Universities

HAL Descartes

DCU Online Research Access Service

HAL-Paris 13

Hal-Diderot