Constituent Parsing as Sequence Labeling
We introduce a method to reduce constituent parsing to sequence labeling. For
each word w_t, it generates a label that encodes: (1) the number of ancestors
in the tree that the words w_t and w_{t+1} have in common, and (2) the
nonterminal symbol at the lowest common ancestor. We first prove that the
proposed encoding function is injective for any tree without unary branches. In
practice, the approach is made extensible to all constituency trees by
collapsing unary branches. We then use the PTB and CTB treebanks as testbeds
and propose a set of fast baselines. We achieve 90.7% F-score on the PTB test
set, outperforming the Vinyals et al. (2015) sequence-to-sequence parser. In
addition, sacrificing some accuracy, our approach achieves the fastest
constituent parsing speeds reported to date on PTB by a wide margin.
Comment: EMNLP 2018 (Long Papers). Revised version with improved results after
fixing evaluation bu
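The encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tree given as nested tuples `(label, child, ...)` with plain strings as words, uses absolute ancestor counts, and leaves the last word unlabeled. The names `ancestors_per_word` and `encode` are hypothetical.

```python
def ancestors_per_word(tree, address=(), chain=()):
    """Return [(word, chain)] in sentence order.

    `chain` lists the word's ancestors from the root down, each as
    (address, label); the address keeps same-labeled nodes distinct.
    """
    if isinstance(tree, str):  # a leaf is a plain word
        return [(tree, chain)]
    label, *children = tree
    node = (address, label)
    out = []
    for i, child in enumerate(children):
        out.extend(ancestors_per_word(child, address + (i,), chain + (node,)))
    return out


def encode(tree):
    """Label each word w_t (except the last) with
    (word, number of ancestors shared with w_{t+1}, label of their lowest common ancestor).
    """
    leaves = ancestors_per_word(tree)
    labels = []
    for (word, a), (_, b) in zip(leaves, leaves[1:]):
        common = 0
        for x, y in zip(a, b):
            if x != y:
                break
            common += 1
        # The root is always shared, so common >= 1 here.
        labels.append((word, common, a[common - 1][1]))
    return labels


# Illustrative tree without unary branches (assumed example, not from the paper).
tree = ("S",
        ("NP", "the", "cat"),
        ("VP", "sat", ("PP", "on", ("NP", "the", "mat"))))

for item in encode(tree):
    print(item)
```

Each printed triple pairs a word with the two pieces of information the abstract names, so a standard sequence tagger could in principle be trained to predict them word by word.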
Evolving a DSL implementation
Domain Specific Languages (DSLs) are small languages designed for use in a specific domain. DSLs typically evolve quite radically throughout their lifetime, but current DSL implementation approaches are often clumsy in the face of such evolution. In this paper I present a case study of a DSL evolving in its syntax, semantics, and robustness, implemented in the Converge language. This shows how real-world DSL implementations can evolve along with changing requirements.
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families.
DCU-Paris13 systems for the SANCL 2012 shared task
The DCU-Paris13 team submitted three systems to the SANCL 2012 shared task on parsing English web text. The first submission, the highest ranked constituency parsing system, uses a combination of PCFG-LA product grammar parsing and self-training. In the second submission, also a constituency parsing system, the n-best lists of various parsing models are combined using an approximate sentence-level product model. The third system, the highest ranked system in the dependency parsing track, uses voting over dependency arcs to combine the output of three constituency parsing systems which have been converted to dependency trees. All systems make use of a data-normalisation component, a parser accuracy predictor and a genre classifier.
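The abstract does not give the details of the arc-voting scheme, so the following is only a minimal sketch of one common variant: unweighted, per-token majority voting over head indices. The function name `vote_heads` and the head-list input format are assumptions, and this simple scheme does not guarantee that the combined arcs form a well-formed tree.

```python
from collections import Counter


def vote_heads(parses):
    """Combine dependency parses by per-token majority vote.

    parses[k][i] is the head index that system k assigns to token i
    (0 denotes the artificial root). Ties go to the head proposed by
    the earliest system in the list.
    """
    n = len(parses[0])
    if any(len(p) != n for p in parses):
        raise ValueError("all parses must cover the same tokens")
    combined = []
    for i in range(n):
        counts = Counter(p[i] for p in parses)
        combined.append(counts.most_common(1)[0][0])
    return combined


# Three hypothetical system outputs for a three-token sentence.
parses = [
    [2, 0, 2],
    [2, 0, 1],
    [3, 0, 2],
]
print(vote_heads(parses))
```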