Constituent Parsing as Sequence Labeling
We introduce a method to reduce constituent parsing to sequence labeling. For
each word w_t, it generates a label that encodes: (1) the number of ancestors
in the tree that the words w_t and w_{t+1} have in common, and (2) the
nonterminal symbol at the lowest common ancestor. We first prove that the
proposed encoding function is injective for any tree without unary branches. In
practice, the approach is extended to all constituency trees by
collapsing unary branches. We then use the PTB and CTB treebanks as testbeds
and propose a set of fast baselines. We achieve 90.7% F-score on the PTB test
set, outperforming the Vinyals et al. (2015) sequence-to-sequence parser. In
addition, sacrificing some accuracy, our approach achieves the fastest
constituent parsing speeds reported to date on PTB by a wide margin.
Comment: EMNLP 2018 (Long Papers). Revised version with improved results after fixing an evaluation bug.
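The encoding the abstract describes can be sketched in a few lines: collect each word's ancestor path with one tree traversal, then compare adjacent paths to get the shared-ancestor count and the nonterminal at the lowest common ancestor. The tree representation and helper names below are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the per-word labels (n_t, c_t), where n_t is the number of
# ancestors shared by w_t and w_{t+1} and c_t is the nonterminal at their
# lowest common ancestor. Assumes trees as nested (label, children) pairs,
# with preterminals written as (POS, word).

def ancestor_paths(tree, path=None, paths=None):
    """For each leaf, left to right, record the labels on its root-to-leaf path."""
    if paths is None:
        path, paths = [], []
    label, children = tree
    if isinstance(children, str):            # preterminal: (POS, word)
        paths.append(path + [label])
    else:
        for child in children:
            ancestor_paths(child, path + [label], paths)
    return paths

def encode(tree):
    """One (n_t, c_t) label per word w_t, for t = 1 .. n-1."""
    paths = ancestor_paths(tree)
    labels = []
    for left, right in zip(paths, paths[1:]):
        common = 0                           # length of the shared prefix
        while common < min(len(left), len(right)) and left[common] == right[common]:
            common += 1
        labels.append((common, left[common - 1]))   # LCA is the last shared node
    return labels

# Toy tree: (S (NP (DT The) (NN cat)) (VP (VBD slept)))
tree = ("S", [("NP", [("DT", "The"), ("NN", "cat")]),
              ("VP", [("VBD", "slept")])])
print(encode(tree))   # [(2, 'NP'), (1, 'S')]
```

Decoding would rebuild the tree from these labels; unary branches, which the abstract handles by collapsing, are left out of this sketch.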
Viable Dependency Parsing as Sequence Labeling
We recast dependency parsing as a sequence labeling problem, exploring
several encodings of dependency trees as labels. While dependency parsing by
means of sequence labeling had been attempted in existing work, results
suggested that the technique was impractical. We show instead that with a
conventional BiLSTM-based model it is possible to obtain fast and accurate
parsers. These parsers are conceptually simple, not needing traditional parsing
algorithms or auxiliary structures. However, experiments on the PTB and a
sample of UD treebanks show that they provide a good speed-accuracy tradeoff,
with results competitive with more complex approaches.
Comment: Camera-ready version to appear at NAACL 2019 (final peer-reviewed manuscript). 8 pages (incl. appendix).
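One concrete instance of such a label encoding is sketched below: each word gets a label made of the offset to its head plus the dependency relation. This is only an illustration of the general idea in the abstract (the paper explores several encodings), and the data layout and function names are assumptions.

```python
# Head-offset encoding of a dependency tree as one label per word.
# heads is 1-based: heads[i-1] is the head of word i, with 0 for the root.

def encode(heads, rels):
    """Label for word i: (head position relative to i, dependency relation)."""
    labels = []
    for i, (h, rel) in enumerate(zip(heads, rels), start=1):
        offset = 0 if h == 0 else h - i      # 0 is unambiguous: no word heads itself
        labels.append((offset, rel))
    return labels

def decode(labels):
    """Invert the encoding back to (heads, rels)."""
    heads, rels = [], []
    for i, (offset, rel) in enumerate(labels, start=1):
        heads.append(0 if offset == 0 else i + offset)
        rels.append(rel)
    return heads, rels

# "She reads books": 'reads' is the root, 'She' and 'books' depend on it.
heads, rels = [2, 0, 2], ["nsubj", "root", "obj"]
labels = encode(heads, rels)
print(labels)                 # [(1, 'nsubj'), (0, 'root'), (-1, 'obj')]
assert decode(labels) == (heads, rels)
```

In a full system these labels would be predicted by a standard sequence labeler (the abstract mentions a conventional BiLSTM-based model), with some post-processing needed to guarantee a well-formed tree from arbitrary label sequences.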