346 research outputs found
A non-projective greedy dependency parser with bidirectional LSTMs
The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of
the Covington (2001) algorithm for non-projective dependency parsing. The
bidirectional LSTM approach by Kipperwasser and Goldberg (2016) is used to
train a greedy parser with a dynamic oracle to mitigate error propagation. The
model participated in the CoNLL 2017 UD Shared Task. In spite of not using any
ensemble methods and using the baseline segmentation and PoS tagging, the
parser obtained good results on both macro-average LAS and UAS in the big
treebanks category (55 languages), ranking 7th out of 33 teams. In the all
treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the
all and big categories is mainly due to the poor performance on four parallel
PUD treebanks, suggesting that some `suffixed' treebanks (e.g. Spanish-AnCora)
perform poorly on cross-treebank settings, which does not occur with the
corresponding `unsuffixed' treebank (e.g. Spanish). By changing that, we obtain
the 11th best LAS among all runs (official and unofficial). The code is made
available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSEComment: 12 pages, 2 figures, 5 table
Crossings as a side effect of dependency lengths
The syntactic structure of sentences exhibits a striking regularity:
dependencies tend to not cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e. sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language.Comment: the discussion section has been expanded significantly; in press in
Complexity (Wiley
The Swedish-Turkish Parallel Corpus and Tools for its Creation
Proceedings of the 16th Nordic Conference
of Computational Linguistics NODALIDA-2007.
Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit.
University of Tartu, Tartu, 2007.
ISBN 978-9985-4-0513-0 (online)
ISBN 978-9985-4-0514-7 (CD-ROM)
pp. 136-143
An improved neural network model for joint POS tagging and dependency parsing
We propose a novel neural network model for joint part-of-speech (POS)
tagging and dependency parsing. Our model extends the well-known BIST
graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating
a BiLSTM-based tagging component to produce automatically predicted POS tags
for the parser. On the benchmark English Penn treebank, our model obtains
strong UAS and LAS scores at 94.51% and 92.87%, respectively, producing 1.5+%
absolute improvements to the BIST graph-based parser, and also obtaining a
state-of-the-art POS tagging accuracy at 97.97%. Furthermore, experimental
results on parsing 61 "big" Universal Dependencies treebanks from raw texts
show that our model outperforms the baseline UDPipe (Straka and Strakov\'a,
2017) with 0.8% higher average POS tagging score and 3.6% higher average LAS
score. In addition, with our model, we also obtain state-of-the-art downstream
task scores for biomedical event extraction and opinion analysis applications.
Our code is available together with all pre-trained models at:
https://github.com/datquocnguyen/jPTDPComment: 11 pages; In Proceedings of the CoNLL 2018 Shared Task: Multilingual
Parsing from Raw Text to Universal Dependencies, to appea
- …