250 research outputs found

    A non-projective greedy dependency parser with bidirectional LSTMs

    Get PDF
    The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach by Kipperwasser and Goldberg (2016) is used to train a greedy parser with a dynamic oracle to mitigate error propagation. The model participated in the CoNLL 2017 UD Shared Task. In spite of not using any ensemble methods and using the baseline segmentation and PoS tagging, the parser obtained good results on both macro-average LAS and UAS in the big treebanks category (55 languages), ranking 7th out of 33 teams. In the all treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the all and big categories is mainly due to the poor performance on four parallel PUD treebanks, suggesting that some `suffixed' treebanks (e.g. Spanish-AnCora) perform poorly on cross-treebank settings, which does not occur with the corresponding `unsuffixed' treebank (e.g. Spanish). By changing that, we obtain the 11th best LAS among all runs (official and unofficial). The code is made available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSEComment: 12 pages, 2 figures, 5 table

    Crossings as a side effect of dependency lengths

    Get PDF
    The syntactic structure of sentences exhibits a striking regularity: dependencies tend to not cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language.Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley

    Are crossing dependencies really scarce?

    Get PDF
    The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the amount of crossings of real sentences and compare it to the predictions of a series of baselines. We conclude that crossings are really scarce in real sentences. Their scarcity is unexpected by the hubiness of the trees. Indeed, real sentences are close to linear trees, where the potential number of crossings is maximized.Peer ReviewedPostprint (author's final draft

    An improved neural network model for joint POS tagging and dependency parsing

    Full text link
    We propose a novel neural network model for joint part-of-speech (POS) tagging and dependency parsing. Our model extends the well-known BIST graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating a BiLSTM-based tagging component to produce automatically predicted POS tags for the parser. On the benchmark English Penn treebank, our model obtains strong UAS and LAS scores at 94.51% and 92.87%, respectively, producing 1.5+% absolute improvements to the BIST graph-based parser, and also obtaining a state-of-the-art POS tagging accuracy at 97.97%. Furthermore, experimental results on parsing 61 "big" Universal Dependencies treebanks from raw texts show that our model outperforms the baseline UDPipe (Straka and Strakov\'a, 2017) with 0.8% higher average POS tagging score and 3.6% higher average LAS score. In addition, with our model, we also obtain state-of-the-art downstream task scores for biomedical event extraction and opinion analysis applications. Our code is available together with all pre-trained models at: https://github.com/datquocnguyen/jPTDPComment: 11 pages; In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, to appea
    corecore