
    Better, Faster, Stronger Sequence Tagging Constituent Parsers

    Sequence tagging models for constituent parsing are faster, but less accurate, than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish. Comment: NAACL 2019 (long papers). Contains corrigendum.
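    To make the label-decomposition idea concrete, here is a minimal sketch (not the paper's implementation) of a multi-task tagger: a shared encoder produces one vector per word, and separate heads predict the sublabels of the decomposed constituent-parsing-as-tagging label instead of one head over the full, sparse label set. The BiLSTM encoder, head names, and sublabel inventory sizes are illustrative assumptions.

```python
# Sketch only: shared encoder + one classification head per sublabel,
# trained jointly with a summed cross-entropy loss (multi-task learning).
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=256,
                 n_levels=40, n_nonterminals=30, n_unaries=60):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # One head per sublabel instead of one head over the cross-product label set.
        self.level_head = nn.Linear(2 * hidden, n_levels)
        self.nonterm_head = nn.Linear(2 * hidden, n_nonterminals)
        self.unary_head = nn.Linear(2 * hidden, n_unaries)

    def forward(self, word_ids):
        states, _ = self.encoder(self.embed(word_ids))
        return (self.level_head(states),
                self.nonterm_head(states),
                self.unary_head(states))

def multitask_loss(logits, gold, weights=(1.0, 1.0, 1.0)):
    """Sum of per-sublabel cross-entropies, one term per head."""
    loss_fn = nn.CrossEntropyLoss()
    return sum(w * loss_fn(l.transpose(1, 2), g)   # (batch, classes, seq) vs (batch, seq)
               for w, l, g in zip(weights, logits, gold))
```

    Decomposing the label this way keeps each head's output space small and dense, which is the sparsity reduction the abstract refers to; the exact sublabel definitions and loss weighting in the paper may differ.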

    The CoNLL 2007 shared task on dependency parsing

    The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task was devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
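    For readers unfamiliar with the shared-task data, the treebanks were distributed in the tab-separated, ten-column CoNLL dependency format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with sentences separated by blank lines. The small reader below is an illustrative sketch, not the task's official tooling; the file path and dictionary-per-token representation are assumptions.

```python
# Sketch: iterate over sentences in a 10-column CoNLL dependency file.
FIELDS = ["id", "form", "lemma", "cpostag", "postag",
          "feats", "head", "deprel", "phead", "pdeprel"]

def read_conll(path):
    """Yield one sentence at a time as a list of token dicts."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                 # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:                         # final sentence without trailing blank line
        yield sentence
```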

    Improved Chinese Language Processing for an Open Source Search Engine

    Natural Language Processing (NLP) is the process of computers analyzing human languages. NLP spans many areas, including speech recognition, natural language understanding, and natural language generation. Information retrieval and natural language processing for Asian languages have their own unique set of challenges not present for Indo-European languages, such as text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of and experiments with improving the Chinese language processing sub-component of an open source search engine, Yioop. In particular, we rewrote and improved the following sub-systems of Yioop to make them as close to state-of-the-art as possible: Chinese text segmentation, part-of-speech (POS) tagging, named entity recognition (NER), and question answering. Compared to the previous system, we achieved a 9% improvement in Chinese word segmentation accuracy, built a POS tagger with 89% accuracy, and implemented an NER system with 76% accuracy.
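    As a point of reference for the segmentation sub-task, the sketch below shows dictionary-based forward maximum matching, a common baseline for Chinese word segmentation. It is for illustration only and is not Yioop's implementation; the dictionary contents and maximum word length are hypothetical.

```python
# Sketch: greedy forward maximum matching against a word dictionary.
def forward_max_match(text, dictionary, max_len=4):
    """Take the longest dictionary word starting at each position; fall back to one character."""
    words, i = [], 0
    while i < len(text):
        match = text[i]                  # single-character fallback
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words

# Toy example:
# forward_max_match("自然语言处理", {"自然", "语言", "处理", "自然语言"})
# -> ["自然语言", "处理"]
```

    Dictionary matching of this kind is fast but struggles with out-of-vocabulary words and ambiguity, which is why statistical or neural segmenters are typically used when higher accuracy is required.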