Search CORE

3,461 research outputs found

Nightmare at test time: How punctuation prevents parsers from generalizing

Author: Augenstein Isabelle
de Lhoneux Miryam
Søgaard Anders
Publication venue
Publication date: 01/01/2018
Field of study

Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however, since human language processing does not rely on punctuation to the same extent, and in informal texts, we therefore often leave out punctuation. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive to both absence of punctuation and to alternative uses; (b) neural parsers tend to be more sensitive than vintage parsers; (c) training neural parsers without punctuation outperforms all out-of-the-box parsers across all scenarios where punctuation departs from standard punctuation. Our main experiments are on synthetically corrupted data to study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data.Comment: Analyzing and interpreting neural networks for NLP, EMNLP 2018 worksho

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

One model, two languages: training bilingual parsers with harmonized treebanks

Author: Alonso Miguel A.
Gómez-Rodríguez Carlos
Vilares David
Publication venue
Publication date: 01/01/2016
Field of study

We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy, but some even achieve significant improvements over the corresponding monolingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added.Comment: 7 pages, 4 tables, 1 figur

arXiv.org e-Print Archive

Crossref

DepAnn - An Annotation Tool for Dependency Treebanks

Author: Kakkonen Tuomo
Publication venue
Publication date: 01/01/2005
Field of study

DepAnn is an interactive annotation tool for dependency treebanks, providing both graphical and text-based annotation interfaces. The tool is aimed for semi-automatic creation of treebanks. It aids the manual inspection and correction of automatically created parses, making the annotation process faster and less error-prone. A novel feature of the tool is that it enables the user to view outputs from several parsers as the basis for creating the final tree to be saved to the treebank. DepAnn uses TIGER-XML, an XML-based general encoding format for both, representing the parser outputs and saving the annotated treebank. The tool includes an automatic consistency checker for sentence structures. In addition, the tool enables users to build structures manually, add comments on the annotations, modify the tagsets, and mark sentences for further revision

arXiv.org e-Print Archive

CiteSeerX

A Transition-Based Directed Acyclic Graph Parser for UCCA

Author: Abend Omri
Hershcovich Daniel
Rappoport Ari
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017
Field of study

We present the first parser for UCCA, a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. To our knowledge, the conjunction of these formal properties is not supported by any existing parser. Our transition-based parser, which uses a novel transition set and features based on bidirectional LSTMs, has value not just for UCCA parsing: its ability to handle more general graph structures can inform the development of parsers for other semantic DAG structures, and in languages that frequently use discontinuous structures.Comment: 16 pages; Accepted as long paper at ACL201

arXiv.org e-Print Archive

Crossref

Experiences with the GTU grammar development environment

Author: Richarz Dirk
Volk Martin
Publication venue
Publication date: 12/07/1997
Field of study

In this paper we describe our experiences with a tool for the development and testing of natural language grammars called GTU (German: Grammatik-Testumgebumg; grammar test environment). GTU supports four grammar formalisms under a window-oriented user interface. Additionally, it contains a set of German test sentences covering various syntactic phenomena as well as three types of German lexicons that can be attached to a grammar via an integrated lexicon interface. What follows is a description of the experiences we gained when we used GTU as a tutoring tool for students and as an experimental tool for CL researchers. From these we will derive the features necessary for a future grammar workbench.Comment: 7 pages, uses aclap.st

arXiv.org e-Print Archive

ZORA