Search CORE

890 research outputs found

How to compare treebanks

Author: Kübler Sandra
Maier Wolfgang
Rehbein Ines
Versley Yannick
Publication venue
Publication date: 01/01/2008
Field of study

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination

Hochschulschriftenserver - Universität Frankfurt am Main

A testsuite for testing parser performance on complex German grammatical constructions [TePaCoC - a corpus for testing parser performance on complex German grammatical constructions]

Author: Genabith Josef van
Kübler Sandra
Rehbein Ines
Publication venue
Publication date: 01/11/2008
Field of study

Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences each per phenomena) from both TiGer and TüBa-D/Z resources. This provides a 2 times 100 sentence comparable testsuite which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors

Utrecht University Repository

Hochschulschriftenserver - Universität Frankfurt am Main

Learning to Understand Child-directed and Adult-directed Speech

Author: Alishahi Afra
Chrupała Grzegorz
Gelderloos Lieke
Publication venue
Publication date: 01/01/2020
Field of study

Speech directed to children differs from adult-directed speech in linguistic aspects such as repetition, word choice, and sentence length, as well as in aspects of the speech signal itself, such as prosodic and phonemic variation. Human language acquisition research indicates that child-directed speech helps language learners. This study explores the effect of child-directed speech when learning to extract semantic information from speech directly. We compare the task performance of models trained on adult-directed speech (ADS) and child-directed speech (CDS). We find indications that CDS helps in the initial stages of learning, but eventually, models trained on ADS reach comparable task performance, and generalize better. The results suggest that this is at least partially due to linguistic rather than acoustic properties of the two registers, as we see the same pattern when looking at models trained on acoustically comparable synthetic speech.Comment: Authors found an error in preprocessing of transcriptions before they were fed to SBERT. After correction, the experiments were rerun. The updated results can be found in this version. Importantly, - Most scores were affected to a small degree (performance was slightly worse). - The effect was consistent across conditions. Therefore, the general patterns remain the sam

arXiv.org e-Print Archive

Crossref

Tilburg University Repository

Content Differences in Syntactic and Semantic Representations

Author: Abend Omri
Hershcovich Daniel
Rappoport Ari
Publication venue
Publication date: 01/01/2019
Field of study

Syntactic analysis plays an important role in semantic parsing, but the nature of this role remains a topic of ongoing debate. The debate has been constrained by the scarcity of empirical comparative studies between syntactic and semantic schemes, which hinders the development of parsing methods informed by the details of target schemes and constructions. We target this gap, and take Universal Dependencies (UD) and UCCA as a test case. After abstracting away from differences of convention or formalism, we find that most content divergences can be ascribed to: (1) UCCA's distinction between a Scene and a non-Scene; (2) UCCA's distinction between primary relations, secondary ones and participants; (3) different treatment of multi-word expressions, and (4) different treatment of inter-clause linkage. We further discuss the long tail of cases where the two schemes take markedly different approaches. Finally, we show that the proposed comparison methodology can be used for fine-grained evaluation of UCCA parsing, highlighting both challenges and potential sources for improvement. The substantial differences between the schemes suggest that semantic parsers are likely to benefit downstream text understanding applications beyond their syntactic counterparts.Comment: NAACL-HLT 2019 camera read

arXiv.org e-Print Archive

Crossref

The influence of conceptual user models on the creation and interpretation of diagrams representing reactive systems

Author: Tabachneck-Schijf H.J.M.
Verpoorten J.H.
Weg R.L.W. van de
Wieringa R.J.
Publication venue: Open Institute of Knowledge
Publication date: 01/01/2006
Field of study

In system design, many diagrams of many different types are used. Diagrams communicate design aspects between members of the development team, and between these experts and the non-expert customers and future users. Mastering the creation of diagrams is often a challenging task, judging by particular errors persistently found in diagrams created by undergraduate computer science students. We assume a possible misalignment between human perception and cognition on the one hand and the diagrams’ structure and syntax on the other. This article presents the results of an investigation of such a misalignment. We focus on the deployment of so-called 'conceptual user models' (mental models, created by users in their mind) at the creation of diagrams. We propose a taxonomy for mental mappings, used for categorization of representations. We describe an experiment where naive and novice subjects created one or several diagrams of a familiar task. We use our taxonomy for analysing these diagrams, both for the represented task structure and the symbols used. The results indeed show a mismatch between mental models and currently used diagram techniques

University of Twente Research Information

Utrecht University Repository

Marrying Universal Dependencies and Universal Morphology

Author: Cotterell Ryan
Hulden Mans
McCarthy Arya D.
Silfverberg Miikka
Yarowsky David
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018
Field of study

The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects.Comment: UDW1

arXiv.org e-Print Archive

Crossref

Designing Focused and Efficient Annotation Tools

Author: Hofs D.H.W.
Jovanovic N.
Reidsma Dennis
Publication venue: Noldus Information Technology
Publication date: 01/01/2005
Field of study

University of Twente Research Information