
    Developments in the TIGER Annotation Scheme and their Realization in the Corpus

    This paper presents the annotation of the German TIGER Treebank. First, issues concerning the annotation, representation, and querying of the treebank are discussed. Within this context, the annotation tool ANNOTATE, the export and XML formats of the TIGER Treebank, and the TIGERSearch tool are briefly introduced. Secondly, the developments of the TIGER annotation scheme and their realization in the corpus are presented, focusing on the differences between the underlying NEGRA annotation scheme and the further developed TIGER annotation scheme. The main differences concern verb subcategorization, coordination, appositions and parentheses, as well as proper nouns. Thirdly, the annotation scheme is assessed through an evaluation and a discussion of problems with the above-mentioned changes. For this purpose, inter-annotator agreement in the TIGER project has been analyzed with a focus on exactly these changes. This analysis shows where the annotators' decision problems lie. These difficulties are discussed in greater detail on the basis of annotation examples. The paper concludes with some suggestions for improving the TIGER annotation scheme.
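
    The abstract does not spell out how inter-annotator agreement was measured, so the following Python sketch is only a minimal illustration, not the TIGER project's actual procedure: it computes Cohen's kappa over two annotators' categorical decisions. The label sequences are invented, though APP (apposition), PAR (parenthesis) and CJ (conjunct) are real NEGRA/TIGER edge labels.

        # Cohen's kappa for two annotators' categorical decisions
        # (e.g. edge labels assigned to the same set of nodes).
        def cohen_kappa(a, b):
            assert len(a) == len(b)
            n = len(a)
            # Observed agreement: fraction of identical decisions.
            observed = sum(x == y for x, y in zip(a, b)) / n
            # Expected chance agreement from each annotator's label distribution.
            labels = set(a) | set(b)
            expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
            return (observed - expected) / (1 - expected)

        # Hypothetical labels from two annotators for five attachment decisions.
        ann1 = ["APP", "APP", "CJ", "PAR", "APP"]
        ann2 = ["APP", "CJ",  "CJ", "PAR", "APP"]
        print(f"kappa = {cohen_kappa(ann1, ann2):.2f}")  # kappa = 0.69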

    The TIGER Treebank

    This paper reports on the TIGER Treebank, a corpus of currently 35,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation with the tool Annotate, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, as well as proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank adequately. We describe the query language, which was designed to make the formulation of complex queries simple; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.
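
    To make the representation formats mentioned above concrete, the following Python sketch builds a toy sentence in the TIGER-XML layout (terminals as <t> elements, nonterminals as <nt> elements with labeled <edge> children) and reads it back with the standard library. The sentence, ids and attribute values are made up, and this is a simplified sketch of the format rather than a full corpus file.

        import xml.etree.ElementTree as ET

        # A single made-up sentence in (simplified) TIGER-XML.
        TIGER_XML = """
        <s id="s1">
          <graph root="s1_500">
            <terminals>
              <t id="s1_1" word="Der"      pos="ART"/>
              <t id="s1_2" word="Hund"     pos="NN"/>
              <t id="s1_3" word="schlaeft" pos="VVFIN"/>
            </terminals>
            <nonterminals>
              <nt id="s1_501" cat="NP">
                <edge label="NK" idref="s1_1"/>
                <edge label="NK" idref="s1_2"/>
              </nt>
              <nt id="s1_500" cat="S">
                <edge label="SB" idref="s1_501"/>
                <edge label="HD" idref="s1_3"/>
              </nt>
            </nonterminals>
          </graph>
        </s>
        """

        sent = ET.fromstring(TIGER_XML)
        # Print each nonterminal node with its labeled daughters.
        for nt in sent.iter("nt"):
            edges = ", ".join(f"{e.get('label')}->{e.get('idref')}" for e in nt.iter("edge"))
            print(f"{nt.get('id')} [{nt.get('cat')}]: {edges}")

    A TIGERSearch query over such structures combines node descriptions with relations, e.g. [cat="S"] > [pos="VVFIN"] for an S node directly dominating a finite verb.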

    Effects of issue and poll news on electoral volatility: conversion or crystallization?

    In recent decades, electoral volatility has been on the rise in Western democracies. Scholars have proposed several explanations for this phenomenon of floating voters. Exposure to media coverage as a short-term explanation for electoral volatility has as yet been understudied. This study examines the effect of media content (issue news and poll news) on two different types of vote change: conversion, switching from one party to another, and crystallization, switching from being undecided to casting a vote for a party. We use a national panel survey (N = 765) and link it to a content analysis of campaign news on television and in newspapers during national Dutch elections. Findings reveal that exposure to issue news increases the chance of crystallization, whereas it decreases the chance of conversion. Conversely, exposure to poll news increases the chance of conversion, whereas it decreases the chance of crystallization.

    Tabula nearly rasa: probing the linguistic knowledge of character-level neural language models trained on unsegmented text

    Recurrent neural networks (RNNs) have reached striking performance in many natural language processing tasks. This has renewed interest in whether these generic sequence processing devices are inducing genuine linguistic knowledge. Nearly all current analytical studies, however, initialize the RNNs with a vocabulary of known words and feed them tokenized input during training. We present a multi-lingual study of the linguistic knowledge encoded in RNNs trained as character-level language models on input data with word boundaries removed. These networks face a tougher and more cognitively realistic task, having to discover any useful linguistic unit from scratch based on input statistics. The results show that our “near tabula rasa” RNNs are mostly able to solve morphological, syntactic and semantic tasks that intuitively presuppose word-level knowledge, and indeed they learned, to some extent, to track word boundaries. Our study opens the door to speculations about the necessity of an explicit, rigid word lexicon in language learning and usage.
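
    To make the setup concrete, here is a minimal PyTorch sketch of the kind of model the paper studies: a character-level language model trained on text with the whitespace stripped. It is a toy stand-in, not the authors' code; the corpus, architecture size and all hyperparameters are invented for illustration.

        import torch
        import torch.nn as nn

        # Toy "unsegmented" corpus: word boundaries removed, as in the paper's setup.
        text = "thedogchasedthecatthecatsawthedog" * 50
        chars = sorted(set(text))
        stoi = {c: i for i, c in enumerate(chars)}
        data = torch.tensor([stoi[c] for c in text])

        class CharLM(nn.Module):
            def __init__(self, vocab_size, emb_dim=32, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
                self.out = nn.Linear(hidden, vocab_size)

            def forward(self, x):
                h, _ = self.rnn(self.embed(x))
                return self.out(h)  # logits over the next character at each step

        model = CharLM(len(chars))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()

        # Train on random character windows to predict the next character.
        seq_len, batch = 32, 16
        for step in range(200):
            starts = torch.randint(0, len(data) - seq_len - 1, (batch,))
            x = torch.stack([data[i:i + seq_len] for i in starts])
            y = torch.stack([data[i + 1:i + seq_len + 1] for i in starts])
            loss = loss_fn(model(x).reshape(-1, len(chars)), y.reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()

    Probing then asks what such a network has picked up implicitly; for instance, per-character surprisal under a trained model tends to spike where the removed word boundaries used to be, which is one way to test whether it tracks boundaries.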