149 research outputs found

    Heterogeneity and standardization in data, use, and annotation : a diachronic corpus of German

    This paper describes the standardization problems that arise in a diachronic corpus, which has to cope with differing standards of diplomaticity, annotation, and header information. Such highly heterogeneous texts must be standardized to allow for comparative research without (too much) loss of information.

    Syntactic annotation of non-canonical linguistic structures

    This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences.

    In Search of Oblivion? How the 'Right to be Forgotten' Could Undermine Web-based Corpora

    Corpus linguists are now facing a new challenge to collecting accurate data for web-based corpora: the ‘Right to be Forgotten’. This element of data protection legislation allows individuals to request that links to webpages be removed if the information contained there can now be considered inaccurate, irrelevant or excessive. The potential difficulties this poses for researchers are illustrated by my experience collecting data for a corpus of neologisms appearing in online versions of UK national newspapers.

    Measuring morphological productivity


    What's hard? : Quantitative evidence for difficult constructions in German learner data

    Our study is concerned with the identification of ‘difficult’ structures in the acquisition of a foreign language, which will shed light on theoretical considerations of L2 processing. We argue that, compared to simple vocabulary items or abstract syntactic patterns, structures that contain lexical material as well as categorial variables are especially difficult to acquire. The difficulty level for particular patterns is shown to depend on surface invariability but not on the syntactic categories within which target patterns are embedded. As an example, we study the distribution of certain structures that are underused by L2 German learners.

    graphANNIS: A Fast Query Engine for Deeply Annotated Linguistic Corpora


    Syntactic Misuse, Overuse and Underuse: A Study of a Parsed Learner Corpus and its Target Hypothesis

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 1-3. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15891.