149 research outputs found
Heterogeneity and standardization in data, use, and annotation : a diachronic corpus of German
This paper describes the standardization problems that come up in a diachronic corpus: it has to cope with differing standards with regard to diplomaticity, annotation, and header information. Such highly heterogeneous texts must be standardized to allow for comparative research without (too much) loss of information.
Syntactic annotation of non-canonical linguistic structures
This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences.
In Search of Oblivion? How the 'Right to be Forgotten' Could Undermine Web-based Corpora
Corpus linguists are now facing a new challenge to collecting accurate data for web-based corpora: the ‘Right to be Forgotten’. This element of data protection legislation allows individuals to request that links to webpages be removed if the information contained there can now be considered inaccurate, irrelevant or excessive. The potential difficulties this poses for researchers are illustrated by my experience collecting data for a corpus of neologisms appearing in online versions of UK national newspapers.
Measuring morphological productivity
Not Reviewed
What's hard? : Quantitative evidence for difficult constructions in German learner data
Our study is concerned with the identification of ‘difficult’ structures in the acquisition of a foreign language, which will shed light on theoretical considerations of L2 processing. We argue that – compared to simple vocabulary items or abstract syntactic patterns – structures that contain lexical material as well as categorial variables are especially difficult to acquire. The difficulty level for particular patterns is shown to depend on surface invariability but not on the syntactic categories within which target patterns are embedded. As an example we study the distribution of certain structures which are underused by L2 German learners.
Syntactic Misuse, Overuse and Underuse: A Study of a Parsed Learner Corpus and its Target Hypothesis
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 1-3. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT, http://omilia.uio.no/nealt). Electronically published at Tartu University Library, Estonia (http://hdl.handle.net/10062/15891).