
    Heterogeneity and standardization in data, use, and annotation: a diachronic corpus of German

    This paper describes the standardization problems that come up in a diachronic corpus: it has to cope with differing standards with regard to diplomaticity, annotation, and header information. Such highly heterogeneous texts must be standardized to allow for comparative research without (too much) loss of information

    Syntactic annotation of non-canonical linguistic structures

    This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences

    Measuring morphological productivity

    Not Reviewed

    What's hard? Quantitative evidence for difficult constructions in German learner data

    Our study is concerned with the identification of ‘difficult’ structures in the acquisition of a foreign language, which will shed light on theoretical considerations of L2 processing. We argue that, compared to simple vocabulary items or abstract syntactic patterns, structures that contain lexical material as well as categorial variables are especially difficult to acquire. The difficulty level of particular patterns is shown to depend on surface invariability but not on the syntactic categories within which the target patterns are embedded. As an example, we study the distribution of certain structures that are underused by L2 German learners.

    Syntactic Misuse, Overuse and Underuse: A Study of a Parsed Learner Corpus and its Target Hypothesis

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 1-3. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Anregungen, Beispiele, Perspektiven [Suggestions, examples, perspectives]

    This article is freely accessible with the consent of the rights holder (De Gruyter). This paper argues for incorporating corpus data into the teaching of historical linguistics. While deeply annotated historical corpora are becoming available and corpus data is already widely used to answer various research questions, corpora are as yet rarely used in teaching. We believe they are ideally suited to making the variation in historical data transparent and to helping students explore contexts and parameters. In our first study, we show how the KaJuK corpus and its more elaborate version, the GiesKaNe corpus, can be exploited to study adverbial sentences. Using the RIDGES corpus, the second study deals with phrasal and lexical development. Both studies focus on explaining the method and its extension to other corpora and research questions. Peer Reviewed

    Version 1.0

    The present guidelines describe the annotation of narrative phenomena on the clause level, using a combination of ideas and methods from linguistics and literary studies. The main categories marking the discourse strategy “narration” in stretches of text have been narrowed down to mediacy, i.e., involving a narrator, and sequentiality of events. This document specifies how to define mediacy, and in turn how to determine whether a narrator is present, as well as how to identify events and their sequential ordering. Lastly, a functional annotation layer is proposed which allows researchers to compare different types of narrative instances. This offers a basis for investigating a potential narrative register, which is said to be important for many kinds of register studies. Peer Reviewed

    LAUDATIO-Repository: Accessing a heterogeneous field of linguistic corpora with the help of an open access repository

    International audience. Open access to digital historical research data for historical linguistics enables a fruitful exchange of research sources and research methods. To achieve this goal, the LAUDATIO-Repository provides long-term open access to historical corpus linguistic data. By developing the LAUDATIO-Repository, we also want to explore how to build repositories that are useful for a set of well-defined communities but are also flexible enough to be used and extended to serve other communities not considered beforehand. Considering the user community's needs requires a clear understanding of the community's user scenarios and research

    Falko. Eine Familie vielseitig annotierter Lernerkorpora des Deutschen als Fremdsprache [Falko: a family of richly annotated learner corpora of German as a foreign language]

    Falko is a freely accessible learner corpus of written German as a foreign language. After years of developing new text resources and enriching them with various annotation layers, it now comprises a series of individual corpora, some of which are very complex in structure. In this article, we present the most complex data resource in this series, the Falko essay corpus, which is currently being released in a new version (3.0) and is freely available to interested researchers.

    Register: Language Users’ Knowledge of Situational-Functional Variation

    The Collaborative Research Center 1412 “Register: Language Users’ Knowledge of Situational-Functional Variation” (CRC 1412) investigates the role of register in language, focusing in particular on what constitutes a language user’s register knowledge and which situational-functional factors determine a user’s choices. The following paper is an extract from the frame text of the proposal for the CRC 1412, which was submitted to the Deutsche Forschungsgemeinschaft in 2019, followed by a successful onsite evaluation that took place in 2019. The CRC 1412 then started its work on January 1, 2020. The theoretical part of the frame text gives an extensive overview of the theoretical and empirical perspectives on register knowledge from the viewpoint of 2019. Due to the high collaborative effort of all PIs involved, the frame text is unique in its scope on register research, encompassing register-relevant aspects from variationist approaches, psycholinguistics, grammatical theory, acquisition theory, historical linguistics, phonology, phonetics, typology, corpus linguistics, and computational linguistics, as well as qualitative and quantitative modeling. Although our positions and hypotheses have developed further since its submission, the frame text is still a vital resource as a compilation of state-of-the-art register research and a documentation of the start of the CRC 1412. The theoretical part, without the administrative components, is therefore an ideal first publication to launch the CRC’s publication series REALIS. For an overview of the projects and more information on the CRC, see https://sfb1412.hu-berlin.de/