
    Continuous variation in computational morphology - the example of Swiss German

    Most work in natural language processing is geared towards written, standardized language varieties. This focus is generally justified on practical grounds of data availability and socio-economic relevance, but does not always reflect the linguistic reality of sub-standard varieties. In this paper, we aim at the computational description of the morphology of a language with continuous internal variation, as it is encountered in most dialect landscapes. The work presented here is applied to Swiss German dialects; these dialects are well documented through dialectological research and are among the liveliest in Europe in terms of social acceptance and media exposure. Our work is inspired by previous research in generative dialectology and computational linguistics, which attempts to derive multiple dialect systems from a single reference system with the help of hand-written transformation rules. Such transformation rules may be called georeferenced, in the sense that they link to a set of geographic coordinates that can be grounded on a map. We improve on this work in several respects. First, our model associates all rules with probabilistic maps extracted from linguistic atlases. This allows us to handle transition zones in which several variants are accepted. Second, we provide a full implementation of this model on the basis of finite-state transducers. In addition to finite-state composition, which derives dialectal word forms by applying several rules in cascade, we propose a second type of composition, map composition, to compute the area of validity of the derived word forms on the basis of the probabilistic maps associated with the rules.
    In this paper, we will focus on two aspects of the proposed model: its theoretical value as a computationally effective description of continuous linguistic variation, and its practical value as a word-level machine translation system from Standard German into the various Swiss German dialects. We evaluate the model on the latter aspect.

    Alemannische Wikipedia - eine Online-Enzyklopädie in alemannischen Dialekten (Alemannic Wikipedia: an online encyclopedia in Alemannic dialects)

    The Alemannische Wikipedia is currently probably the most extensive text corpus in Alemannic. The online encyclopedia is written in a textual imitation of dialectal pronunciation. It thus offers linguists the opportunity to pursue a wide range of analytical questions.

    HeLI-based Experiments in Swiss German Dialect Identification

    Peer reviewed

    ArchiMob : A multidialectal corpus of Swiss German spontaneous speech

    Alemannische Dialektologie – Forschungsstand und Perspektiven. Sonderheft. Peer reviewed

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Digitising Swiss German : how to process and study a polycentric spoken language

    Swiss dialects of German are, unlike many dialects of other standardised languages, widely used in everyday communication. Despite this, automatic processing of Swiss German is still a considerable challenge because it is mostly a spoken variety and is subject to considerable regional variation. This paper presents the ArchiMob corpus, a freely available general-purpose corpus of spoken Swiss German based on oral history interviews. The corpus is the result of a long design process, intensive manual work and specially adapted computational processing. We first present the modalities of access to the corpus for linguistic, historic and computational research. We then describe how the documents were transcribed, segmented and aligned with the sound source. This work involved a series of experiments that have led to automatically annotated normalisation and part-of-speech tagging layers. Finally, we present several case studies to motivate the use of the corpus for digital humanities in general and for dialectology in particular. Peer reviewed

    Word-Based Dialect Identification with Georeferenced Rules

    We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character n-gram approach to dialect identification, our model is more robust to individual spelling differences, which are frequently encountered in non-standardized dialect writing. Moreover, it covers the whole Swiss German dialect continuum, which trained models struggle to achieve due to sparsity of training data.