Search CORE

3,363 research outputs found

Modeling Global Syntactic Variation in English Using Dialect Classification

Author: Dunn Jonathan
Publication date: 11/04/2019

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means of studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction, rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

arXiv.org e-Print Archive

UC Research Repository

Classifying Dutch dialects using a syntactic measure: The perceptual Daan and Blok dialect map revisited

Author: Spruit M.R.
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2005

International Migration, Integration and Social Cohesion online publications

Computational Sociolinguistics: A Survey

Author: de Jong Franciska
Doğruöz A. Seza
Nguyen Dong
Rosé Carolyn P.
Publication date: 01/01/2016

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

EUR Research Repository

University of Twente Research Information

Clearing the transcription hurdle in dialect corpus building: the corpus of Southern Dutch dialects as case-study

Author: Breitbarth Anne
Farasyn Melissa
Ghyselen Anne-Sophie
van Hessen Arjan
Van Keymeulen Jacques
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2020

This paper discusses how the transcription hurdle in dialect corpus building can be cleared. While corpus analysis has strongly gained in popularity in linguistic research, dialect corpora are still relatively scarce. This scarcity can be attributed to several factors, one of which is the challenging nature of transcribing dialects, given the lack of both orthographic norms for many dialects and speech technology tools trained on dialect data. This paper addresses the questions of (i) how dialects can be transcribed efficiently and (ii) whether speech technology tools can lighten the transcription work. These questions are tackled using the Southern Dutch dialects (SDDs) as a case study, for which the usefulness of automatic speech recognition (ASR), respeaking, and forced alignment is considered. Tests with these tools indicate that dialects still constitute a major challenge for speech technology. In the case of the SDDs, the decision was made to use speech technology only for the word-level segmentation of the audio files, as the transcription itself could not be sped up by ASR tools. The discussion does, however, indicate that the usefulness of ASR and related tools for a dialect corpus project is strongly determined by the sound quality of the dialect recordings, the availability of statistical dialect-specific models, the degree of linguistic differentiation between the dialects and the standard language, and the goals the transcripts have to serve.

Ghent University Academic Bibliography

Clearing the Transcription Hurdle in Dialect Corpus Building: The Corpus of Southern Dutch Dialects as Case Study

Author: Breitbarth Anne
Farasyn Melissa
Ghyselen Anne-Sophie
van Hessen Arjan
Van Keymeulen Jacques
Publication date: 15/04/2020

University of Twente Research Information

Low Saxon dialect distances at the orthographic and syntactic level

Author: Scherrer Yves
Siewert Janine
Wieling Martijn
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2022

We compare five Low Saxon dialects from the 19th and 21st centuries, from Germany and the Netherlands, with each other as well as with modern Standard Dutch and Standard German. Our comparison is based on character n-grams on the one hand and PoS n-grams on the other, and we show that these two lead to different distances. Particularly in the PoS-based distances, all of the 21st-century Low Saxon dialects can be observed shifting towards the modern majority languages. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Mutual Intelligibility

Author: Gooskens Charlotte
van Heuven Vincent
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2021

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Complexity as L2-difficulty : implications for syntactic change

Author: Breitbarth Anne
Walkden George
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2019

Recent work has cast doubt on the idea that all languages are equally complex; however, the notion of syntactic complexity remains underexplored. Taking complexity to equate to difficulty of acquisition for late L2 acquirers, we propose an operationalization of syntactic complexity in terms of uninterpretable features. Trudgill's sociolinguistic typology predicts that sociohistorical situations involving substantial late L2 acquisition should be conducive to simplification, i.e. the loss of such features. We sketch a programme for investigating this prediction. In particular, we suggest that the loss of bipartite negation in the history of Low German and other languages indicates that the prediction may be on the right track.

KOPS - The Institutional Repository of the University of Konstanz

Ghent University Academic Bibliography

Tools for dialect syntax: the case of CORDIAL-SIN (an annotated corpus of Portuguese dialects)

Author: Carrilho Ernestina
Publication venue: Universidad del Pais Vasco
Publication date: 01/01/2010

This paper addresses methodological issues of concern to the study of morphosyntactic variation. While the empirical basis of dialect syntax is still being elaborated, the focus here will be on the role of dialect corpora as tools for the study of linguistic variation in this particular domain. The case of CORDIAL-SIN, an annotated corpus of Portuguese dialects, will be presented along with some initial advances in Portuguese dialect syntax. Two levels of tools for the study of linguistic variation will thus be addressed: (i) corpora as general tools for dialect syntax; and (ii) tagging and syntactic annotation within a dialect corpus as tools that make morphosyntactic variation easier to study. Section 1 introduces methodological remarks concerning the empirical ground for dialect syntax; CORDIAL-SIN is presented in section 2; section 3 briefly illustrates how this tool has enhanced the development of Portuguese dialect syntax.

Universidad del País Vasco / Euskal Herriko Unibertsitatea: Ciencia - Portal de revistas digitales de la UPV/EHU

Universidade de Lisboa: Repositório.UL

Low Saxon dialect distances at the orthographic and syntactic level

Author: Scherrer Yves
Siewert Janine
Wieling Martijn
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2022

ARTS repository - University of Groningen