1,373 research outputs found

    A quantitative approach to social and geographical dialect variation

    Linguistic probes into human history

    This thesis in linguistics comprises five published articles and one forthcoming study, in which I review, test and use computational linguistic methods to classify languages and dialects on the basis of samples of lexical items, the sort of material that is generally readily available from linguistic atlases and databases. To compare and classify linguistic varieties, I use methods that lead to the computation of a linguistic distance matrix. The studies concern the classification of Dutch dialects from the Netherlands; languages and dialects from Spain; Bantu languages from Gabon and Tanzania; and, finally, Turkic and Indo-Iranian languages spoken in Kyrgyzstan, Tajikistan and Uzbekistan. In a multidisciplinary perspective aimed at providing a higher level of anthropological synthesis, linguistic diversity is used as a proxy for the cultural differences of the corresponding populations and is then compared with the variability of family names (their number, frequency and geographic distribution) or with genetic differences based on molecular markers in the DNA. The analysis of family names makes it possible to depict migrations that may have taken place in historical times, and allows us to distinguish regions that have received many immigrants from those that have remained demographically more stable. We conjecture that such migration patterns have influenced dialect and language contact. This is a novel perspective from which we may examine the effects of migration on language change. For example, it appears that the languages of Spain have remained lively because the regions where they are spoken have often been quite isolated demographically.
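As an illustration of what computing a linguistic distance matrix from lexical items can look like, here is a minimal sketch. It assumes a length-normalized Levenshtein (edit) distance averaged over shared concepts, which is one common choice in dialectometry; the abstract does not say which measures the thesis actually uses. The names `levenshtein`, `normalized_distance`, `distance_matrix` and the `TOY_LEXICON` data are hypothetical and exist only for this example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer form (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def distance_matrix(lexicon: dict) -> dict:
    """Mean normalized distance over the concepts two varieties share.

    `lexicon` maps variety name -> {concept -> word form}.
    Returns a symmetric dict-of-dicts with zeros on the diagonal.
    """
    names = sorted(lexicon)
    matrix = {v: {w: 0.0 for w in names} for v in names}
    for v, w in combinations(names, 2):
        shared = lexicon[v].keys() & lexicon[w].keys()
        if not shared:
            raise ValueError(f"no shared concepts between {v} and {w}")
        total = sum(normalized_distance(lexicon[v][c], lexicon[w][c])
                    for c in shared)
        matrix[v][w] = matrix[w][v] = total / len(shared)
    return matrix


# Invented toy data, not taken from any atlas or from the thesis.
TOY_LEXICON = {
    "variety_A": {"water": "water", "house": "hus"},
    "variety_B": {"water": "wader", "house": "hus"},
    "variety_C": {"water": "vatn", "house": "haus"},
}

if __name__ == "__main__":
    for name, row in distance_matrix(TOY_LEXICON).items():
        print(name, {other: round(d, 3) for other, d in row.items()})
```

A matrix of this kind can then be passed to clustering or multidimensional scaling to group the varieties, or compared with a matrix built from surname or genetic data.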

    Using Gabmap

    Gabmap is a freely available, open-source web application that analyzes data on language variation, e.g. varying words for the same concepts, varying pronunciations for the same words, or varying frequencies of syntactic constructions in transcribed conversations. Gabmap is an integrated part of CLARIN (see e.g. http://portal.clarin.nl). This article summarizes Gabmap's basic functionality, adds material on some new features and reports on the range of uses to which Gabmap has been put. Gabmap is modestly successful, and its popularity underscores the fact that the study of language variation has crossed a watershed concerning the acceptability of automated language analysis. Automated analysis not only improves researchers' efficiency, it also improves the replicability of their analyses and allows them to focus on the inferences to be drawn from analyses and on other more abstract aspects of that study.

    Neural representations for modeling variation in speech

    Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error-prone. As an alternative, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with several earlier studies, we evaluate how well these differences match human perception by comparing them with available human judgements of similarity. We show that speech representations extracted from a specific type of neural model (i.e. Transformers) lead to a better match with human perception than two earlier approaches based on phonetic transcriptions and MFCC-based acoustic features. We furthermore find that features from the neural models are generally better extracted from one of the middle hidden layers than from the final layer. We also demonstrate that neural speech representations capture not only segmental differences, but also intonational and durational differences that cannot adequately be represented by the set of discrete symbols used in phonetic transcriptions. Comment: Submitted to Journal of Phonetic
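To make the idea of a word-based pronunciation difference between two embedding sequences concrete, the sketch below aligns two sequences of frame-level vectors with dynamic time warping (DTW) and returns a length-normalized cost. This is an assumption made for illustration: the abstract does not state how the embeddings are compared. The function name `dtw_difference` is hypothetical, and the short lists of numbers stand in for hidden-layer outputs that would in practice come from a self-supervised speech model.

```python
import math


def dtw_difference(x, y):
    """Length-normalized DTW cost between two sequences of feature vectors.

    `x` and `y` are sequences of equal-dimension vectors (one per frame).
    The frame-to-frame cost is the Euclidean distance; the accumulated
    cost of the best alignment is divided by len(x) + len(y).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(x[i - 1], y[j - 1])
            acc[i][j] = step + min(acc[i - 1][j],      # frame of x repeated
                                   acc[i][j - 1],      # frame of y repeated
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m] / (n + m)


if __name__ == "__main__":
    # Placeholder "embeddings": three 2-dimensional frames each.
    word_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    word_b = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    print(dtw_difference(word_a, word_b))
```

Averaging such word-level values over all words shared by two speakers would give one speaker-to-speaker difference, which could then be set against human similarity judgements.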