187 research outputs found

    Evaluation of String Distance Algorithms for Dialectology

    We examine various string distance measures for suitability in modeling dialect distance, especially its perception. We find measures superior which do not normalize for word length, but which are sensitive to order. We likewise find evidence for the superiority of measures which incorporate a sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental difference when calculating sequence distances.

    Afrikaans and Dutch as closely-related languages: A comparison to West Germanic languages and Dutch dialects

    Following Den Besten's (2009) desiderata for historical linguistics of Afrikaans, this article aims to contribute some modern evidence to the debate regarding the founding dialects of Afrikaans. From an applied perspective (i.e. human language technology), we aim to determine which West Germanic language(s) and/or dialect(s) would be best suited for the purposes of recycling speech resources for the benefit of developing speech technologies for Afrikaans. Being recognised as a West Germanic language, Afrikaans is first compared to Standard Dutch, Standard Frisian and Standard German. Pronunciation distances are measured by means of Levenshtein distances. Afrikaans is found to be closest to Standard Dutch. Secondly, Afrikaans is compared to 361 Dutch dialectal varieties in the Netherlands and North-Belgium, using material from the Reeks Nederlandse Dialectatlassen, a series of dialect atlases compiled by Blancquaert and Pée in the period 1925-1982 which cover the Dutch dialect area. Afrikaans is found to be closest to the South-Holland dialectal variety of Zoetermeer; this largely agrees with the findings of Kloeke (1950). No speech resources are available for Zoetermeer, but such resources are available for Standard Dutch. Although the dialect of Zoetermeer is significantly closer to Afrikaans than Standard Dutch is, Standard Dutch speech resources might be a good substitute.
    Keywords: human language technologies, speech resources, Afrikaans, Dutch, acoustic distanc
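The abstract above measures pronunciation distance with Levenshtein distances and then picks the variety closest to Afrikaans. A minimal sketch of that idea follows, assuming unit costs for insertions, deletions and substitutions and a plain average over aligned word pairs; the article's actual cost scheme and normalization are not stated here. The names `levenshtein`, `mean_distance` and `closest_variety`, and the toy strings in the usage note, are hypothetical and are not taken from the article or from the Reeks Nederlandse Dialectatlassen.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Assumption: every operation costs 1 (binary segment differences).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i segments of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_distance(words_a, words_b):
    """Average Levenshtein distance over word pairs aligned by position."""
    pairs = list(zip(words_a, words_b))
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


def closest_variety(target, varieties):
    """Name of the variety whose word list has the smallest mean distance to target."""
    return min(varieties, key=lambda name: mean_distance(target, varieties[name]))
```

With made-up word lists such as `closest_variety(["water", "huis"], {"A": ["water", "hous"], "B": ["vater", "haus"]})`, the function returns the name of the list with the smaller average distance. Real use would operate on phonetic transcriptions rather than spellings.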

    The relationship between first language acquisition and dialect variation: Linking resources from distinct disciplines in a CLARIN-NL project

    It is remarkable that first language acquisition and historical dialectology should have remained strange bedfellows for so long considering the common assumption in historical linguistics that language change is due to the process of non-target transmission of linguistic features, forms and structures between generations, and thus between parents or adults and children. Both disciplines have remained isolated from each other due to, among other things, different research questions, methods of data collection and types of empirical resources. The aim of this paper is to demonstrate that the common assumption in historical linguistics mentioned above can be examined with the help of Digital Humanities projects like CLARIN. CLARIN infrastructure makes it possible to carry out e-Humanities-type research by combining datasets from distinct disciplines through tools for data processing. The outcome of the CLARIN-NL COAVA project (acronym of: Cognition, Acquisition and Variation tool) allows researchers to access two datasets from two different subdisciplines simultaneously, namely Dutch first child language acquisition files located in CHILDES (MacWhinney, 2000) and historical Dutch Dialect Dictionaries through the development of a tool for easy exploration of nouns