    Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

    We propose a method for embedding two-dimensional locations in a continuous vector space using a neural-network-based model that incorporates mixtures of Gaussian distributions, and we present two model variants, one for text-based geolocation and one for lexical dialectology. Evaluated on Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), September 2017, Copenhagen, Denmark
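
    The geolocation variant ends in a mixture-density output: the network emits mixture weights, component means and spreads, and training minimizes the negative log-likelihood of the true location. The sketch below shows only that output layer, not the network itself. It assumes isotropic 2D Gaussian components over planar (x, y) coordinates; the paper's exact parameterization (e.g. correlated bivariate components) may differ, and all function names are hypothetical.

```python
import math

# Hypothetical sketch of a mixture-density output layer for 2D locations.
# Assumptions (not from the paper): isotropic Gaussian components and
# planar (x, y) coordinates; the paper's parameterization may differ.

def log_softmax(logits):
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def mixture_log_density(point, logits, means, sigmas):
    """log p(point) under a mixture of isotropic 2D Gaussians."""
    x, y = point
    log_terms = []
    for log_w, (mx, my), s in zip(log_softmax(logits), means, sigmas):
        sq_dist = (x - mx) ** 2 + (y - my) ** 2
        # log N(point; mean, s^2 I) in two dimensions
        log_n = -math.log(2 * math.pi * s * s) - sq_dist / (2 * s * s)
        log_terms.append(log_w + log_n)
    # log-sum-exp over components
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def negative_log_likelihood(point, logits, means, sigmas):
    """Training loss for one (text, true location) pair."""
    return -mixture_log_density(point, logits, means, sigmas)

def predict_location(logits, means):
    """Point estimate: the mean of the highest-weight component."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return means[best]
```

    In a full model the logits, means and sigmas would be computed from the input text; because the output is a whole distribution rather than a single regressed point, the spread and weights of the components give a natural uncertainty estimate.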

    The spoken Omani Arabic of ‘Ibrī : A “Crossing Point” in Gulf dialects

    ‘Ibrī is located halfway between Muscat and Dubai, very close to the border with the Emirates. This proximity benefits young male citizens looking for work in the wealthy Emirates. Indeed, it is easy to find employment across the border: in Dubai, in the business sector; in Buraymi or Al-‘Ain, in administration or health-related professions (the health sector also employs female nurses); and in various locations across the Emirates for those serving as military or police staff (airport and border police also include female staff). ‘Ibrī speakers, most of whom return home after work, have daily contact with their Gulf neighbours. This way of life makes the speech of ‘Ibrī's inhabitants critical for two levels of analysis: (1) the features of ‘Ibrī Spoken Arabic within the general frame of Omani Arabic; (2) traces of contamination among Gulf varieties, due to both recent and historically motivated ‘contacts and changes.’ Several pairs of variables must be taken into account: social, distinguishing badawiyy from ḥaḍariyy; and geographical, distinguishing the interior of the country from its west/east and north/south sides. In principle, the ‘Ibrī area should be classed as “ḥaḍariyy of the north”. Nevertheless, we find elements that go beyond this classification. Phonology, for example, shows a series of combinatorial possibilities that hardly fit a schematic, annotated classification; we may also find the gahwah syndrome in occasional ‘Ibrī speech. Based on the data I collected in the city, I offer here a general morpho-phonological description of the local register. I also provide unpublished Omani texts, composed by teachers of “dialect”, with examples of syntax and lexicon. I intend to demonstrate how strongly political and linguistic borders mismatch in the Gulf area.

    Holistic corpus-based dialectology

    This paper sketches future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid the interpretation of empirical results with state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study that explores the joint frequency variability of 57 morphosyntactic features in 34 dialects across Great Britain.
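
    The shift from individual features to feature aggregates can be sketched as a small pipeline: standardize each feature's frequencies, compute one distance between every pair of dialects over all features at once, then cluster the distance matrix. The sketch below is a hypothetical illustration with invented toy data (4 dialects, 3 features rather than the paper's 34 and 57); average-linkage hierarchical clustering is just one of the cluster-analysis options the abstract mentions, and the function names are not from the paper.

```python
import math
from itertools import combinations

# Hypothetical sketch: feature frequencies per dialect -> one aggregate
# distance matrix -> hierarchical clustering. Data below are invented.

def standardize(matrix):
    """z-score each feature column so frequent features do not dominate."""
    cols = []
    for col in zip(*matrix):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*cols)]

def distance_matrix(rows):
    """Pairwise Euclidean distances between dialect profiles."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d

def average_linkage(d, k):
    """Merge the two closest clusters (mean pairwise distance) until k remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
            avg = sum(pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

# Rows: dialects; columns: normalized frequencies of morphosyntactic features.
freqs = [
    [12.0, 3.0, 0.5],
    [11.0, 3.5, 0.4],
    [2.0, 9.0, 4.0],
    [2.5, 8.5, 4.2],
]
groups = average_linkage(distance_matrix(standardize(freqs)), k=2)
```

    Standardizing first is a design choice: without it, a single high-frequency feature would dominate the aggregate distance.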

    Spatial evolution of human dialects

    The geographical pattern of human dialects is a result of history. Here, we formulate a simple spatial model of language change which shows that the final result of this historical evolution may, to some extent, be predictable. The model shows that the boundaries of dialect regions are controlled by a length-minimizing effect analogous to surface tension, mediated by variations in population density, which can induce curvature, and by the shape of coastlines or similar borders. The predictability of dialect regions arises because these effects drive many complex, randomized early states toward one of a smaller number of stable final configurations. The model reproduces observations and predictions of dialectologists, including dialect continua, isogloss bundling, fanning, the wave-like spread of dialect features from cities, and the impact of human movement on the number of dialects that an area can support. The model also provides an analytical form for Séguy's Curve, which gives the relationship between geographical and linguistic distance, and a generalisation of the curve to account for the presence of a population centre. A simple modification allows us to analytically characterize the variation of language use by age in an area undergoing linguistic change.
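
    The surface-tension analogy can be illustrated, though not reproduced, with a much simpler toy than the paper's model: a majority-vote cellular automaton on a grid, in which each cell adopts the variant held by most cells in its 3x3 neighbourhood. This is a swapped-in technique chosen only for illustration; the grid, neighbourhood and tie rule are all assumptions. Under this rule flat boundaries are stable, while small or sharply curved domains are eroded, which is the qualitative length-minimizing behaviour the abstract describes.

```python
# Toy illustration only (NOT the paper's model): majority-vote coarsening.
# Grid cells hold variant 0 or 1; all parameters here are hypothetical.

def majority_step(grid):
    """One synchronous update: each cell adopts the majority variant in its
    3x3 neighbourhood (itself included). Exact ties, which can only occur
    at the grid edges, keep the cell's current variant."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            ones = total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += 1
                        ones += grid[rr][cc]
            if 2 * ones > total:
                new[r][c] = 1
            elif 2 * ones < total:
                new[r][c] = 0
    return new

def run(grid, steps):
    """Iterate until the configuration stops changing or steps run out."""
    for _ in range(steps):
        nxt = majority_step(grid)
        if nxt == grid:
            break
        grid = nxt
    return grid
```

    For example, a straight boundary between two half-grids is left unchanged by `majority_step`, whereas a small square domain loses its corners first and then disappears, mimicking how curvature drives boundary motion.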

    Employing geographical principles for sampling in state of the art dialectological projects

    The aims of this paper are twofold. First, we identify the most effective human-geographical methods for sampling across space in large-scale dialectological projects. We propose two geographical concepts as a basis for sampling decisions: geo-demographic classification, a multidimensional method for grouping areas by their socio-economic characteristics, and functional regions, for which we develop an updated version suitable for sociolinguistic research. Second, we report on the results of a pilot project that applies these models to collect data on the acceptability of vernacular morpho-syntactic forms in the North-East of England. Following the method of natural breaks advocated for dialectology by Horvath and Horvath (2002), we interpret breaks in the probabilistic patterns as areas of dialect transition. This study contributes to the debate about the role and limitations of spatiality in linguistic analysis and aims to broaden our knowledge of the interfaces between human geography and dialectology.
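
    "Natural breaks" is commonly understood in cartography as an optimal one-dimensional classification: sort the values and split them into k classes so that the total within-class sum of squared deviations is minimal. The sketch below implements that Fisher-Jenks-style dynamic program; it is an assumption that this matches the procedure Horvath and Horvath (2002) advocate, and the function name and example values (per-locality acceptance rates) are hypothetical.

```python
# Hypothetical sketch: optimal 1D "natural breaks" (Fisher-Jenks style).
# Minimizes the total within-class sum of squared deviations.

def natural_breaks(values, k):
    """Return k (low, high) class ranges covering the sorted values."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums make the SSE of any slice an O(1) lookup.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        """Sum of squared deviations of xs[i:j] from its mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best SSE for putting xs[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is xs[i:j]
                v = cost[c - 1][i] + sse(i, j)
                if v < cost[c][j]:
                    cost[c][j] = v
                    back[c][j] = i
    # Walk the back-pointers to recover the class ranges.
    ranges, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        ranges.append((xs[i], xs[j - 1]))
        j = i
    return ranges[::-1]

# Hypothetical acceptance rates of one vernacular form in eight localities.
rates = [0.12, 0.15, 0.18, 0.55, 0.60, 0.58, 0.91, 0.88]
classes = natural_breaks(rates, 3)
```

    Localities whose rates fall into different classes would then be candidate transition zones; the cubic-time loop is fine for the small number of sampling points a pilot study involves.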

    Variation and linguistic theory

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Dialect contact and past BE in the English Fens
