2,739 research outputs found
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
We propose a method for embedding two-dimensional locations in a continuous
vector space using a neural network-based model incorporating mixtures of
Gaussian distributions, presenting two model variants for text-based
geolocation and lexical dialectology. Evaluated over Twitter data, the proposed
model outperforms conventional regression-based geolocation and provides a
better estimate of uncertainty. We also show the effectiveness of the
representation for predicting words from location in lexical dialectology, and
evaluate it using the DARE dataset. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP
2017) September 2017, Copenhagen, Denmark
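As a rough illustration of the mixture-of-Gaussians idea in the abstract above, the sketch below computes the negative log-likelihood of a two-dimensional point under a Gaussian mixture. It is not the authors' model: the isotropic components, the flat-plane treatment of coordinates, and the names `gaussian_2d_logpdf` and `mixture_nll` are all assumptions made for this example.

```python
import math


def gaussian_2d_logpdf(point, mean, sigma):
    """Log density of an isotropic 2-D Gaussian (a simplifying assumption)."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return -math.log(2.0 * math.pi * sigma ** 2) - (dx * dx + dy * dy) / (2.0 * sigma ** 2)


def mixture_nll(point, weights, means, sigmas):
    """Negative log-likelihood of `point` under a mixture of 2-D Gaussians.

    `weights` must be positive and sum to 1; in a mixture density network
    they would be produced by the network, here they are passed in directly.
    """
    log_terms = [
        math.log(w) + gaussian_2d_logpdf(point, m, s)
        for w, m, s in zip(weights, means, sigmas)
    ]
    # Log-sum-exp for numerical stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


# Hypothetical two-component mixture over planar coordinates.
weights = [0.7, 0.3]
means = [(0.0, 0.0), (10.0, 10.0)]
sigmas = [1.0, 2.0]

near = mixture_nll((0.1, -0.2), weights, means, sigmas)
far = mixture_nll((5.0, 5.0), weights, means, sigmas)
```

A point close to a component mean receives a lower loss than a point between components, and the spread of the components is what gives such a model a notion of uncertainty.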
The spoken Omani Arabic of ‘Ibrī: A “Crossing Point” in Gulf dialects
‘Ibrī is located halfway between Muscat and Dubai, very close to the Emirates border. This proximity makes it easier for young male citizens to seek job opportunities in the wealthy Emirates. Indeed, it is easy to find an occupation across the border: in Dubai, in the business sector; in Buraymi or Al-‘Ain, in administration or health-sector professions (the health sector also employs female nurses); and in various locations across the Emirates for those serving as military or police staff (airport and border police include female staff too).
‘Ibrī speakers, the majority of whom return home after work, have daily contact with their Gulf neighbours. This way of life makes the speech of ‘Ibrī inhabitants critical for developing two levels of analysis:
1. features of ‘Ibrī Spoken Arabic, in the general frame of Omani Arabic;
2. traces of contamination among Gulf variants, due to both recent and historically motivated “contacts and changes.”
Several pairs of variables must be taken into account: social, referring to badawiyy or ḥaḍariyy; geographical, referring to the inner part of the country, or to west/east and north/south sides.
In principle, the area of ‘Ibrī should be “ḥaḍariyy of the north”. Nevertheless, we find elements that go beyond this classification. Phonology, for example, shows a series of combinatorial possibilities that hardly fit a schematic classification; we may also find the gahwah syndrome in occasional ‘Ibrī speech.
Based on the data I collected in the city, I offer here a general morpho-phonological description of the local register. I also provide unpublished Omani texts, composed by teachers of “dialect”, with examples of syntax and lexicon.
I intend to demonstrate how strong the mismatch is between political and linguistic borders in the Gulf area.
Holistic corpus-based dialectology
This paper sketches future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects all over Great Britain.
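The move from individual features to feature aggregates can be illustrated with a small sketch that turns per-dialect feature frequencies into a pairwise distance matrix, which is the kind of input that multidimensional scaling and cluster analysis operate on. The Euclidean measure, the function names, and the toy frequencies below are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(freqs):
    """Aggregate all features into one dialect-by-dialect distance matrix.

    `freqs` maps a dialect name to its list of feature frequencies;
    every list must cover the same features in the same order.
    """
    names = sorted(freqs)
    matrix = [[euclidean(freqs[a], freqs[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical frequencies for three dialects and three features.
toy = {
    "dialect_A": [12.0, 3.0, 0.5],
    "dialect_B": [11.0, 2.5, 0.7],
    "dialect_C": [2.0, 9.0, 4.0],
}
names, dist = distance_matrix(toy)
```

In this toy data, `dialect_A` lies much closer to `dialect_B` than to `dialect_C`, a relationship that no single feature would need to show on its own.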
Spatial evolution of human dialects
The geographical pattern of human dialects is a result of history. Here, we
formulate a simple spatial model of language change which shows that the final
result of this historical evolution may, to some extent, be predictable. The
model shows that the boundaries of language dialect regions are controlled by a
length minimizing effect analogous to surface tension, mediated by variations
in population density which can induce curvature, and by the shape of coastline
or similar borders. The predictability of dialect regions arises because these
effects will drive many complex, randomized early states toward one of a
smaller number of stable final configurations. The model is able to reproduce
observations and predictions of dialectologists. These include dialect
continua, isogloss bundling, fanning, the wave-like spread of dialect features
from cities, and the impact of human movement on the number of dialects that an
area can support. The model also provides an analytical form for Séguy's
Curve giving the relationship between geographical and linguistic distance, and
a generalisation of the curve to account for the presence of a population
centre. A simple modification allows us to analytically characterize the
variation of language use by age in an area undergoing linguistic change.
Employing geographical principles for sampling in state of the art dialectological projects
The aims of this paper are twofold. First, we identify the most effective human geographical methods for sampling across space in large-scale dialectological projects. We propose two geographical concepts as a basis for sampling decisions: geo-demographic classification, a multidimensional method used for the socio-economic grouping of areas, and an updated version of functional regions that can be used in sociolinguistic research. Second, we report on the results of a pilot project that applies these models to collect data regarding the acceptability of vernacular morpho-syntactic forms in the North-East of England. Following the method of natural breaks advocated for dialectology by Horvath and Horvath (2002), we interpret breaks in the probabilistic patterns as areas of dialect transitions. This study contributes to the debate about the role and limitations of spatiality in linguistic analysis. It intends to broaden our knowledge about the interfaces between human geography and dialectology.
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201