10,424 research outputs found

A Kernel Independence Test for Geographical Language Variation

Authors: Eisenstein, Jacob; Nguyen, Dong
Publication venue: 'MIT Press - Journals'
Publication date: 29/08/2016

Quantifying the degree of spatial dependence for linguistic variables is a key task for analyzing dialectal variation. However, existing approaches have important drawbacks. First, they are based on parametric models of dependence, which limits their power in cases where the underlying parametric assumptions are violated. Second, they are not applicable to all types of linguistic data: some approaches apply only to frequencies, others to boolean indicators of whether a linguistic variable is present. We present a new method for measuring geographical language variation, which solves both of these problems. Our approach builds on reproducing kernel Hilbert space (RKHS) representations for nonparametric statistics, and takes the form of a test statistic that is computed from pairs of individual geotagged observations without aggregation into predefined geographical bins. We compare this test with prior work using synthetic data as well as a diverse set of real datasets: a corpus of Dutch tweets, a Dutch syntactic atlas, and a dataset of letters to the editor in North American newspapers. Our proposed test is shown to support robust inferences across a broad range of scenarios and types of data. Comment: In submission. 26 pages

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

Spatial evolution of human dialects

Author: Burridge, James
Publication venue: 'American Physical Society (APS)'
Publication date: 01/07/2017

The geographical pattern of human dialects is a result of history. Here, we formulate a simple spatial model of language change which shows that the final result of this historical evolution may, to some extent, be predictable. The model shows that the boundaries of language dialect regions are controlled by a length minimizing effect analogous to surface tension, mediated by variations in population density which can induce curvature, and by the shape of coastline or similar borders. The predictability of dialect regions arises because these effects will drive many complex, randomized early states toward one of a smaller number of stable final configurations. The model is able to reproduce observations and predictions of dialectologists. These include dialect continua, isogloss bundling, fanning, the wave-like spread of dialect features from cities, and the impact of human movement on the number of dialects that an area can support. The model also provides an analytical form for Séguy's Curve giving the relationship between geographical and linguistic distance, and a generalisation of the curve to account for the presence of a population centre. A simple modification allows us to analytically characterize the variation of language use by age in an area undergoing linguistic change.

arXiv.org e-Print Archive

Directory of Open Access Journals

Portsmouth University Research Portal (Pure)

The Origins of Ethnolinguistic Diversity: Theory and Evidence

Author: Stelios Michalopoulos

This research examines theoretically and empirically the economic origins of ethnolinguistic diversity. The empirical analysis constructs detailed data on the distribution of land quality and elevation across contiguous regions, virtual and real countries, and shows that variation in elevation and land quality has contributed significantly to the emergence and persistence of ethnic fractionalization. The empirical and historical evidence support the theoretical analysis, according to which heterogeneous land endowments generated region-specific human capital, limiting population mobility and leading to the formation of localized ethnicities and languages. The research contributes to the understanding of the emergence of ethnicities and their spatial distribution and offers a distinction between the natural, geographically driven, versus the artificial, man-made, components of contemporary ethnic diversity. Keywords: Ethnic Diversity, Geography, Technological Process, Human Capital, Colonization.

Research Papers in Economics

Why are there serial defaulters? Quasi-experimental evidence from Constitutions

Author: Emanuel Kohlscheen

Presidential democracies were 4.9 times more likely to default on external debts between 1976 and 2000 than parliamentary democracies. This paper argues that the explanation for the pattern of serial defaults among a number of sovereign borrowers lies in their constitutions (on serial defaults, see Reinhart, Rogoff and Savastano (2003) and Reinhart and Rogoff (2004)). Ceteris paribus, parliamentary democracies are less likely to default on their liabilities as the confidence requirement creates a credible link between economic policies and the political survival of the executive. This link tends to strengthen the repayment commitment when politicians are opportunistic. I show that this effect is large and statistically significant in the contemporary world even when comparison is restricted to countries that are twins in terms of colonial origin, geography and economic variables. Moreover, the result persists if OECD democracies are excluded from the sample. Since the form of government of a country is typically chosen at the time of independence and highly persistent over time, constitutions can explain why debt policies in developing countries are related to individual histories.

Research Papers in Economics

Large-Scale Kernel Methods for Independence Testing

Authors: Filippi, Sarah; Gretton, Arthur; Sejdinovic, Dino; Zhang, Qinyi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/06/2016

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large-scale methods give comparable performance with existing methods whilst using significantly less computation time and memory. Comment: 29 pages, 6 figures
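
To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of one standard kernel independence statistic, the biased Hilbert-Schmidt Independence Criterion (HSIC) estimate, with a permutation p-value. The abstract does not name the statistic or the kernels, so the choice of HSIC, the Gaussian kernel, the fixed bandwidths, and all function names below are assumptions made for illustration; this is not the authors' implementation and it includes none of the large-scale approximations they study.

```python
import math
import random


def gaussian_gram(points, bandwidth):
    """Gram matrix of a Gaussian kernel over a list of equal-length tuples.

    Building this n x n matrix is the step that costs O(n^2) time and memory.
    """
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return gram


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(gram_x, gram_y):
    """Biased HSIC estimate: trace(K H L H) / n^2."""
    n = len(gram_x)
    cx = center(gram_x)
    cy = center(gram_y)
    return sum(cx[i][j] * cy[i][j] for i in range(n) for j in range(n)) / n ** 2


def permutation_p_value(xs, ys, bandwidth_x=1.0, bandwidth_y=1.0,
                        n_perm=200, seed=0):
    """Return (observed HSIC, p-value) by permuting the pairing of xs and ys."""
    rng = random.Random(seed)
    gram_x = gaussian_gram(xs, bandwidth_x)
    gram_y = gaussian_gram(ys, bandwidth_y)
    observed = hsic(gram_x, gram_y)
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permuting rows and columns of the y Gram matrix breaks the pairing.
        permuted = [[gram_y[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if hsic(gram_x, permuted) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_p_value(xs, ys)` on two lists of coordinate tuples returns the statistic and a p-value; a small p-value suggests dependence between the two samples. In practice the bandwidths would be chosen from the data (for example by a median-distance heuristic) rather than fixed at 1.0.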

arXiv.org e-Print Archive

Springer - Publisher Connector

UCL Discovery

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

Authors: Baldwin, Timothy; Cohn, Trevor; Rahimi, Afshin
Publication date: 01/01/2017

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP 2017) September 2017, Copenhagen, Denmark

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

Spatial Point Pattern Analysis of the Unidentified Aerial Phenomena in France

Authors: Laurent, Thibault; Thomas-Agnan, Christine; Vaillant, Michaël
Publication date: 15/04/2015

We model the unidentified aerial phenomena observed in France during the last 60 years as a spatial point pattern. We use some public information such as population density, rate of moisture or presence of airports to model the intensity of the unidentified aerial phenomena. Spatial exploratory data analysis is a first approach to appreciate the link between the intensity of the unidentified aerial phenomena and the covariates. We then fit an inhomogeneous spatial Poisson process model with covariates. We find that the significant variables are the population density, the presence of factories with a nuclear risk and contaminated land, and the rate of moisture. The analysis of the residuals shows that some parts of France (the Belgian border, the tip of Brittany, some parts of the southeast, the Picardie and Haute-Normandie regions, the Loiret and Corrèze departments) present a high value of local intensity that is not explained by our model.

arXiv.org e-Print Archive

A neural marker for social bias toward in-group accents

Authors: Belin, Pascal; Bestelmeyer, Patricia E.G.; Ladd, D. Robert
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/12/2014

Accents provide information about the speaker's geographical, socio-economic, and ethnic background. Research in applied psychology and sociolinguistics suggests that we generally prefer our own accent to other varieties of our native language and attribute more positive traits to it. Despite the widespread influence of accents on social interactions and on educational and work settings, the neural underpinnings of this social bias toward our own accent, and what may drive this bias, are unexplored. We measured brain activity while participants from two different geographical backgrounds listened passively to 3 English accent types embedded in an adaptation design. Cerebral activity in several regions, including bilateral amygdalae, revealed a significant interaction between the participants' own accent and the accent they listened to: while repetition of own accents elicited an enhanced neural response, repetition of the other group's accent resulted in reduced responses classically associated with adaptation. Our findings suggest that increased social relevance of, or greater emotional sensitivity to, in-group accents may underlie the own-accent bias. Our results provide a neural marker for the bias associated with accents, and show, for the first time, that the neural response to speech is partly shaped by the geographical background of the listener.

Education and conflict recovery: the case of Timor Leste

Authors: Justino, Patricia; Leone, Marinella; Salardi, Paola

The Timor Leste secession conflict lasted for 25 years. Its last wave of violence in 1999, following the withdrawal of Indonesian troops, generated massive displacement and destruction with widespread consequences for the economic and social development of the country. This paper analyzes the impact of the conflict on the level of and access to education of boys and girls in Timor Leste. The authors examine the short-term impact of the 1999 violence on school attendance and grade deficit rates in 2001, and the longer-term impact of the conflict on primary school completion of cohorts of children observed in 2007. They compare the educational impact of the 1999 wave of violence with the impact of other periods of high-intensity violence during the 25 years of Indonesian occupation. The short-term effects of the conflict are mixed. In the longer term, the analysis finds a strong negative impact of the conflict on primary school completion among boys of school age exposed to peaks of violence during the 25-year-long conflict. The effect is stronger for boys attending the last three grades of primary school. This result shows a substantial loss of human capital among young males in Timor Leste since the early 1970s, resulting from household investment trade-offs between education and economic survival. Keywords: Adolescent Health, Youth and Governance, Education For All, Primary Education, Post Conflict Reconstruction

Research Papers in Economics