10,424 research outputs found
A Kernel Independence Test for Geographical Language Variation
Quantifying the degree of spatial dependence for linguistic variables is a
key task for analyzing dialectal variation. However, existing approaches have
important drawbacks. First, they are based on parametric models of dependence,
which limits their power in cases where the underlying parametric assumptions
are violated. Second, they are not applicable to all types of linguistic data:
some approaches apply only to frequencies, others to boolean indicators of
whether a linguistic variable is present. We present a new method for measuring
geographical language variation, which solves both of these problems. Our
approach builds on Reproducing Kernel Hilbert space (RKHS) representations for
nonparametric statistics, and takes the form of a test statistic that is
computed from pairs of individual geotagged observations without aggregation
into predefined geographical bins. We compare this test with prior work using
synthetic data as well as a diverse set of real datasets: a corpus of Dutch
tweets, a Dutch syntactic atlas, and a dataset of letters to the editor in
North American newspapers. Our proposed test is shown to support robust
inferences across a broad range of scenarios and types of data.
Comment: In submission. 26 pages
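The abstract above describes an RKHS-based test statistic computed from pairs of individual geotagged observations. A minimal sketch of one common statistic of this kind, the Hilbert-Schmidt Independence Criterion (HSIC) with a permutation test, is given here for illustration. It is not taken from the paper: the Gaussian kernels, the fixed bandwidth of 1.0, the permutation count, and the function names `rbf_gram`, `hsic`, and `hsic_permutation_test` are all assumptions.

```python
import numpy as np

def rbf_gram(x, bandwidth):
    """Gaussian (RBF) Gram matrix for the rows of x (shape n x d)."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))

def hsic(K, L):
    """Biased HSIC estimate: trace(K H L H) / n^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

def hsic_permutation_test(locations, features, n_perm=200, seed=0):
    """Return (statistic, p-value) for independence of locations and features.

    Assumption: both kernels are Gaussian with bandwidth 1.0; in practice the
    bandwidth and the kernel on the linguistic variable would be chosen per dataset.
    """
    rng = np.random.default_rng(seed)
    K = rbf_gram(locations, bandwidth=1.0)
    L = rbf_gram(features, bandwidth=1.0)
    observed = hsic(K, L)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the features breaks any link to location, simulating the null.
        p = rng.permutation(K.shape[0])
        null[i] = hsic(K, L[np.ix_(p, p)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Toy data: a binary "linguistic variable" fully determined by the first coordinate.
rng = np.random.default_rng(1)
locations = rng.normal(size=(100, 2))
features = (locations[:, :1] > 0).astype(float)
stat, p = hsic_permutation_test(locations, features)
print(stat, p)
```

Because the statistic works on pairwise kernel values, no aggregation into geographical bins is needed; the cost is the quadratic Gram matrices.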
Spatial evolution of human dialects
The geographical pattern of human dialects is a result of history. Here, we
formulate a simple spatial model of language change which shows that the final
result of this historical evolution may, to some extent, be predictable. The
model shows that the boundaries of language dialect regions are controlled by a
length minimizing effect analogous to surface tension, mediated by variations
in population density which can induce curvature, and by the shape of coastline
or similar borders. The predictability of dialect regions arises because these
effects will drive many complex, randomized early states toward one of a
smaller number of stable final configurations. The model is able to reproduce
observations and predictions of dialectologists. These include dialect
continua, isogloss bundling, fanning, the wave-like spread of dialect features
from cities, and the impact of human movement on the number of dialects that an
area can support. The model also provides an analytical form for Séguy's
Curve giving the relationship between geographical and linguistic distance, and
a generalisation of the curve to account for the presence of a population
centre. A simple modification allows us to analytically characterize the
variation of language use by age in an area undergoing linguistic change.
The Origins of Ethnolinguistic Diversity: Theory and Evidence
This research examines, theoretically and empirically, the economic origins of ethnolinguistic diversity. The empirical analysis constructs detailed data on the distribution of land quality and elevation across contiguous regions, virtual and real countries, and shows that variation in elevation and land quality has contributed significantly to the emergence and persistence of ethnic fractionalization. The empirical and historical evidence supports the theoretical analysis, according to which heterogeneous land endowments generated region-specific human capital, limiting population mobility and leading to the formation of localized ethnicities and languages. The research contributes to the understanding of the emergence of ethnicities and languages and of their spatial distribution, and offers a distinction between the natural, geographically driven, and the artificial, man-made, components of contemporary ethnic diversity.
Keywords: Ethnic Diversity, Geography, Technological Process, Human Capital, Colonization.
Why are there serial defaulters? Quasi-experimental evidence from Constitutions
Presidential democracies were 4.9 times more likely to default on external debts between 1976 and 2000 than parliamentary democracies. This paper argues that the explanation for the pattern of serial defaults among a number of sovereign borrowers lies in their constitutions (on serial defaults see Reinhart, Rogoff and Savastano (2003) and Reinhart and Rogoff (2004)). Ceteris paribus, parliamentary democracies are less likely to default on their liabilities as the confidence requirement creates a credible link between economic policies and the political survival of the executive. This link tends to strengthen the repayment commitment when politicians are opportunistic. I show that this effect is large and statistically significant in the contemporary world even when comparison is restricted to countries that are twins in terms of colonial origin, geography and economic variables. Moreover, the result persists if OECD democracies are excluded from the sample. Since the form of government of a country is typically chosen at the time of independence and highly persistent over time, constitutions can explain why debt policies in developing countries are related to individual histories.
Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, it is
demonstrated that our novel large-scale methods give comparable performance
with existing methods whilst using significantly less computation time and
memory.
Comment: 29 pages, 6 figures
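To illustrate why kernel approximations reduce the quadratic cost mentioned in the abstract above, the following is a minimal sketch of a random Fourier feature approximation to a Gaussian-kernel HSIC statistic. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth, the number of features, and the names `random_fourier_features` and `rff_hsic` are all hypothetical choices.

```python
import numpy as np

def random_fourier_features(x, n_features, bandwidth, rng):
    """Map x (n x d) to z (n x D) so that z @ z.T approximates the Gaussian Gram matrix."""
    d = x.shape[1]
    # Frequencies are drawn from the Fourier transform of the Gaussian kernel.
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

def rff_hsic(x, y, n_features=100, bandwidth=1.0, seed=0):
    """Approximate biased HSIC as the squared Frobenius norm of the feature cross-covariance.

    Cost is O(n * D^2) time and O(n * D) memory instead of O(n^2) for exact Gram matrices.
    """
    rng = np.random.default_rng(seed)
    zx = random_fourier_features(x, n_features, bandwidth, rng)
    zy = random_fourier_features(y, n_features, bandwidth, rng)
    zx = zx - zx.mean(axis=0)
    zy = zy - zy.mean(axis=0)
    C = zx.T @ zy / x.shape[0]
    return np.sum(C**2)

# Toy comparison: a dependent pair versus an independent pair.
data_rng = np.random.default_rng(42)
x = data_rng.normal(size=(500, 1))
dependent = rff_hsic(x, x.copy())
independent = rff_hsic(x, data_rng.normal(size=(500, 1)))
print(dependent, independent)
```

A p-value would still require a null distribution, for example from permutations as in an exact kernel test; that step is omitted here.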
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
We propose a method for embedding two-dimensional locations in a continuous
vector space using a neural network-based model incorporating mixtures of
Gaussian distributions, presenting two model variants for text-based
geolocation and lexical dialectology. Evaluated over Twitter data, the proposed
model outperforms conventional regression-based geolocation and provides a
better estimate of uncertainty. We also show the effectiveness of the
representation for predicting words from location in lexical dialectology, and
evaluate it using the DARE dataset.
Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP
2017), September 2017, Copenhagen, Denmark
Spatial Point Pattern Analysis of the Unidentified Aerial Phenomena in France
We model the unidentified aerial phenomena observed in France during the last
60 years as a spatial point pattern. We use some public information such as
population density, rate of moisture or presence of airports to model the
intensity of the unidentified aerial phenomena. Spatial exploratory data
analysis is a first approach to appreciate the link between the intensity of
the unidentified aerial phenomena and the covariates. We then fit an
inhomogeneous spatial Poisson process model with covariates. We find that the
significant variables are the population density, the presence of factories
with a nuclear risk and contaminated land, and the rate of moisture. The
analysis of the residuals shows that some parts of France (the Belgian border,
the tip of Brittany, some parts of the southeast, the Picardie and
Haute-Normandie regions, and the Loiret and Corrèze departments) present a high
value of local intensity which is not explained by our model.
A neural marker for social bias toward in-group accents
Accents provide information about the speaker's geographical, socio-economic, and ethnic background. Research in applied psychology and sociolinguistics suggests that we generally prefer our own accent to other varieties of our native language and attribute more positive traits to it. Despite the widespread influence of accents on social interactions and in educational and work settings, the neural underpinnings of this social bias toward our own accent, and what may drive this bias, are unexplored. We measured brain activity while participants from two different geographical backgrounds listened passively to 3 English accent types embedded in an adaptation design. Cerebral activity in several regions, including bilateral amygdalae, revealed a significant interaction between the participants' own accent and the accent they listened to: while repetition of own accents elicited an enhanced neural response, repetition of the other group's accent resulted in reduced responses classically associated with adaptation. Our findings suggest that increased social relevance of, or greater emotional sensitivity to, in-group accents may underlie the own-accent bias. Our results provide a neural marker for the bias associated with accents, and show, for the first time, that the neural response to speech is partly shaped by the geographical background of the listener.
Education and conflict recovery: the case of Timor Leste
The Timor Leste secession conflict lasted for 25 years. Its last wave of violence in 1999, following the withdrawal of Indonesian troops, generated massive displacement and destruction with widespread consequences for the economic and social development of the country. This paper analyzes the impact of the conflict on the level of, and access to, education of boys and girls in Timor Leste. The authors examine the short-term impact of the 1999 violence on school attendance and grade deficit rates in 2001, and the longer-term impact of the conflict on primary school completion of cohorts of children observed in 2007. They compare the educational impact of the 1999 wave of violence with the impact of other periods of high-intensity violence during the 25 years of Indonesian occupation. The short-term effects of the conflict are mixed. In the longer term, the analysis finds a strong negative impact of the conflict on primary school completion among boys of school age exposed to peaks of violence during the 25-year-long conflict. The effect is stronger for boys attending the last three grades of primary school. This result shows a substantial loss of human capital among young males in Timor Leste since the early 1970s, resulting from household investment trade-offs between education and economic survival.
Keywords: Adolescent Health, Youth and Governance, Education For All, Primary Education, Post Conflict Reconstruction