
    The cultural, ethnic and linguistic classification of populations and neighbourhoods using personal names

    There is a growing need to understand the nature and detailed composition of ethnic groups in today's increasingly multicultural societies. Ethnicity classifications are often hotly contested, but still greater problems arise from the quality and availability of classifications, with knock-on consequences for our ability to subdivide populations meaningfully. Name analysis and classification has been proposed as one efficient method of achieving such subdivisions in the absence of ethnicity data, and may be especially pertinent to public health and demographic applications. However, previous approaches to name analysis have been designed to identify one or a small number of ethnic minorities, not complete populations.

    This working paper presents a new methodology to classify the UK population and neighbourhoods into groups of common origin using surnames and forenames. It proposes a new ontology of ethnicity that combines several of its multidimensional facets: language, religion, geographical region, and culture. It uses data collected at very fine temporal and spatial scales and made available, subject to safeguards, at the level of the individual. Individuals are classified into 185 independently assigned categories of Cultural, Ethnic and Linguistic (CEL) groups, based on the probable origins of their names. We include a justification of the need to classify ethnicity, a proposed CEL taxonomy, a description of how the CEL classification was built and applied, a preliminary external validation, and some examples of current and potential applications.
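    The abstract does not spell out the classification rules. As a minimal sketch of the general idea only, assuming hypothetical lookup tables that give a probability of each group for a given surname or forename, a person can be assigned the group with the highest weighted score, and a neighbourhood can be profiled by the share of residents in each group. The table contents, group labels, the `surname_weight` parameter and both function names are invented for illustration; this is not the CEL method.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL toy tables: P(group | name). Entries and labels are invented;
# the real CEL classification has 185 groups built from much richer data.
SURNAME_PROBS = {
    "smith": {"English": 0.85, "Scottish": 0.15},
    "jones": {"Welsh": 0.8, "English": 0.2},
    "kowalski": {"Polish": 0.95, "English": 0.05},
}
FORENAME_PROBS = {
    "john": {"English": 0.7, "Scottish": 0.3},
    "gareth": {"Welsh": 0.9, "English": 0.1},
    "piotr": {"Polish": 1.0},
}


def classify(forename, surname, surname_weight=0.7):
    """Return the group with the highest weighted score, or 'Unclassified'.

    Surname and forename evidence are combined as a weighted sum; the
    weighting scheme is an assumption made for this sketch.
    """
    scores = defaultdict(float)
    for group, p in SURNAME_PROBS.get(surname.lower(), {}).items():
        scores[group] += surname_weight * p
    for group, p in FORENAME_PROBS.get(forename.lower(), {}).items():
        scores[group] += (1 - surname_weight) * p
    if not scores:
        return "Unclassified"  # neither name appears in the tables
    return max(scores, key=scores.get)


def neighbourhood_profile(residents):
    """Share of residents assigned to each group; residents are (forename, surname)."""
    counts = Counter(classify(f, s) for f, s in residents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}
```

    For example, `classify("Gareth", "Smith")` returns `"English"`, because the surname carries more weight than the forename under the assumed `surname_weight` of 0.7.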

    Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems

    Crowdsourcing systems commonly face the problem of aggregating multiple judgments provided by potentially unreliable workers. In addition, several aspects of designing efficient crowdsourcing processes, such as setting workers' bonuses, fair prices and task time limits, require knowledge of the likely duration of the task at hand. Bringing these together, in this work we introduce a new time-sensitive Bayesian aggregation method that simultaneously estimates a task's duration and obtains reliable aggregations of crowdsourced judgments. Our method, called BCCTime, builds on the key insight that the time a worker takes to perform a task is an important indicator of the likely quality of the resulting judgment. To capture this, BCCTime uses latent variables to represent the uncertainty about the workers' completion times, the tasks' durations and the workers' accuracy. To relate the quality of a judgment to the time a worker spends on a task, our model assumes that each task is completed within a latent time window within which all workers with a propensity to genuinely attempt the labeling task (i.e., no spammers) are expected to submit their judgments. In contrast, workers with a lower propensity to valid labeling, such as spammers, bots or lazy labelers, are assumed to perform tasks considerably faster or slower than the time required by normal workers. Specifically, we use efficient message-passing Bayesian inference to learn approximate posterior probabilities of (i) the confusion matrix of each worker, (ii) the propensity to valid labeling of each worker, (iii) the unbiased duration of each task and (iv) the true label of each task. Using two real-world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a task's duration compared to state-of-the-art methods.
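    BCCTime's message-passing inference is beyond a short example, but its key insight (judgments submitted far faster or slower than a task's typical duration are less trustworthy) can be illustrated with a much simpler, non-Bayesian heuristic. The sketch below assumes each judgment arrives as a hypothetical `(task, worker, label, seconds)` tuple. It estimates a per-task time window as the median plus or minus `k` median absolute deviations, discards judgments outside that window, and takes a majority vote over the rest. The window rule, `k`, and the function names are assumptions; this is not BCCTime, and it learns no per-worker confusion matrices.

```python
from collections import Counter, defaultdict
from statistics import median


def time_window(durations, k=2.0):
    """Robust window: median +/- k * MAD (a crude stand-in for a latent time window)."""
    m = median(durations)
    mad = median(abs(d - m) for d in durations)
    return m - k * mad, m + k * mad


def aggregate(judgments, k=2.0):
    """judgments: iterable of (task, worker, label, seconds).

    Returns {task: (majority_label, estimated_duration)}, using only the
    judgments whose completion time falls inside the task's window.
    """
    by_task = defaultdict(list)
    for task, _worker, label, secs in judgments:
        by_task[task].append((label, secs))

    result = {}
    for task, items in by_task.items():
        lo, hi = time_window([s for _, s in items], k)
        # Keep in-window judgments; fall back to all of them if none survive.
        kept = [(label, s) for label, s in items if lo <= s <= hi] or items
        label = Counter(l for l, _ in kept).most_common(1)[0][0]
        result[task] = (label, median(s for _, s in kept))
    return result
```

    For example, with two judgments of 2 and 3 seconds labelled `C` and three of 30, 32 and 35 seconds labelled `A`, `B`, `A`, the window is 20 to 40 seconds, the fast votes are dropped, and the task resolves to `A` with an estimated duration of 32 seconds. A median-based window is used so that a minority of very fast workers cannot drag the estimate; it breaks down once such workers form the majority, which is one motivation for a latent-variable model.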

    Party competition and voter decision-making

    This paper combines two important findings from research on how voters and parties interact. First, it acknowledges that voters possess different decision-making mechanisms: instead of weighing the parties' policy promises, they might also vote based on past performance or on the personal qualities of party leaders. Second, it incorporates empirical findings that challenge a prominent device for modeling party-voter linkages, namely the left-right scheme: modern party systems have been shown to vary in the number of dimensions on which parties compete. We model how voters aggregate issues into party rankings, assuming that voters switch decision-making mechanisms contingent on their heuristic value, and develop hypotheses on how issue diversity in party competition influences voters' use of heuristics.

    Euclidean Revealed Preferences: Testing the Spatial Voting Model

    In the spatial model of voting, voters choose the candidate closest to them in the ideological space. Recent work by Degan and Merlo (2009) shows that the model is falsifiable on the basis of individual voting data in multiple elections. We show how to handle the fact that the model only partially identifies the distribution of voting profiles, and we give a formal revealed-preference test of the spatial voting model in three national elections in the US, strongly rejecting the model in all cases. We also construct confidence regions for partially identified voter characteristics in an augmented model with an unobserved valence dimension, and identify the amount of voter heterogeneity necessary to reconcile the data with spatial preferences.
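    To illustrate why repeated individual choices make the spatial model falsifiable, consider the simplest case: one ideological dimension, two candidates per election, and candidate positions treated as known. Choosing candidate c over r then means the voter's ideal point x satisfies |x - c| <= |x - r|, i.e. x lies on c's side of the midpoint (c + r) / 2. The sketch below intersects these half-lines across elections; an empty intersection means no ideal point rationalizes that voter's choices. The function name and input format are hypothetical, and this is a deliberate simplification, not the authors' test, which also deals with partial identification and unobserved valence.

```python
def consistent_ideal_points(choices):
    """Revealed-preference check for 1-D spatial voting with known positions.

    choices: list of (chosen_position, rejected_position), one per election.
    Returns the interval (lo, hi) of ideal points x satisfying
    |x - chosen| <= |x - rejected| in every election, or None if the
    interval is empty (the spatial model is rejected for this voter).
    """
    lo, hi = float("-inf"), float("inf")
    for chosen, rejected in choices:
        if chosen == rejected:
            continue  # identical positions impose no constraint
        mid = (chosen + rejected) / 2
        if chosen < rejected:
            hi = min(hi, mid)  # voter must lie at or left of the midpoint
        else:
            lo = max(lo, mid)  # voter must lie at or right of the midpoint
    return (lo, hi) if lo <= hi else None
```

    For instance, `consistent_ideal_points([(-1, 1), (3, 1)])` returns `None`: the first choice requires x <= 0 and the second requires x >= 2, so no single ideal point explains both votes.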