162 research outputs found
Geo-temporal Twitter demographics
This paper seeks and uses highly disaggregate social media sources to characterize Greater London in terms of flows of people with modelled individual characteristics, as well as conventional measures of land use morphology and night-time residence. We conduct three analyses. First, we use the Shannon entropy measure to characterize the geography of information creation across the city. Second, we create a geo-temporal demographic classification of Twitter users in London. Third, we begin to use Twitter data to characterize the links between different locations across the city. We see all three elements as data-rich, highly disaggregate geo-temporal analysis of urban form and function, albeit one that pertains to no clearly defined population. Our conclusions reflect upon this severe shortcoming in analysis using social media data, and its implications for progressing our understanding of socio-spatial distributions within cities.
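As a rough illustration of the Shannon entropy measure mentioned above, the following sketch computes the entropy of tweet activity within one map cell. The abstract does not say what the entropy is taken over, so the choice of "tweets per user within a grid cell" and the sample data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Return the Shannon entropy (in bits) of a collection of category counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical data: the user id attached to each tweet sent from one grid cell.
cell_tweets = ["u1", "u1", "u2", "u3"]
cell_entropy = shannon_entropy(Counter(cell_tweets).values())
```

A cell dominated by a single prolific user scores close to zero, while a cell whose tweets are spread evenly over many users scores high.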
An Individual Level Method for Improved Estimation of Ethnic Characteristics
This paper develops an improved method for estimating the ethnicity of individuals based on individual-level pairings of given and family names. It builds upon previous research by using a global database of names from c. 1.7 billion living individuals, supplemented by individual-level historical census data. In focusing upon Great Britain, these resources enable, respectively, greater precision in estimating probable global origins and better estimation of self-identification amongst long-established family groups such as the Irish Diaspora. We report on geographic issues in adjusting the weighting of groups that are systematically under- or over-predicted using other methods. Our individual-level estimates are evaluated using both small area Great Britain census data for 2011 and individual-level data for asylum seekers in Canada between 1995 and 2012. Our conclusions assess the value of such estimates in the conduct of social equity audits and in depicting the social mobility outcomes of residential mobility and migration across Great Britain.
Delineating Europe's Cultural Regions: Population Structure and Surname Clustering
Surnames (family names) show distinctive geographical patterning and in many disciplines remain an underutilized source of information about population origins, migration and identity. This paper investigates the geographical structure of surnames, using a unique individual-level database assembled from registers and telephone directories from 16 European countries. We develop a novel combination of methods for exhaustively analyzing this multinational data set, based upon the Lasker Distance, consensus clustering and multidimensional scaling. Our analysis is both data-rich and computationally intensive, entailing as it does the aggregation, clustering and mapping of 8 million surnames collected from 152 million individuals. The resulting regionalization has applications in developing our understanding of the social and cultural complexion of Europe, and offers potential insights into the long and short-term dynamics of migration and residential mobility. The research also contributes a range of methodological insights for future studies concerning spatial clustering of surnames and population data more widely. In short, this paper further demonstrates the value of surnames in multinational population studies and also the increasing sophistication of techniques available to analyze them.
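To make the Lasker Distance concrete, here is a minimal sketch using its commonly cited form: the negative natural logarithm of the isonymy coefficient R = ½ Σ p_a(s) · p_b(s), where p_a(s) is the relative frequency of surname s in region a. The abstract does not give the exact formulation used in the paper, so this form, the function names and the sample surnames are all assumptions for illustration.

```python
import math
from collections import Counter


def surname_frequencies(surnames):
    """Map each surname to its relative frequency within one region."""
    counts = Counter(surnames)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def lasker_distance(region_a, region_b):
    """Lasker distance between two regions, each given as a list of surnames.

    Assumes the form -ln(R) with R = 0.5 * sum over shared surnames of
    p_a(s) * p_b(s). Regions with no surname in common are infinitely distant.
    """
    freq_a = surname_frequencies(region_a)
    freq_b = surname_frequencies(region_b)
    shared = freq_a.keys() & freq_b.keys()
    isonymy = 0.5 * sum(freq_a[s] * freq_b[s] for s in shared)
    if isonymy == 0:
        return math.inf
    return -math.log(isonymy)


# Hypothetical surname lists for two small regions.
region_a = ["Smith", "Smith", "Jones", "Garcia"]
region_b = ["Smith", "Jones", "Jones", "Muller"]
distance = lasker_distance(region_a, region_b)
```

A matrix of such pairwise distances between regions is the kind of input that clustering and multidimensional scaling routines typically take.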
Names-based ethnicity enhancement of hospital admissions in England, 1999-2013.
BACKGROUND: Accurate recording of ethnicity in electronic healthcare records is important for the monitoring of health inequalities. Yet until the late 1990s, ethnicity information was absent from more than half of records of patients who received inpatient care in England. In this study, we report on the usefulness of a names-based ethnicity classification, Ethnicity Estimator (EE), for addressing this gap in the hospital records. MATERIALS AND METHODS: Data on inpatient hospital admissions were obtained from Hospital Episode Statistics (HES) between April 1999 and March 2014. The data were enhanced with ethnicity coding of participants' surnames using the EE software. Only data on the first episode for each patient each year were included. RESULTS: A total of 111,231,653 patient-years were recorded between April 1999 and March 2014. The completeness of ethnicity records improved from 59.5 % in 1999 to 90.5 % in 2013 (financial year). The biggest improvement was seen in the White British group, which increased from 55.4 % in 1999 to 73.9 % in 2013. The correct prediction of NHS-reported ethnicity varied by ethnic group (2013 figures): White British (89.8 %), Pakistani (81.7 %), Indian (74.6 %), Chinese (72.9 %), Bangladeshi (63.4 %), Black African (57.3 %), White Other (50.5 %), White Irish (45.0 %). For other ethnic groups the prediction success was low to none. Prediction success was above 70 % in most areas outside London but fell below 40 % in parts of London. CONCLUSION: Studies of ethnic inequalities in hospital inpatient care in England are limited by incomplete data on patient ethnicity collected in the 1990s and 2000s. The prediction success of a names-based ethnicity classification tool has been quantified in HES for the first time and the results can be used to inform decisions around the optimal analysis of ethnic groups using this data source.
Creating the 2011 area classification for output areas (2011 OAC)
This paper presents the methodology that has been used to create the 2011 Area Classification for Output Areas (2011 OAC). This extends a lineage of widely used public domain census-only geodemographic classifications in the UK. It provides an update to the successful 2001 OAC methodology, and summarizes the social and physical structure of neighbourhoods using data from the 2011 UK Census. We also present the results of a user engagement exercise that underpinned the creation of an updated methodology for the 2011 OAC. The 2011 OAC comprises 8 Supergroups, 26 Groups and 76 Subgroups. Finally, we present an example of the results of the classification in Southampton.
Ethnic inequalities in hospital admissions in England: an observational study.
BACKGROUND: Ethnic inequalities in health are well-known and partly explained by social determinants such as poorer living and working conditions, health behaviours, discrimination, social exclusion, and healthcare accessibility factors. Inequalities are known both for self-reported health and for diseases such as diabetes, cardiovascular diseases, respiratory diseases, and non-specific chest pains. Most studies, however, concern individual diseases or self-reported health and do not provide an overview that can detect gaps in existing knowledge. The aim of this study is thus to identify ethnic inequalities in inpatient hospital admission for all major disease categories in England. METHODS: Observational study of the inpatient hospital admission database in England enhanced with ethnicity coding of participants' surnames. The primary diagnosis was coded to Level 1 of the Global Burden of Disease groups. For each year, only the first admission for each condition for each participant was included. If a participant was readmitted within two days, only the first admission was counted. Admission risk for all major disease groups for each ethnic group relative to the White British group was calculated using logistic regression adjusting for age and area deprivation. RESULTS: 40,928,105 admissions were identified between April 2009 and March 2014. Ethnic inequalities were found in cardiovascular diseases, respiratory diseases, chest pain, and diabetes in line with previous studies. Additional inequalities were found in nutritional deficiencies, endocrine disorders, and sense organ diseases. CONCLUSIONS: The results of this study were consistent with known inequalities, but also found previously unreported disparities in nutritional deficiencies, endocrine disorders, and sense organ diseases. Further studies would be required to map out the relevant care pathways for ethnic minorities and establish whether preventive measures can be strengthened.
A Classification of Multidimensional Open Data for Urban Morphology
Identifying socio-spatial patterns through geodemographic classification has proven utility over a range of disciplines. While most of these spatial classification systems include a plethora of socioeconomic attributes, there is arguably little to no input regarding attributes of the built environment or physical space, and their relationship to socioeconomic profiles within this context has not been evaluated in any systematic way. This research explores the generation of neighbourhood characteristics and other attributes using a geographic data science approach, taking advantage of the increasing availability of such spatial data from open data sources. We adopt a SOM (Self-Organizing Maps) methodology to create a classification of Multidimensional Open Data Urban Morphology (MODUM) and test the extent to which this output systematically follows conventional socioeconomic profiles. Such an analysis can also provide a simplified structure of the physical properties of geographic space that can be further used as input to more complex socioeconomic models.
British surname origins, population structure and health outcomes-an observational study of hospital admissions.
Population structure is a confounder on pathways linking genotypes to health outcomes. This study examines whether the historical, geographical origins of British surnames are associated with health outcomes today. We coded hospital admissions of over 30 million patients in England between 1999 and 2013 to their British surname origin and divided their diagnoses into 125 major disease categories (of which 94 were complete-case). A base population was constructed with patients' first admission of any kind. Age- and sex-standardised odds ratios were calculated with logistic regression using patients with ubiquitous English surnames such as "Smith" as reference (alpha = .05; Benjamini-Hochberg false discovery rate (FDR) = .05). The results were scanned for "signals", where a branch of related surname origins all had significantly higher or lower risk. Age- and sex-standardised admission (alpha = .05) was calculated for each signal across area deprivation and surname origin density quintiles. Signals included three branches of English surnames (disorders of teeth and jaw, fractures, upper gastrointestinal disorders). Although the signal with fractures was considered unusual overall, 2 out of the 9 origins in the branch would only be significant at an FDR > .05: OR 0.92 (95% confidence interval 0.86-0.98) and 0.70 (0.55-0.90). The risk was only different in the quintile with the highest density of that group. Differential risk remained when studied across quintiles of area deprivation. The study shows that surname origins are associated with diverse health outcomes and thus act as markers of population structure over and above area deprivation.
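The Benjamini-Hochberg false discovery rate control mentioned above can be sketched as the standard step-up procedure: sort the p-values, find the largest rank k for which p_(k) <= (k / m) * q, and reject the k smallest. This is a generic illustration rather than the study's own code, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            largest_rank = rank
    # Step-up rule: every hypothesis ranked at or below largest_rank is rejected,
    # even if its own p-value exceeded its own threshold.
    rejected = [False] * m
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected


# Hypothetical p-values, one per surname origin tested for a single disease category.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(p_values, q=0.05)
```

In this hypothetical example only the two smallest p-values survive, although five of the eight fall below an uncorrected alpha of .05.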
The Geography of the International System: The CShapes Dataset
We describe CShapes, a new dataset that provides historical maps of state boundaries and capitals in the post-World War II period. The dataset is coded according to both the Correlates of War and the Gleditsch and Ward (1999) state lists, and is therefore compatible with a great number of existing databases in the discipline. Provided in a geographic data format, CShapes can be used directly with standard GIS software, allowing a wide range of spatial computations. In addition, we supply a CShapes package for the R statistical toolkit. This package enables researchers without GIS skills to perform various useful operations on the GIS maps. The paper introduces the CShapes dataset and structure and gives three examples of how to use CShapes in political science research. First, we show how results from quantitative analysis can be depicted intuitively as a map. The second application gives an example of computing indicators on the CShapes maps, which can then be used in statistical tests. Third, we illustrate the use of CShapes for generating different weights matrices in spatial statistical applications. All the examples can be replicated using the freely available R package and do not require specialized GIS skills. The dataset is available for download from the CShapes website (http://nils.weidmann.ws/projects/cshapes). © Taylor & Francis Group, LLC