
    Deriving Supply-side Variables to Extend Geodemographic Classification

    The traditional proprietary geodemographic information systems that are on the market today use well-established methodologies. Demographic indicators are selected as a proxy for affluence and are then often linked to customer databases to derive a measure of the level of consumption expected from the different area typologies. However, these systems ignore fundamental relationships in the retail market by focusing upon demand characteristics in a ‘vacuum’, neglecting the supply side and consumer-supplier interaction. This paper argues that there may be considerable advantages to including supply-side indicators within geodemographic systems. Whilst the term ‘supply’ in this context might imply the number of consumer services already in an area, equally important for understanding demand are variables such as the supply of jobs and houses. We suggest that profiling an area in terms of its labour market characteristics gives a better insight into the income chain, while the supply of houses could be argued to be a crucial factor in household formation that in turn will impact upon demographic structure. Using the regional example of Yorkshire and Humberside in northern England, we indicate how a suite of supply-side variables relating to the labour market can be assembled and used alongside a suite of demand variables to generate a new area classification. Spatial interaction models are calibrated to derive some of the variables that take into account zonal self-containment and catchment size.

    On the spatio-temporal analysis of hydrological droughts from global hydrological models

    The recent concerns for world-wide extreme events related to climate change have motivated the development of large scale models that simulate the global water cycle. In this context, analysis of hydrological extremes is important and requires the adaptation of identification methods used for river basin models. This paper presents two methodologies that extend the tools to analyze spatio-temporal drought development and characteristics using large scale gridded time series of hydrometeorological data. The methodologies are classified as non-contiguous and contiguous drought area analyses (i.e. NCDA and CDA). The NCDA presents time series of percentages of areas in drought at the global scale and for pre-defined regions of known hydroclimatology. The CDA is introduced as a complementary method that generates information on the spatial coherence of drought events at the global scale. Spatial drought events are found through CDA by clustering patterns (contiguous areas). In this study the global hydrological model WaterGAP was used to illustrate the methodology development. Global gridded time series of subsurface runoff (resolution 0.5°) simulated with the WaterGAP model from land points were used. The NCDA and CDA were developed to identify drought events in runoff. The percentages of area in drought calculated with both methods show complementary information on the spatial and temporal events for the last decades of the 20th century. The NCDA provides relevant information on the average number of droughts, duration and severity (deficit volume) for pre-defined regions (globe, 2 selected hydroclimatic regions). Additionally, the CDA provides information on the number of spatially linked areas in drought, maximum spatial event and their geographic location on the globe. Some results capture the overall spatio-temporal drought extremes over the last decades of the 20th century. 
Events like the El Niño Southern Oscillation (ENSO) in South America and the pan-European drought in 1976 appeared clearly in both analyses. The methodologies introduced provide an important basis for the global characterization of droughts, model inter-comparison of droughts identified from global hydrological models, and spatial event analyses.
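The two analyses in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed per-cell runoff threshold for identifying drought, equal-area grid cells (real 0.5° cells shrink with latitude), and 4-neighbour connectivity for contiguity. The function names and the toy grid are hypothetical.

```python
from collections import deque

def drought_mask(runoff, threshold):
    """Flag a land cell as in drought when runoff falls below its threshold.
    None marks a non-land cell (assumed convention)."""
    return [
        [r is not None and r < t for r, t in zip(row_r, row_t)]
        for row_r, row_t in zip(runoff, threshold)
    ]

def percent_area_in_drought(runoff, mask):
    """NCDA-style summary: share of land cells in drought, ignoring contiguity."""
    land = sum(r is not None for row in runoff for r in row)
    dry = sum(cell for row in mask for cell in row)
    return 100.0 * dry / land if land else 0.0

def contiguous_clusters(mask):
    """CDA-style summary: group drought cells into 4-connected clusters."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            queue, cells = deque([(i, j)]), []
            seen[i][j] = True
            while queue:
                a, b = queue.popleft()
                cells.append((a, b))
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if 0 <= x < rows and 0 <= y < cols and mask[x][y] and not seen[x][y]:
                        seen[x][y] = True
                        queue.append((x, y))
            clusters.append(cells)
    return clusters

# Toy 3x3 grid for a single time step (values are made up).
runoff = [
    [1.0, 0.2, None],
    [0.1, 0.3, 2.0],
    [3.0, 4.0, 0.4],
]
threshold = [[0.5] * 3 for _ in range(3)]

mask = drought_mask(runoff, threshold)
pct = percent_area_in_drought(runoff, mask)
clusters = contiguous_clusters(mask)
largest = max(len(c) for c in clusters)
print(pct, len(clusters), largest)
```

Applying the same functions to every time step would give a time series of area in drought and of cluster counts, which is the kind of output the abstract describes for the two methods.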

    Socio-spatial inequalities in late-stage cancer diagnosis in Illinois: spatiotemporal trends and methodological challenges

    This dissertation examines the effects of social and spatial inequalities on late-stage diagnosis of colorectal and breast cancer, and it addresses several methodological challenges surrounding the use of ZIP codes as a study unit in analyzing late-stage cancer at diagnosis. Given that my dissertation follows the “three-paper” format, the abstract section is divided into three parts to describe each paper respectively. The first paper, entitled “Spatial Distribution of Late-Stage Colorectal Cancer in Illinois from 1988 to 2002: Associations with Social-Spatial Covariates”, examines spatial patterns of late-stage colorectal cancer diagnosis over time in Illinois during a period of increasing screening, and it analyzes the varying associations between social, demographic and spatial risk factors and late-stage colorectal cancer diagnosis within the same period. The Bernoulli-based spatial scan statistic was used to detect clusters of late-to-early stage cancer ratios at the ZIP code level in Illinois during two periods: 1988 to 1992, and 1998 to 2002. Then the whole state was divided into three study regions: Chicago city, Chicago suburbs, and other areas. For each region in each time period, hierarchical logistic regression models were estimated to assess the associations between demographic, social and spatial factors and late-stage colorectal cancer risk. ZIP code level risk factors include three indicators of socio-economic status and the shortest travel time to the nearest colonoscopy facility; individual-level factors include age, race, and gender. The socio-economic indicators were created using factor analysis. The results show some changes over time in the spatial distribution of late-stage colorectal cancer and the impacts of risk factors at the ZIP code and individual levels.
Specifically, results of the Bernoulli-based spatial scan statistic find statistically significant clusters of late-stage colorectal cancer in the Chicago metropolitan area and a rural region in southern Illinois in the period of 1988 to 1992. In the later time period, the cluster outcomes were no longer statistically significant. The change indicates that late-stage risk of colorectal cancer has become more evenly distributed in Illinois over time. In terms of the hierarchical logistic regression results, both individual-level demographic factors and ZIP code level covariates have impacts of varying importance on the risk of late-stage colorectal cancer diagnosis in different study regions in the two time periods. The risk of late-stage diagnosis is higher among younger colorectal cancer patients. Gender has contradictory impacts on risk in Chicago city and its suburbs. The shortest travel time to the nearest cancer screening providers is positively associated with late-stage diagnosis risk outside the Chicago region, suggesting that spatial access to screening services may be an important barrier to early detection in rural areas of the state. One socio-economic status indicator, Minority Disparities, demonstrated a significantly positive relationship with late-stage diagnosis risk outside the Chicago region. Similar to the effects of gender, Factor 3 (Cultural-Language Barriers) also had contradictory effects in Chicago city and suburbs. Overall, the results showed no clear trends over time in the effects of various factors on late-stage risk, and few strong and statistically significant results. The inconsistent findings suggest the need for more detailed and localized information. The second paper is titled “Analyzing Spatial Aggregation Error in Statistical Models of Late-Stage Cancer Risk: A Monte Carlo Simulation Approach”.
This paper examines the effect of spatial aggregation error on statistical estimates of the association between spatial access to health care and late-stage cancer. Monte Carlo simulation was used to disaggregate breast cancer cases for two Illinois counties from ZIP codes to census blocks in proportion to the age-race composition of the block population. After the disaggregation, a hierarchical logistic model was estimated examining the relationship between late-stage breast cancer and risk factors including travel distance to mammography, at both the ZIP code and census block levels. Model coefficients were compared between the two levels to assess the impact of spatial aggregation error. Spatial aggregation error is found to influence the coefficients of regression-type models at the ZIP code level, and this impact is highly dependent on the study area. In one study area (Kane County), block-level coefficients were very similar to those estimated on the basis of ZIP code data, whereas in the other study area (Peoria County), the two sets of coefficients differed substantially, raising the possibility of drawing inaccurate inferences about the association between distance to mammography and late-stage cancer risk. The paper reveals that spatial aggregation error can significantly affect the coefficient values in statistical models of the association between cancer outcomes and spatial and non-spatial variables and thus affect inferences drawn from these models. Relying on data at the ZIP code level may lead to inaccurate findings on health risk factors, and the effects are likely to vary from one study area to another. The third paper, titled “The Impact of Spatial Aggregation Error on Spatial Scan Analysis: A Case Study of Colorectal Cancer,”
aims to examine the effect of spatial aggregation error on results of the spatial scan statistic by geographically and statistically comparing results at the ZIP code level and three reference (census tract, census block group and census block) levels. Data on colorectal cancer cases in Cook County, IL for a 5-year interval (1998-2002) were used. The Monte Carlo simulation approach from the second paper was applied to disaggregate the cancer data from the ZIP code level to each reference level. The Bernoulli-based spatial scan statistic was implemented in SaTScan to detect primary clusters based on cancer data at the four levels. An interactive procedure involving SAS and Java programming was designed to automatically run SaTScan hundreds of times. Characteristics of clusters at each reference level were compared to those of the ZIP code level cluster to observe differences related to spatial aggregation. The comparison reveals that the ZIP code level spatial scan statistic can generate reliable clusters at the global level in areas with a large number of cases. Nonetheless, the ZIP code analysis sometimes fails to detect clusters in areas with a lower density of cases. Spatial aggregation error is minimized in areas with sizeable numbers of cases. In the absence of cancer data at a lower level, the ZIP code level data can be used effectively to implement the spatial scan statistic and identify large and dominant clusters. However, smaller clusters located in areas with a relatively low density of cases may be missed. Given that this study focused on a highly urbanized and populated area, future research should assess the influence of spatial aggregation error on spatial scan analysis in suburban and rural regions.
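The Monte Carlo disaggregation step mentioned in the second and third papers can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's procedure: each ZIP-level case is assigned to one census block inside its ZIP code, drawn with probability proportional to the block's population in the case's age-race stratum. The data layout, the stratum labels, and the function name `disaggregate_cases` are hypothetical.

```python
import random

def disaggregate_cases(cases, block_pop, rng):
    """Assign each case to a census block within its ZIP code.

    cases:     list of (zip_code, stratum) tuples, one per cancer case
    block_pop: {zip_code: {block_id: {stratum: population}}}
    rng:       random.Random instance, so that a run is reproducible
    """
    assigned = []
    for zip_code, stratum in cases:
        blocks = block_pop[zip_code]
        ids = list(blocks)
        # Weight each block by its population in the case's age-race stratum.
        weights = [blocks[b].get(stratum, 0) for b in ids]
        if sum(weights) == 0:
            raise ValueError(f"no population for {stratum!r} in ZIP {zip_code!r}")
        assigned.append(rng.choices(ids, weights=weights, k=1)[0])
    return assigned

# Made-up example: one ZIP code with two blocks and two strata.
block_pop = {
    "Z1": {
        "A": {"65+_white": 300, "65+_black": 0},
        "B": {"65+_white": 100, "65+_black": 200},
    }
}
cases = [("Z1", "65+_black")] * 50 + [("Z1", "65+_white")] * 50

rng = random.Random(0)
assigned = disaggregate_cases(cases, block_pop, rng)
print(assigned.count("A"), assigned.count("B"))
```

One call gives a single simulated allocation. Repeating it with different seeds and re-estimating the model each time is one plausible way to see how much block-level results vary, which is presumably why the approach is described as a simulation.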

    Event detection in location-based social networks

    With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; for example, Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. With the aim of setting forth the differences between both approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities.
Last but not least, we present the algorithmic changes and data processing frameworks to scale up the proposed techniques to big data workloads. This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer Reviewed. Postprint (author's final draft).

    Merging high-resolution satellite-based precipitation fields and point-scale rain gauge measurements-A case study in Chile

    With high spatial-temporal resolution, Satellite-based Precipitation Estimates (SPE) are becoming valuable alternative rainfall data for hydrologic and climatic studies but are subject to considerable uncertainty. Effective merging of SPE and ground-based gauge measurements may help to improve precipitation estimation in both resolution and accuracy. In this study, a framework for merging satellite and gauge precipitation data is developed based on three steps, including SPE bias adjustment, gauge observation gridding, and data merging, with the objective of producing high-quality precipitation estimates. An inverse-root-mean-square-error weighting approach is proposed to combine the satellite and gauge estimates, which are adjusted and gridded, respectively, in advance. The model is applied and tested with the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) estimates (daily, 0.04° × 0.04°) over Chile, for the 6-year period of 2009-2014. Daily observations from about 90% of collected gauges over the study area are used for model calibration; the rest of the gauged data are regarded as ground “truth” for validation. Evaluation results indicate high effectiveness of the model in producing high-resolution, high-precision precipitation data. Compared to reference data, the merged data (daily) show correlation coefficients, probabilities of detection, root-mean-square errors, and absolute mean biases that were consistently improved from the original PERSIANN-CCS estimates. The cross-validation provides evidence that the framework is effective in providing high-quality estimates even over nongauged satellite pixels. The same method can be applied globally and is expected to produce precipitation products in near real time by integrating gauge observations with satellite estimates.
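The inverse-root-mean-square-error weighting named in this abstract can be illustrated with a generic sketch. The abstract does not give the exact formula, so this assumes the common form in which each source is weighted by the reciprocal of its RMSE against reference data and the weights are normalized to sum to one. The function names and numbers are hypothetical.

```python
import math

def rmse(estimates, reference):
    """Root-mean-square error of estimates against reference values."""
    n = len(estimates)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / n)

def inverse_rmse_merge(sat, gauge, rmse_sat, rmse_gauge):
    """Combine two precipitation estimates for one pixel and one day.

    Each source gets weight 1/RMSE (assumed form), so the source with the
    smaller error contributes more to the merged value.
    """
    w_sat = 1.0 / rmse_sat
    w_gauge = 1.0 / rmse_gauge
    return (w_sat * sat + w_gauge * gauge) / (w_sat + w_gauge)

# Made-up values in mm/day: the gridded gauge field is assumed twice as accurate.
merged = inverse_rmse_merge(sat=10.0, gauge=4.0, rmse_sat=6.0, rmse_gauge=3.0)
print(round(merged, 6))
```

With these inputs the gauge estimate receives two thirds of the weight, so the merged value lies closer to 4.0 than to 10.0.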