Evaluating the utility of multispectral information in delineating the areal extent of precipitation
Data from geosynchronous Earth-orbiting (GEO) satellites equipped with visible (VIS) and infrared (IR) scanners are commonly used in rain retrieval algorithms. These algorithms benefit from the high spatial and temporal resolution of GEO observations, used either in stand-alone mode or in combination with higher-quality but less frequent microwave observations from low Earth-orbiting (LEO) satellites. In this paper, a neural network-based framework is presented to evaluate the utility of multispectral information in improving rain/no-rain (R/NR) detection. The algorithm uses the classification capabilities of the self-organizing feature map (SOFM), together with probability matching techniques, to map single- or multispectral input space into R/NR maps. The framework was tested and validated using all 31 possible combinations of the five Geostationary Operational Environmental Satellite 12 (GOES-12) channels. An algorithm training and validation study was conducted over the conterminous United States during June-August 2006. The results indicate that during daytime, the visible channel (0.65 μm) can yield significant improvements in R/NR detection, especially when combined with any of the other four GOES-12 channels. Similarly, for nighttime detection, combining two IR channels, particularly channels 3 (6.5 μm) and 4 (10.7 μm), resulted in a significant performance gain over any single IR channel. In both cases, however, using more than two channels yielded only marginal improvements over two-channel combinations. Detailed examination of event-based images indicates that the proposed algorithm can extract information useful for screening out no-rain pixels associated with cold, thin clouds and for identifying rain areas under warm but rainy clouds. Both cases have been problematic for IR-only algorithms. © 2009 American Meteorological Society
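The figure of 31 follows from taking every non-empty subset of five channels (2^5 - 1 = 31). The sketch below enumerates those subsets and shows one simple form of probability matching: a threshold on a per-pixel rain score (assumed here to come from SOFM cluster statistics) chosen so that the fraction of pixels labelled rain equals the observed rain fraction. The function names and the score input are hypothetical, and the paper's exact matching procedure may differ. The wavelengths for channels 2 and 6 come from the GOES-12 imager specification rather than from the abstract.

```python
from itertools import combinations

# GOES-12 imager channels (channel number -> central wavelength).
# On GOES-12, the 12 um channel 5 was replaced by channel 6 (13.3 um).
GOES12_CHANNELS = {1: "0.65 um", 2: "3.9 um", 3: "6.5 um", 4: "10.7 um", 6: "13.3 um"}


def channel_combinations(channels):
    """All non-empty subsets of the channels: 2**5 - 1 = 31 for five channels."""
    chans = sorted(channels)
    return [combo for k in range(1, len(chans) + 1) for combo in combinations(chans, k)]


def probability_matching_threshold(scores, rain_fraction):
    """Hypothetical helper: return the score threshold at which the share of
    pixels flagged as rain equals the observed rain fraction (ties may flag
    slightly more pixels)."""
    ranked = sorted(scores, reverse=True)
    n_rain = round(rain_fraction * len(ranked))
    if n_rain == 0:
        return float("inf")  # no pixel should be flagged as rain
    return ranked[n_rain - 1]


def rain_no_rain(scores, threshold):
    """Binary R/NR map: True where the score reaches the threshold."""
    return [score >= threshold for score in scores]


combos = channel_combinations(GOES12_CHANNELS)
print(len(combos))  # 31
```

Each of the 31 subsets would be trained and validated separately, which is how single-, two- and multi-channel inputs can be compared on equal terms.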
Deriving Supply-side Variables to Extend Geodemographic Classification
The traditional proprietary geodemographic information systems on the market today use well-established methodologies. Demographic indicators are selected as a proxy for affluence and are then often linked to customer databases to derive a measure of the level of consumption expected from the different area typologies. However, these systems ignore fundamental relationships in the retail market: they focus on demand characteristics in a ‘vacuum’ and neglect the supply side and consumer-supplier interaction.
This paper argues that there may be considerable advantages to including supply-side indicators within geodemographic systems. Whilst the term ‘supply’ in this context might imply the number of consumer services already in an area, variables such as the supply of jobs and houses are equally important for understanding demand. We suggest that profiling an area in terms of its labour market characteristics gives better insight into the income chain, while the supply of houses is arguably a crucial factor in household formation, which in turn affects demographic structure. Using the regional example of Yorkshire and Humberside in northern England, we show how a suite of supply-side variables relating to the labour market can be assembled and used alongside a suite of demand variables to generate a new area classification. Spatial interaction models are calibrated to derive some of these variables, taking into account zonal self-containment and catchment size.
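The abstract does not state the model form. A common choice for job-flow modelling, assumed here, is a production-constrained gravity model with exponential distance decay, calibrated by finding the decay parameter that reproduces the observed mean trip length. The sketch below uses hypothetical zone data and defines self-containment as the share of a zone's resident workers predicted to work in the same zone; catchment-size measures are not shown.

```python
import math


def production_constrained_flows(origins, attractiveness, dist, beta):
    """T_ij = A_i * O_i * W_j * exp(-beta * d_ij), with the balancing factor A_i
    chosen so that each zone's outflows sum to O_i."""
    flows = []
    for i, o_i in enumerate(origins):
        weights = [w_j * math.exp(-beta * dist[i][j]) for j, w_j in enumerate(attractiveness)]
        total = sum(weights)  # equals 1 / A_i
        flows.append([o_i * w / total for w in weights])
    return flows


def mean_trip_length(flows, dist):
    total = sum(sum(row) for row in flows)
    cost = sum(f * dist[i][j] for i, row in enumerate(flows) for j, f in enumerate(row))
    return cost / total


def calibrate_beta(origins, attractiveness, dist, observed_mean, lo=1e-6, hi=10.0, iters=60):
    """Bisection on beta: the modelled mean trip length falls as beta rises."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        flows = production_constrained_flows(origins, attractiveness, dist, mid)
        if mean_trip_length(flows, dist) > observed_mean:
            lo = mid  # trips too long: increase distance deterrence
        else:
            hi = mid
    return (lo + hi) / 2


def self_containment(flows):
    """Share of each zone's outflow (resident workers) that stays in the zone."""
    return [row[i] / sum(row) for i, row in enumerate(flows)]


# Hypothetical three-zone example: resident workers, jobs, distances in km.
workers = [1000, 500, 800]
jobs = [1200, 300, 800]
dist_km = [[2, 10, 20], [10, 2, 12], [20, 12, 2]]
beta = calibrate_beta(workers, jobs, dist_km, observed_mean=5.0)
flows = production_constrained_flows(workers, jobs, dist_km, beta)
print(round(beta, 3), [round(s, 2) for s in self_containment(flows)])
```

Bisection works here because, for each origin, the derivative of the mean trip length with respect to beta is minus the variance of trip lengths, so the aggregate mean decreases monotonically.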
On the spatio-temporal analysis of hydrological droughts from global hydrological models
Recent concerns about worldwide extreme events related to climate change have motivated the development of large-scale models that simulate the global water cycle. In this context, analysis of hydrological extremes is important and requires adapting the identification methods used for river basin models. This paper presents two methodologies that extend these tools to analyze spatio-temporal drought development and characteristics using large-scale gridded time series of hydrometeorological data. The methodologies are classified as non-contiguous and contiguous drought area analyses (NCDA and CDA, respectively). The NCDA presents time series of the percentage of area in drought at the global scale and for pre-defined regions of known hydroclimatology. The CDA is introduced as a complementary method that generates information on the spatial coherence of drought events at the global scale. Spatial drought events are identified through CDA by clustering patterns (contiguous areas). In this study, the global hydrological model WaterGAP was used to illustrate the methodology development. The analysis used global gridded time series (0.5° resolution) of subsurface runoff simulated by WaterGAP for land grid points. The NCDA and CDA were developed to identify drought events in runoff. The percentages of area in drought calculated with both methods provide complementary information on spatial and temporal events for the last decades of the 20th century. The NCDA provides relevant information on the average number of droughts, their duration, and their severity (deficit volume) for pre-defined regions (the globe and two selected hydroclimatic regions). The CDA additionally provides information on the number of spatially linked areas in drought, the largest spatial event, and the geographic location of events. Some results capture the overall spatio-temporal drought extremes over the last decades of the 20th century.
Events such as the El Niño Southern Oscillation (ENSO) in South America and the pan-European drought of 1976 appeared clearly in both analyses. The methodologies introduced provide an important basis for the global characterization of droughts, for model inter-comparison of droughts identified from global hydrological models, and for spatial event analysis.
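Here is a minimal sketch of the two analyses on a gridded runoff array. It assumes a cell-specific drought threshold at the 20th percentile of each cell's own record (the abstract does not state the threshold) and 4-neighbour contiguity for CDA. The function names are hypothetical. The sketch ignores wrap-around at the 180° meridian and does not track deficit volumes.

```python
from collections import deque

import numpy as np


def drought_mask(runoff, q=20):
    """runoff: array (time, rows, cols). A cell is in drought when its runoff
    falls below the q-th percentile of its own record (assumed threshold)."""
    thresholds = np.percentile(runoff, q, axis=0)
    return runoff < thresholds


def ncda_percent_area(mask, cell_area=None):
    """NCDA: percentage of the (area-weighted) domain in drought at each time step."""
    if cell_area is None:
        # Equal weights; on a real 0.5 degree grid, pass cos(latitude) weights instead.
        cell_area = np.ones(mask.shape[1:])
    return 100.0 * (mask * cell_area).sum(axis=(1, 2)) / cell_area.sum()


def cda_clusters(mask2d):
    """CDA: sizes (in cells) of 4-connected drought clusters, largest first."""
    rows, cols = mask2d.shape
    seen = np.zeros_like(mask2d, dtype=bool)
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask2d[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill of one contiguous event
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask2d[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical monthly runoff on a small 20 x 30 grid for 10 years.
rng = np.random.default_rng(0)
runoff = rng.gamma(2.0, 1.0, size=(120, 20, 30))
mask = drought_mask(runoff)
percent_in_drought = ncda_percent_area(mask)
largest_events = cda_clusters(mask[0])[:3]
```

The same drought mask feeds both analyses. NCDA only counts the cells in drought, whereas CDA also records which of those cells touch, so it can report the number and size of spatially linked events.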
Socio-spatial inequalities in late-stage cancer diagnosis in Illinois: spatiotemporal trends and methodological challenges
This dissertation examines the effects of social and spatial inequalities on late-stage diagnosis of colorectal and breast cancer, and it addresses several methodological challenges surrounding the use of ZIP codes as a study unit in analyzing late-stage cancer at diagnosis. Because my dissertation follows the “three-paper” format, this abstract is divided into three parts, each describing one paper.
The first paper, entitled “Spatial Distribution of Late-Stage Colorectal Cancer in Illinois from 1988 to 2002: Associations with Social-Spatial Covariates”, examines how spatial patterns of late-stage colorectal cancer diagnosis changed over time in Illinois during a period of increasing screening, and it analyzes the varying associations between social, demographic and spatial risk factors and late-stage colorectal cancer diagnosis over the same period. The Bernoulli-based spatial scan statistic was used to detect clusters of late-to-early stage cancer ratios at the ZIP code level in Illinois during two periods: 1988 to 1992 and 1998 to 2002. The state was then divided into three study regions: Chicago city, the Chicago suburbs, and other areas. For each region in each time period, hierarchical logistic regression models were estimated to assess the associations between demographic, social and spatial factors and late-stage colorectal cancer risk. ZIP code level risk factors included three socio-economic status indicators and the shortest travel time to the nearest colonoscopy facility; individual-level factors included age, race, and gender. The socio-economic indicators were created using factor analysis.
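For reference, the Bernoulli scan statistic compares the late-stage proportion inside a candidate circular window with the proportion outside it using a likelihood ratio, and assesses significance by Monte Carlo replication. The sketch below is a simplified, hypothetical re-implementation for illustration only. It uses circles centred on zone centroids, a 50% cap on window size, and a permutation p-value. It is not SaTScan and omits most of its options.

```python
import math
import random


def _xlogy(x, y):
    """x * log(y), taken as 0 when x == 0."""
    return x * math.log(y) if x > 0 else 0.0


def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for a window holding c late-stage cases out of n
    staged cases, given C late-stage cases out of N overall."""
    out_c, out_n = C - c, N - n
    if n == 0 or out_n == 0 or c / n <= out_c / out_n:
        return 0.0  # only windows with an elevated late-stage proportion count
    ll_alt = (_xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
              + _xlogy(out_c, out_c / out_n)
              + _xlogy(out_n - out_c, (out_n - out_c) / out_n))
    ll_null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return ll_alt - ll_null


def most_likely_cluster(points, late, total, max_frac=0.5):
    """Scan circular windows centred on each zone centroid, adding zones by
    distance until the window would exceed max_frac of all staged cases."""
    C, N = sum(late), sum(total)
    best_llr, best_zones = 0.0, ()
    for cx, cy in points:
        order = sorted(range(len(points)),
                       key=lambda j: (points[j][0] - cx) ** 2 + (points[j][1] - cy) ** 2)
        c = n = 0
        members = []
        for j in order:
            if n + total[j] > max_frac * N:
                break
            c, n = c + late[j], n + total[j]
            members.append(j)
            llr = bernoulli_llr(c, n, C, N)
            if llr > best_llr:
                best_llr, best_zones = llr, tuple(sorted(members))
    return best_llr, best_zones


def monte_carlo_p(points, late, total, replicates=999, seed=1):
    """Permutation p-value: reshuffle the late-stage labels among all staged cases."""
    rng = random.Random(seed)
    observed, _ = most_likely_cluster(points, late, total)
    C, N = sum(late), sum(total)
    zone_of = [z for z, t in enumerate(total) for _ in range(t)]
    exceed = 0
    for _ in range(replicates):
        sim = [0] * len(total)
        for idx in rng.sample(range(N), C):
            sim[zone_of[idx]] += 1
        if most_likely_cluster(points, sim, total)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (replicates + 1)


# Hypothetical ZIP centroids (km) with late-stage and total staged cases.
centroids = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
late_cases = [15, 14, 5, 5, 5, 5]
all_cases = [20] * 6
llr, zones = most_likely_cluster(centroids, late_cases, all_cases)
p_value = monte_carlo_p(centroids, late_cases, all_cases, replicates=99)
```

Because the windows are built from whole ZIP codes, the candidate clusters can only be unions of ZIP code areas. This is the root of the aggregation concerns examined in the third paper.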
The results show some changes over time in the spatial distribution of late-stage colorectal cancer and in the impacts of risk factors at the ZIP code and individual levels. Specifically, the Bernoulli-based spatial scan statistic identified statistically significant clusters of late-stage colorectal cancer in the Chicago metropolitan area and in a rural region of southern Illinois in the period 1988 to 1992. In the later period, the clusters were no longer statistically significant, indicating that late-stage colorectal cancer risk has become more evenly distributed across Illinois over time. In the hierarchical logistic regression results, both individual-level demographic factors and ZIP code level covariates had effects of varying importance on the risk of late-stage colorectal cancer diagnosis across study regions and time periods. The risk of late-stage diagnosis is higher among younger colorectal cancer patients. Gender has contradictory effects on risk in Chicago city and its suburbs. The shortest travel time to the nearest cancer screening provider is positively associated with late-stage diagnosis risk outside the Chicago region, suggesting that spatial access to screening services may be an important barrier to early detection in rural areas of the state. One socio-economic status indicator, Minority Disparities, showed a significant positive relationship with late-stage diagnosis risk outside the Chicago region. As with gender, Factor 3 (Cultural-Language Barriers) had contradictory effects in Chicago city and its suburbs. Overall, the results showed no clear trends over time in the effects of the various factors on late-stage risk, and few strong, statistically significant results. These inconsistent findings suggest the need for more detailed and localized information.
The second paper is titled “Analyzing Spatial Aggregation Error in Statistical Models of Late-Stage Cancer Risk: A Monte Carlo Simulation Approach”. This paper examines the effect of spatial aggregation error on statistical estimates of the association between spatial access to health care and late-stage cancer. Monte Carlo simulation was used to disaggregate breast cancer cases in two Illinois counties from ZIP codes to census blocks in proportion to the age-race composition of each block's population. After disaggregation, a hierarchical logistic model relating late-stage breast cancer to risk factors, including travel distance to mammography, was estimated at both the ZIP code and census block levels. Model coefficients were compared between the two levels to assess the impact of spatial aggregation error.
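The allocation step can be sketched as weighted random sampling. Each ZIP code case is assigned to a block with probability proportional to that block's population in the case's age-race group, and the allocation is repeated to produce many realizations. This is a minimal sketch under those assumptions. The group labels, block IDs, and function names below are hypothetical.

```python
import random
from collections import Counter


def disaggregate_cases(cases, block_pop, seed=None):
    """Assign each case in one ZIP code to a census block, with probability
    proportional to the block's population in the case's age-race group.

    cases: list of (case_id, group); block_pop: {block_id: {group: population}}.
    """
    rng = random.Random(seed)
    blocks = list(block_pop)
    assignment = {}
    for case_id, group in cases:
        weights = [block_pop[b].get(group, 0) for b in blocks]
        if sum(weights) == 0:
            raise ValueError(f"no block population for group {group!r}")
        assignment[case_id] = rng.choices(blocks, weights=weights, k=1)[0]
    return assignment


def simulate(cases, block_pop, runs=100, seed=0):
    """Repeat the allocation; each realization would feed one block-level model fit."""
    return [disaggregate_cases(cases, block_pop, seed=seed + r) for r in range(runs)]


# Hypothetical blocks in one ZIP code and two cases with known age-race groups.
block_pop = {
    "B1": {"white_65plus": 120, "black_65plus": 0},
    "B2": {"white_65plus": 30, "black_65plus": 40},
}
cases = [(1, "black_65plus"), (2, "white_65plus")]
realizations = simulate(cases, block_pop)
print(Counter(r[2] for r in realizations))  # expected share for B1 is 120 / 150 = 0.8
```

Averaging block-level coefficients over many realizations, rather than relying on a single draw, keeps the comparison with the ZIP code model from hinging on one random allocation.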
Spatial aggregation error was found to influence the coefficients of regression-type models at the ZIP code level, and this impact is highly dependent on the study area. In one study area (Kane County), block-level coefficients were very similar to those estimated from ZIP code data, whereas in the other (Peoria County) the two sets of coefficients differed substantially, raising the possibility of drawing inaccurate inferences about the association between distance to mammography and late-stage cancer risk. The paper shows that spatial aggregation error can significantly affect coefficient values in statistical models of the association between cancer outcomes and spatial and non-spatial variables, and thus the inferences drawn from these models. Relying on ZIP code level data may lead to inaccurate findings about health risk factors, and the effects are likely to vary from one study area to another.
The third paper, titled “The Impact of Spatial Aggregation Error on Spatial Scan Analysis: A Case Study of Colorectal Cancer”, examines the effect of spatial aggregation error on the results of the spatial scan statistic by comparing, geographically and statistically, results at the ZIP code level with those at three reference levels (census tract, census block group, and census block). Data on colorectal cancer cases in Cook County, IL, for a five-year interval (1998-2002) were used. The Monte Carlo simulation approach from the second paper was applied to disaggregate the cancer data from the ZIP code level to each reference level. The Bernoulli-based spatial scan statistic was implemented in SaTScan to detect primary clusters from the cancer data at all four levels. An interactive procedure combining SAS and Java programming was designed to run SaTScan automatically hundreds of times. Characteristics of the clusters at each reference level were compared with those of the ZIP code level cluster to identify differences related to spatial aggregation.
The comparison reveals that the ZIP code level spatial scan statistic can generate reliable clusters at the global level in areas with a large number of cases. However, the ZIP code analysis sometimes fails to detect clusters in areas with a lower density of cases. Spatial aggregation error is minimized in areas with sizeable numbers of cases. In the absence of cancer data at a finer level, ZIP code level data can be used effectively to implement the spatial scan statistic and identify large, dominant clusters, although smaller clusters in areas with a relatively low density of cases may be missed. Because this study focused on a highly urbanized and populous area, future research should assess the influence of spatial aggregation error on spatial scan analysis in suburban and rural regions.
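SaTScan reports each circular cluster by its centre and radius. One way to compare a ZIP code level cluster geographically with a reference-level cluster is therefore to measure the shift in centre, the ratio of radii, and the overlap of the two circles (intersection over union). These particular metrics are an illustrative assumption, not necessarily those used in the dissertation. Coordinates are assumed to be in a projected system such as kilometres.

```python
import math


def circle_intersection_area(r1, r2, d):
    """Area shared by two circles with radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0                              # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2       # one circle inside the other
    a1 = math.acos(max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1))))
    a2 = math.acos(max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2))))
    kite = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return r1 * r1 * a1 + r2 * r2 * a2 - kite


def compare_clusters(zip_cluster, ref_cluster):
    """Each cluster is (x, y, radius); returns hypothetical comparison metrics."""
    (x1, y1, r1), (x2, y2, r2) = zip_cluster, ref_cluster
    d = math.hypot(x2 - x1, y2 - y1)
    inter = circle_intersection_area(r1, r2, d)
    union = math.pi * (r1 ** 2 + r2 ** 2) - inter
    return {
        "center_shift": d,
        "radius_ratio": r2 / r1,
        "overlap": inter / union if union > 0 else 0.0,
    }


print(compare_clusters((0.0, 0.0, 8.0), (1.5, -0.5, 6.0)))
```

An overlap close to 1 with a small centre shift would indicate that aggregation to ZIP codes left the cluster essentially unchanged.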
PERSIANN-MSA: A precipitation estimation method from satellite-based multispectral analysis
Visible and infrared data obtained from instruments onboard geostationary satellites have been used extensively to monitor clouds and their evolution. The Advanced Baseline Imager (ABI), which will be launched onboard the Geostationary Operational Environmental Satellite-R (GOES-R) series in the near future, will offer a larger range of spectral bands and hence will provide observations of cloud and rain systems at finer spatial, temporal, and spectral resolutions than are possible with the current GOES. In this paper, a new method called Precipitation Estimation from Remotely Sensed information using Artificial Neural Networks-Multispectral Analysis (PERSIANN-MSA) is proposed to evaluate the effect of using multispectral imagery on precipitation estimation. The proposed approach uses a self-organizing feature map (SOFM) to classify multidimensional input information extracted from each grid box, together with the corresponding textural features of the multispectral bands. In addition, principal component analysis (PCA) is used to reduce the dimensionality to a few uncorrelated input features while preserving most of the variation in the input information. The method is applied to estimate rainfall using multiple channels of the Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard the Meteosat Second Generation (MSG) satellite. Compared with using a single thermal infrared channel, the analysis shows that multispectral data have the potential to improve rain detection and estimation skill, with an average gain of more than 50% in equitable threat score for rain/no-rain detection and more than 20% in the correlation coefficient for rain-rate estimation. © 2009 American Meteorological Society
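The PCA step can be sketched with a singular value decomposition of the standardized feature matrix, keeping just enough components to reach a chosen share of the variance. The 95% cutoff, the standardization, and the synthetic features in the usage example are assumptions for illustration, not PERSIANN-MSA settings.

```python
import numpy as np


def pca_reduce(features, var_to_keep=0.95):
    """Project standardized features onto the leading principal components that
    together explain at least `var_to_keep` of the total variance."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                    # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std         # bands and textures have different units
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)    # variance share of each component
    k = int(np.searchsorted(np.cumsum(explained), var_to_keep)) + 1
    k = min(k, len(s))
    return Z @ vt[:k].T, explained[:k]


# Hypothetical per-pixel features: one IR brightness temperature, a strongly
# correlated second IR band, and an unrelated texture feature.
rng = np.random.default_rng(0)
tb = rng.normal(250.0, 15.0, size=500)
features = np.column_stack([
    tb,
    tb - 5.0 + rng.normal(0.0, 0.5, size=500),
    rng.normal(size=500),
])
scores, explained = pca_reduce(features)
print(scores.shape, explained.round(3))
```

The two nearly redundant IR bands collapse into a single component, which is the kind of redundancy the PCA step removes before the SOFM classification.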
Event detection in location-based social networks
With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring real-world events in a crowd-sourced manner. Location-based social networks have proven faster than traditional media channels at reporting and geo-locating breaking news; for example, Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the White House communications department. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events comprehensively. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. We therefore propose a probabilistic machine learning approach to event detection that explicitly models the data generation process and enables reasoning about the discovered events. To highlight the differences between the two approaches, we present two techniques for event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities.
Last but not least, we present the algorithmic changes and data processing frameworks needed to scale up the proposed techniques to big data workloads. This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer Reviewed. Postprint (author's final draft)
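The abstract does not detail Tweet-SCAN's algorithm. The sketch below illustrates the general idea of density-based event detection on geotagged posts with a plain DBSCAN-style clustering in space and time, using a spatial radius in km and a time window in minutes. It is a hypothetical simplification that ignores tweet text. It is not Tweet-SCAN and not Warble, and it assumes coordinates have already been projected to kilometres.

```python
import math


def st_dbscan(posts, eps_km, eps_min, min_pts):
    """posts: list of (x_km, y_km, t_min). Returns a cluster label per post
    (-1 = noise). Two posts are neighbours when they are within eps_km in
    space AND within eps_min in time."""
    def neighbours(i):
        xi, yi, ti = posts[i]
        return [j for j, (x, y, t) in enumerate(posts)
                if math.hypot(x - xi, y - yi) <= eps_km and abs(t - ti) <= eps_min]

    NOISE = -1
    labels = [None] * len(posts)  # None = not yet visited
    cluster = 0
    for i in range(len(posts)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new event
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical posts: four close in space and time, one far away, one much later.
example_posts = [(0.0, 0.0, 0), (0.2, 0.1, 2), (0.1, 0.3, 5), (0.3, 0.2, 7),
                 (50.0, 50.0, 0), (0.0, 0.0, 500)]
print(st_dbscan(example_posts, eps_km=1.0, eps_min=10, min_pts=3))
```

The neighbour search here is quadratic in the number of posts. This is one reason density-based methods need the kind of algorithmic and data-processing changes mentioned above before they can handle big data workloads.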
Daytime precipitation estimation using bispectral cloud classification system
Two previously developed Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) algorithms that incorporate a cloud classification system (PERSIANN-CCS) and multispectral analysis (PERSIANN-MSA) are integrated and employed to analyze the role of cloud albedo from the Geostationary Operational Environmental Satellite-12 (GOES-12) visible (0.65 μm) channel in supplementing infrared (10.7 μm) data. The integrated technique derives fine-scale (0.04° × 0.04° latitude-longitude, every 30 min) rain rates for each grid box through four major steps: 1) segmenting clouds into cloud patches using infrared or albedo images; 2) classifying cloud patches into cloud types using radiative, geometrical, and textural features of each patch; 3) dividing each cloud type into subclasses and assigning rain rates to each subclass using a multidimensional histogram matching method; and 4) associating satellite grid-box information with the corresponding cloud type and subclass to estimate rain rate at grid scale. The technique was applied over a study region covering the U.S. landmass east of 115°W. One reference infrared-only scenario and three bispectral (visible and infrared) rain estimation scenarios were compared to investigate the technique's ability to address two major drawbacks of infrared-only methods: 1) underestimation of warm rainfall and 2) the inability to screen out non-raining thin cirrus clouds. Radar estimates were used to evaluate the scenarios at a range of temporal (3- and 6-hourly) and spatial (0.04°, 0.08°, 0.12°, and 0.24° latitude-longitude) scales. Overall, the results using daytime data from June-August 2006 indicate that a significant gain over the infrared-only technique is obtained when albedo is used for cloud segmentation, followed by bispectral cloud classification and rainfall estimation.
At 3-h, 0.04° resolution, the observed improvement using bispectral information was about 66% for equitable threat score and 26% for the correlation coefficient. At coarser 0.24° resolution, the gains were 34% and 32% for the two performance measures, respectively. © 2010 American Meteorological Society
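The equitable threat score cited above can be computed from a rain/no-rain contingency table, with the hits expected by chance removed. The sketch assumes a 0.1 mm/h rain threshold and interprets the reported improvements as relative gains over the infrared-only score. Both are assumptions, not details from the abstract, and the function names are hypothetical.

```python
def contingency(estimated, observed, threshold=0.1):
    """Count hits, misses, false alarms and correct negatives for rain/no-rain,
    treating values >= threshold (mm/h, assumed) as rain."""
    hits = misses = false_alarms = correct_negatives = 0
    for est, obs in zip(estimated, observed):
        e, o = est >= threshold, obs >= threshold
        if e and o:
            hits += 1
        elif o:
            misses += 1
        elif e:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    n = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / n  # hits expected by chance
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)


def relative_gain(new_score, reference_score):
    """Percentage improvement of a bispectral score over the IR-only reference."""
    return 100.0 * (new_score - reference_score) / reference_score


table = contingency([0.0, 0.5, 2.0, 0.0, 1.2], [0.0, 0.0, 1.0, 3.0, 0.8])
print(table, equitable_threat_score(*table))
```

An ETS of 1 means a perfect R/NR map, and 0 means no skill beyond chance. Under the relative-gain reading, the reported 66% improvement would correspond, for example, to an IR-only ETS of 0.30 rising to roughly 0.50.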
Merging high-resolution satellite-based precipitation fields and point-scale rain gauge measurements: A case study in Chile
With their high spatial and temporal resolution, Satellite-based Precipitation Estimates (SPE) are becoming a valuable alternative source of rainfall data for hydrologic and climatic studies, but they are subject to considerable uncertainty. Effectively merging SPE with ground-based gauge measurements may improve both the resolution and the accuracy of precipitation estimates. In this study, a framework for merging satellite and gauge precipitation data is developed in three steps (SPE bias adjustment, gauge observation gridding, and data merging) with the objective of producing high-quality precipitation estimates. An inverse-root-mean-square-error weighting approach is proposed to combine the bias-adjusted satellite estimates with the gridded gauge estimates. The model is applied and tested with the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) estimates (daily, 0.04° × 0.04°) over Chile for the 6-year period 2009-2014. Daily observations from about 90% of the collected gauges in the study area are used for model calibration; the remaining gauge data are treated as ground “truth” for validation. Evaluation results indicate that the model is highly effective at producing high-resolution, high-precision precipitation data. Compared with the reference data, the merged daily data show correlation coefficients, probabilities of detection, root-mean-square errors, and absolute mean biases that are consistently improved relative to the original PERSIANN-CCS estimates. Cross-validation shows that the framework provides high-quality estimates even over ungauged satellite pixels. The same method can be applied globally and is expected to produce precipitation products in near real time by integrating gauge observations with satellite estimates.
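Below is a minimal sketch of the weighting step. It assumes the weights are the inverse RMSEs of the two sources, normalized to sum to one, and that bias adjustment is a simple multiplicative (ratio) correction. The abstract gives neither formula, and all function names are hypothetical.

```python
import math


def rmse(estimates, observations):
    """Root-mean-square error of one source against calibration gauges."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimates, observations)) / len(observations))


def ratio_bias_factor(satellite_at_gauges, gauge_values):
    """Multiplicative bias factor from collocated pixels (assumed adjustment form)."""
    return sum(gauge_values) / sum(satellite_at_gauges)


def merge_inverse_rmse(sat_adjusted, gauge_gridded, rmse_sat, rmse_gauge):
    """Weighted average with weights proportional to 1 / RMSE, normalized to sum to 1."""
    w_sat, w_gauge = 1.0 / rmse_sat, 1.0 / rmse_gauge
    total = w_sat + w_gauge
    w_sat, w_gauge = w_sat / total, w_gauge / total
    return [w_sat * s + w_gauge * g for s, g in zip(sat_adjusted, gauge_gridded)]


# Hypothetical daily values (mm) at three pixels.
satellite = [2.0, 5.0, 0.5]
factor = ratio_bias_factor([2.0, 4.0], [3.0, 6.0])      # -> 1.5 from two collocated gauges
sat_adjusted = [factor * s for s in satellite]
gauge_gridded = [3.2, 7.0, 1.0]
print(merge_inverse_rmse(sat_adjusted, gauge_gridded, rmse_sat=2.0, rmse_gauge=1.0))
```

With these weights, the source with the smaller error dominates: a gauge RMSE half that of the satellite gives the gauge field two-thirds of the weight.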