53 research outputs found

    Statistics for Spatially Stratified Heterogeneous Data

    Spatial statistics is dominated by two families of methods: Kriging and BHM, which are based on spatial autocorrelation (SAC), and hotspot and geographical regression methods, which are based on spatial local heterogeneity. These correspond to the first and second laws of geography (Tobler 1970; Goodchild 2004), respectively. Spatial stratified heterogeneity (SSH) is the phenomenon in which observations within the strata of a partition are more similar than observations between strata; examples include climate zones, land-use classes, and remote sensing classifications. Although SSH is prevalent in geography and has been understood since ancient Greece, it is surprisingly neglected in spatial statistics, probably because hundreds of classification algorithms already exist. In this article, we go beyond classification and show that SSH is a source of sample bias, statistic bias, modelling confounding, and misleading CI, and we recommend robust solutions to overcome these problems. We also elaborate four benefits of SSH: an identical PDF, or the equivalent of random sampling, within each stratum; the spatial pattern in strata; the borders between strata as specific information for nonlinear causation; and general interaction, obtained by overlaying two spatial patterns. We develop the equation of SSH and discuss its context. This comprehensive investigation formulates the statistics for SSH, presenting a new principle and toolbox in spatial statistics.

    Estimation of Areal Mean Rainfall in Remote Areas Using B-SHADE Model

    This study presented a method to estimate areal mean rainfall (AMR) using a Biased Sentinel Hospital Based Area Disease Estimation (B-SHADE) model, together with biased rain gauge observations and Tropical Rainfall Measuring Mission (TRMM) data, for remote areas with a sparse and uneven distribution of rain gauges. Based on the B-SHADE model, the best linear unbiased estimate of AMR could be obtained. A case study was conducted for the Three-River Headwaters region in the Tibetan Plateau of China, and the model's performance was compared with that of traditional methods. The results indicated that B-SHADE yielded the smallest estimation bias, with a mean error and root mean square error of −0.63 and 3.48 mm, respectively. For the traditional methods (arithmetic average, Thiessen polygon, and ordinary kriging), the mean errors were 7.11, −1.43, and 2.89 mm, which were up to 1027.1%, 127.0%, and 358.3% greater in magnitude, respectively, than for the B-SHADE model. The root mean square errors were 10.31, 4.02, and 6.27 mm, which were up to 196.1%, 15.5%, and 80.0% higher, respectively, than for the B-SHADE model. The proposed technique can be used to extend the AMR record to the pre-satellite observation period, when only gauge data are available.
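The percentage comparisons in the abstract above can be checked with a short calculation. This is a minimal sketch that assumes the percentages are relative increases in error magnitude over B-SHADE; the function name `pct_increase` is illustrative and not taken from the paper. Recomputing from the rounded values quoted in the abstract gives figures close to, but not identical to, the reported ones, presumably because the reported percentages were derived from unrounded errors.

```python
def pct_increase(other, baseline):
    """Relative increase (in %) of |other| over |baseline|."""
    return (abs(other) - abs(baseline)) / abs(baseline) * 100.0

# Rounded values quoted in the abstract (mm).
bshade_me, bshade_rmse = -0.63, 3.48
methods = {
    "arithmetic average": (7.11, 10.31),
    "Thiessen polygon": (-1.43, 4.02),
    "ordinary kriging": (2.89, 6.27),
}

# Map each method to (ME increase %, RMSE increase %) relative to B-SHADE.
increases = {
    name: (pct_increase(me, bshade_me), pct_increase(rmse, bshade_rmse))
    for name, (me, rmse) in methods.items()
}

for name, (me_pct, rmse_pct) in increases.items():
    print(f"{name}: ME +{me_pct:.1f}%, RMSE +{rmse_pct:.1f}%")
```

The mean errors are compared by absolute value because the B-SHADE and Thiessen polygon errors are negative.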

    Refining Time-Activity Classification of Human Subjects Using the Global Positioning System

    BACKGROUND: Detailed spatial location information is important for accurately estimating personal exposure to air pollution. The Global Positioning System (GPS) has been widely used to track personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, but most were developed for specific regions. An adaptive model for time-location classification can be widely applied to air pollution studies that use GPS to track individual-level time-activity patterns. METHODS: Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants in Southern California under free-living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e., indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e., roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. RESULTS: Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access, were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed a considerable loss in sensitivity for outdoor static and outdoor walking conditions. CONCLUSIONS: The random forests classification model can achieve high accuracy for the four major time-activity categories. The model also performed well with just GPS, road, and tax parcel data. However, caution is warranted when generalizing a model developed from a small number of subjects to other populations.
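The gap between the two evaluation schemes in the abstract above comes down to how records are split. The following is a minimal sketch of leave-one-subject-out splitting, not the authors' implementation; the function name and the subject labels are hypothetical. Every record from one participant is held out together, so the model is always tested on a person it has never seen.

```python
from collections import defaultdict

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one split per subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Training data is every record that does not belong to the held-out subject.
        train_idx = sorted(
            i for sid, idxs in by_subject.items() if sid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx

# Hypothetical example: six GPS records from three participants.
subjects = ["s1", "s1", "s2", "s2", "s2", "s3"]
splits = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

A fold-based split that ignores subject identity can place records from the same person in both the training and test sets, which is one plausible reason the fold-based accuracy is higher than the subject-based one.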

    Statistical modeling of spatially stratified heterogeneous data

    Spatial statistics is an important methodology for geospatial data analysis. It has evolved to handle spatially autocorrelated data and spatially (locally) heterogeneous data, which reflect the first and second laws of geography, respectively. Examples of spatially stratified heterogeneity (SSH) include climatic zones and land-use types. Methods for SSH data are relatively underdeveloped compared with those for the first two properties. The presence of SSH is evidence that nature is lawful and structured rather than purely random. This induces another “layer” of causality underlying the variations observed in geographical data. In this article, we go beyond traditional cluster-based approaches and propose a unified approach for SSH in which we provide an equation for SSH, show how SSH is a source of bias in spatial sampling and of confounding in spatial modeling, detect the nonlinear stochastic causality inherent in an SSH distribution, quantify the general interaction identified by overlaying two SSH distributions, perform spatial prediction based on SSH, develop a new measure of spatial goodness of fit, and enhance global models by integrating them with an SSH q statistic. The research advances statistical theory and methods for dealing with SSH data, thereby offering a new toolbox for spatial data analysis.
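The abstract above mentions an SSH q statistic without stating it. A minimal sketch follows, assuming the form commonly given in the geographical detector literature: q = 1 − SSW/SST, where SSW is the sum of squared deviations from each stratum's own mean and SST is the sum of squared deviations from the overall mean. The function name and the sample data are illustrative, and the article itself may define or extend the statistic differently.

```python
def q_statistic(values, strata):
    """Return q = 1 - SSW/SST for observations `values` labelled by `strata`."""
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    groups = {}
    for y, h in zip(values, strata):
        groups.setdefault(h, []).append(y)

    grand_mean = sum(values) / len(values)
    sst = sum((y - grand_mean) ** 2 for y in values)  # total sum of squares
    if sst == 0:
        raise ValueError("q is undefined when all values are identical")

    ssw = 0.0  # within-strata sum of squares
    for ys in groups.values():
        stratum_mean = sum(ys) / len(ys)
        ssw += sum((y - stratum_mean) ** 2 for y in ys)
    return 1.0 - ssw / sst

# Strata that fully explain the variation: q = 1.
q_perfect = q_statistic([1, 1, 1, 5, 5, 5], ["a", "a", "a", "b", "b", "b"])
# Strata that explain none of it (equal stratum means): q = 0.
q_none = q_statistic([1, 5, 1, 5], ["a", "a", "b", "b"])
print(q_perfect, q_none)
```

Under this definition q lies between 0 and 1, with larger values indicating that the stratification accounts for more of the variance in the data.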

    Determinants of the Incidence of Hand, Foot and Mouth Disease in China Using Geographically Weighted Regression Models

    Child population density and climate factors are potential determinants of the incidence of hand, foot and mouth disease (HFMD) in most areas of China. The strength and direction of the association between these factors and the incidence of HFMD are spatially heterogeneous at the local geographic level, and child population density has a greater influence on the incidence of HFMD than the climate factors.

    Comparison of birth certificates and hospital-based birth data on pregnancy complications in Los Angeles and Orange County, California

    BACKGROUND: The incidence of both gestational diabetes mellitus and preeclampsia is on the rise; however, these pregnancy complications may not be systematically reported. This study aimed to examine differences in reporting of preeclampsia and gestational diabetes between hospital records and birth certificate data, and to determine if such differences vary by maternal socioeconomic status indicators. METHODS: We obtained over 70,000 birth records from 2001 to 2006 from the perinatal research database of the Memorial Care system, a network of four hospitals in Los Angeles and Orange Counties, California. Memorial birth records were matched to corresponding state birth certificate records and analyzed to determine differential rates of reporting of preeclampsia and diabetes. Additionally, the influence of maternal socioeconomic factors on the reported incidence of such adverse pregnancy outcomes was analyzed. Socioeconomic factors of interest included maternal education levels, race, and type of health insurance (private or public). RESULTS: It was found that the birth certificate data significantly underreported the incidence of both preeclampsia (1.38% vs. 3.13%) and diabetes (1.97% vs. 5.56%) when compared to Memorial data. For both outcomes of interest, the degree of underreporting was significantly higher among women with lower education levels, among Hispanic women compared to Non-Hispanic White women, and among women with public health insurance. CONCLUSION: The Memorial Care database is a more reliable source of information than birth certificate data for analyzing the incidence of preeclampsia and diabetes among women in Los Angeles and Orange Counties, especially for subpopulations of lower socioeconomic status.

    Scaling Flux Tower Observations of Sensible Heat Flux Using Weighted Area-to-Area Regression Kriging

    Sensible heat flux (H) plays an important role in characterizing the land surface water and heat balance. H can be measured at various observation scales, from local-area-scale eddy covariance (EC) to regional-scale large aperture scintillometer (LAS) and remote sensing (RS) products. However, how to convert H from one scale to another in order to validate RS products is still an open question. A previous scaling method based on area-to-area regression kriging performed well in converting EC-scale H to LAS-scale H. However, that method does not consider the path-weighting function when kriging the regression residuals from the EC scale to the LAS scale, which inevitably introduces estimation bias. In this study, a weighted area-to-area regression kriging (WATA RK) model is proposed to convert EC-scale H to LAS-scale H. It incorporates the path-weighting functions of the EC and LAS source areas in both the regression and the area kriging stages. Results show that WATA RK outperforms traditional methods in most cases, improving estimation accuracy. The method is considered to provide an efficient means of validating RS H flux products.

    Estimation of daily PM2.5 concentration and its relationship with meteorological conditions in Beijing

    When investigating the impact of air pollution on health, particulate matter less than 2.5 μm in aerodynamic diameter (PM2.5) is considered more harmful than particulates of other sizes, and studies of PM2.5 have therefore attracted more attention. Beijing, the capital of China, is notorious for its serious air pollution problem, an issue which has been of great concern to residents, the government, and related institutes for decades. However, in China, significantly less time has been devoted to observing PM2.5 than PM10. Before 2013 in particular, the density of the PM2.5 ground observation network was relatively low, and the distribution of observation stations was uneven. One solution is to estimate PM2.5 concentrations from the existing data on PM10. In the present study, by analyzing the relationship between the concentrations of PM2.5 and PM10 and the meteorological conditions for each season in Beijing from 2008 to 2014, we found a U-shaped relationship between the daily maximum wind speed and the daily PM concentration, for both PM2.5 and PM10. That is, the relationship between wind speed and PM concentration is not a simple positive or negative correlation in these wind directions; PM concentrations are higher at low and high wind speeds than at moderate wind speeds. Additionally, in contrast to previous studies, we found that the PM2.5/PM10 ratio is proportional to the mean relative humidity (MRH). Based on this relationship, we established a multiple nonlinear regression (MNLR) model for each season to estimate the PM2.5 concentrations of the missing periods.