133 research outputs found

    Geostatistical analysis of disease data: estimation of cancer mortality risk from empirical frequencies using Poisson kriging

    BACKGROUND: Cancer mortality maps are used by public health officials to identify areas of excess and to guide surveillance and control activities. The quality of decision-making thus relies on an accurate quantification of risks from observed rates, which can be very unreliable when computed from sparsely populated geographical units or recorded for minority populations. This paper presents a geostatistical methodology that accounts for spatially varying population sizes and spatial patterns in the processing of cancer mortality data. Simulation studies are conducted to compare the performance of Poisson kriging with a few simple smoothers (i.e. population-weighted estimators and empirical Bayes smoothers) under different scenarios for the disease frequency, the population size, and the spatial pattern of risk. A public-domain executable with example datasets is provided. RESULTS: The analysis of age-adjusted mortality rates for breast and cervix cancers illustrated some key features of commonly used smoothing techniques. Because of the small weight assigned to the rate observed over the entity being smoothed (kernel weight), the population-weighted average leads to risk maps that show little variability. Other techniques assign larger and similar kernel weights, but they use a different piece of auxiliary information in the prediction: global or local means for global or local empirical Bayes smoothers, and a spatial combination of surrounding rates for the geostatistical estimator. Simulation studies indicated that Poisson kriging outperforms the other approaches in most scenarios, with a clear benefit when the risk values are spatially correlated. Global empirical Bayes smoothers provide more accurate predictions under the least frequent scenario of spatially random risk. 
CONCLUSION: The approach presented in this paper enables researchers to incorporate the pattern of spatial dependence of mortality rates into the mapping of risk values and the quantification of the associated uncertainty, while being easier to implement than a full Bayesian model. The availability of a public-domain executable makes the geostatistical analysis of health data, and its comparison to traditional smoothers, more accessible to typical users. In future papers this methodology will be generalized to the simulation of the spatial distribution of risk values and the propagation of the uncertainty attached to predicted risks in local cluster analysis.
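The Poisson kriging estimator described above can be sketched as follows. This is a minimal illustration and not the public-domain executable mentioned in the abstract. It rests on several assumptions. Unit centroids serve as coordinates. The risk covariance is exponential, with a known sill and range; in practice these would be inferred from a population-weighted semivariogram. The error-variance term m*/n(u_i), where m* is the population-weighted mean rate, is added to the diagonal of an ordinary kriging system. All function and parameter names are hypothetical.

```python
import numpy as np


def exp_cov(h, sill, range_):
    """Exponential covariance (assumed model; practical range = range_)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / range_)


def poisson_kriging(coords, rates, pops, target, sill, range_):
    """Point Poisson kriging of the risk at `target` (illustrative sketch).

    coords: (n, 2) unit centroids; rates: observed rates d/n;
    pops: population sizes n; target: (2,) prediction location.
    Returns (risk estimate, kriging variance, kriging weights).
    """
    coords = np.asarray(coords, dtype=float)
    rates = np.asarray(rates, dtype=float)
    pops = np.asarray(pops, dtype=float)
    n = len(rates)
    # Population-weighted mean rate m*, used in the error-variance term.
    m_star = np.sum(pops * rates) / np.sum(pops)

    # Ordinary kriging matrix with m*/n(u_i) added on the diagonal, so that
    # rates computed from small populations receive less weight.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exp_cov(dist, sill, range_) + np.diag(m_star / pops)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    to_target = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    rhs[:n] = exp_cov(to_target, sill, range_)

    sol = np.linalg.solve(lhs, rhs)
    weights, mu = sol[:n], sol[n]
    estimate = float(weights @ rates)
    variance = float(sill - weights @ rhs[:n] - mu)
    return estimate, variance, weights
```

With very large populations the error term vanishes and the weights approach those of ordinary kriging. A unit with a small population receives a smaller weight on its own rate, which is one way to see the kernel-weight behavior discussed in the results.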

    Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging

    BACKGROUND: Geostatistical techniques that account for spatially varying population sizes and spatial patterns in the filtering of choropleth maps of cancer mortality were recently developed. Their implementation was facilitated by the initial assumption that all geographical units are the same size and shape, which allowed the use of geographic centroids in semivariogram estimation and kriging. Another implicit assumption was that the population at risk is uniformly distributed within each unit. This paper presents a generalization of Poisson kriging whereby the size and shape of administrative units, as well as the population density, are incorporated into the filtering of noisy mortality rates and the creation of isopleth risk maps. An innovative procedure to infer the point-support semivariogram of the risk from aggregated rates (i.e. areal data) is also proposed. RESULTS: The novel methodology is applied to age-adjusted lung and cervix cancer mortality rates recorded for white females in two contrasting county geographies: 1) the state of Indiana, which consists of 92 counties of fairly similar size and shape, and 2) four states in the Western US (Arizona, California, Nevada and Utah) forming a set of 118 counties that differ vastly in size and shape. Area-to-point (ATP) Poisson kriging produces risk surfaces that are less smooth than the maps created by a naïve point kriging of empirical Bayes smoothed rates. The coherence constraint of ATP kriging also ensures that the population-weighted average of the risk estimates within each geographical unit equals the areal data for that unit. Simulation studies showed that the new approach yields more accurate predictions and confidence intervals than point kriging of areal data in which all counties are simply collapsed into their respective polygon centroids. Its benefit over point kriging increases as the county geography becomes more heterogeneous. 
CONCLUSION: A major limitation of choropleth maps is the common, biased visual perception that larger, rural and sparsely populated areas are of greater importance. The approach presented in this paper allows the continuous mapping of mortality risk while accounting locally for population density and areal data through the coherence constraint. This form of Poisson kriging will facilitate the analysis of relationships between health data and putative covariates that are typically measured over different spatial supports.
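One way to account for unit size, shape and population density is assumed here for illustration. Each unit is discretized into points carrying population counts. The covariance between two units is then the population-weighted average of point covariances: C(v_i, v_j) = sum over s, s' of n(u_s) n(u_s') C(u_s - u_s') divided by sum over s, s' of n(u_s) n(u_s'). The sketch assumes an exponential point covariance; the function names are hypothetical.

```python
import numpy as np


def exp_cov(h, sill=1.0, range_=10.0):
    """Assumed point-support exponential covariance of the risk."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / range_)


def area_covariance(pts_i, pop_i, pts_j, pop_j, cov=exp_cov):
    """Population-weighted covariance between two discretized units.

    pts_*: (k, 2) discretization points of a unit; pop_*: population
    count at each point. Point covariances C(u_s - u_s') are averaged with
    weights n(u_s) * n(u_s').
    """
    pts_i = np.asarray(pts_i, dtype=float)
    pts_j = np.asarray(pts_j, dtype=float)
    w = np.outer(np.asarray(pop_i, dtype=float), np.asarray(pop_j, dtype=float))
    h = np.linalg.norm(pts_i[:, None, :] - pts_j[None, :, :], axis=-1)
    return float(np.sum(w * cov(h)) / np.sum(w))
```

Covariances of this kind would replace centroid-to-centroid point covariances in the kriging system. For area-to-point kriging, one of the two "units" is simply the single prediction point.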

    Geostatistical analysis of disease data: visualization and propagation of spatial uncertainty in cancer mortality risk using Poisson kriging and p-field simulation

    BACKGROUND: Smoothing methods have been developed to improve the reliability of cancer risk estimates from sparsely populated geographical entities. Filtering out local details of the spatial variation of risk, however, leads to the detection of larger clusters of low or high cancer risk, while most spatial outliers are smoothed away. Static maps of risk estimates and the associated prediction variance also fail to depict the uncertainty attached to the spatial distribution of risk values and do not allow its propagation through local cluster analysis. This paper presents a geostatistical methodology to generate multiple realizations of the spatial distribution of risk values. These maps are then fed into spatial operators, such as local cluster analysis, allowing one to assess how spatial uncertainty about risk translates into uncertainty about the location of spatial clusters and outliers. This novel approach is applied to age-adjusted breast and pancreatic cancer mortality rates recorded for white females in 295 US counties of the Northeast (1970–1994). A public-domain executable with example datasets is provided. RESULTS: Geostatistical simulation generates risk maps that are more variable than the smooth risk map estimated by Poisson kriging and that better reproduce the spatial pattern captured by the risk semivariogram model. Local cluster analysis of the set of simulated risk maps leads to a clear visualization of the lower reliability of the classification obtained for pancreatic cancer versus breast cancer: only a few counties in the large cluster of low risk detected in West Virginia and Southern Pennsylvania are significant in over 90% of all simulations. On the other hand, the cluster of high breast cancer mortality in Niagara County, detected after application of Poisson kriging, appears on 60% of simulated risk maps. 
Sensitivity analysis shows that 500 realizations are needed to achieve a stable classification for pancreatic cancer, while convergence is reached with fewer than 300 realizations for breast cancer. CONCLUSION: The approach presented in this paper enables researchers to generate a set of simulated risk maps that are more realistic than a single map of smoothed mortality rates and that allow the propagation of cancer risk uncertainty through local cluster analysis. Coupled with the visualization and querying capabilities of geographical information systems, an animated display of realizations can highlight areas that depart consistently from the general behavior observed across the region, guiding further investigation and control activities.
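The realization-generating step can be illustrated with p-field simulation, which is named in the title. The sketch assumes that each county's local distribution of risk is Gaussian, with the Poisson kriging estimate as mean and the kriging variance as variance. It also assumes that the probability field is an unconditional Gaussian field with an exponential correlogram, simulated by Cholesky decomposition; that approach is practical only for a few hundred units. The function name and parameters are hypothetical.

```python
import numpy as np


def pfield_realizations(coords, estimates, variances, range_, n_real, seed=0):
    """P-field simulation sketch with Gaussian local distributions.

    Each unit's local distribution is assumed Gaussian (mean = kriging
    estimate, variance = kriging variance). The probability field
    p(u) = G(y(u)) comes from an unconditional standard Gaussian field y
    with an exponential correlogram, so drawing the p(u)-quantile of each
    local distribution reduces to estimate + sigma * y(u).
    Returns an array of shape (n_real, n_units).
    """
    coords = np.asarray(coords, dtype=float)
    est = np.asarray(estimates, dtype=float)
    sigma = np.sqrt(np.asarray(variances, dtype=float))
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-3.0 * h / range_)
    # A tiny diagonal jitter keeps the Cholesky factorization stable.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(len(est)))
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_real, len(est))) @ chol.T  # correlated N(0, 1)
    return est + sigma * y
```

Each row is one simulated risk map. Running a local cluster statistic on every row and counting how often each county is flagged yields frequencies of the kind reported above, such as the share of maps on which a cluster appears.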

    Geographical, temporal and racial disparities in late-stage prostate cancer incidence across Florida: A multiscale joinpoint regression analysis

    BACKGROUND: Although prostate cancer incidence and mortality have declined recently, striking racial/ethnic differences persist in the United States. Visualizing and modelling temporal trends in late-stage prostate cancer incidence, and how they vary according to geographic location and race, should help explain such disparities. Joinpoint regression is increasingly used to identify the timing and extent of changes in time series of health outcomes. Yet, most analyses of temporal trends are aspatial and conducted at the national level or for a single cancer registry. METHODS: Time series (1981-2007) of annual proportions of late-stage prostate cancer cases were analyzed for non-Hispanic Whites and non-Hispanic Blacks in each county of Florida. Noise in the data was first filtered by binomial kriging, and the results were modelled using joinpoint regression. A similar analysis was also conducted at the state level and for groups of metropolitan and non-metropolitan counties. Significant racial differences were detected using tests of parallelism and coincidence of time trends. A new disparity statistic was introduced to measure spatial and temporal changes in the frequency of racial disparities. RESULTS: The state-level percentage of late-stage diagnoses decreased by 50% since 1981, a decline that accelerated in the 1990s when Prostate Specific Antigen (PSA) screening was introduced. Analysis at the metropolitan and non-metropolitan levels revealed that the frequency of late-stage diagnosis increased recently in urban areas, and this trend was significant for white males. The annual rate of decrease in late-stage diagnosis and the onset years of significant declines varied greatly among counties and racial groups. For white males, most counties with a non-significant average annual percent change (AAPC) were located in the Florida Panhandle, whereas for black males they clustered in South-eastern Florida. 
The new disparity statistic indicated that the spatial extent of racial disparities reached a peak in 1990 because of an early decline in the frequency of late-stage diagnosis observed for black males. CONCLUSIONS: Analyzing temporal trends in cancer incidence and mortality rates outside a spatial framework is unsatisfactory, since it leads one to overlook significant geographical variation that can potentially generate new insights about the impact of various interventions. Differences observed among nested geographies in Florida show how the modifiable areal unit problem (MAUP) also affects the analysis of temporal changes.
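Joinpoint regression fits piecewise log-linear trends. From these, an annual percent change (APC) is derived for each segment and an average annual percent change (AAPC) for the whole period. The sketch below is deliberately simplified. It assumes exactly one joinpoint, located by grid search over the observed years. It computes the AAPC with the segment-length-weighted formula 100 * (exp(sum of w_i b_i / sum of w_i) - 1). It does not test how many joinpoints are needed, and it is not the software or the binomial kriging step used in the study. The function name is hypothetical.

```python
import numpy as np


def fit_one_joinpoint(years, values):
    """Fit log(y) = a + b1*t + b2*max(t - tau, 0) with tau chosen by grid search.

    Simplified single-joinpoint model: the number of joinpoints is fixed
    at one instead of being selected by a testing procedure.
    Returns (tau, slope_before, slope_after, aapc_percent).
    """
    t = np.asarray(years, dtype=float)
    logy = np.log(np.asarray(values, dtype=float))
    best = None
    for tau in t[2:-2]:  # leave at least three years in each segment
        X = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
        coef, *_ = np.linalg.lstsq(X, logy, rcond=None)
        sse = float(np.sum((X @ coef - logy) ** 2))
        if best is None or sse < best[0]:
            best = (sse, tau, coef)
    _, tau, coef = best
    b1, b2 = float(coef[1]), float(coef[2])
    slopes = np.array([b1, b1 + b2])  # log-scale slope of each segment
    # AAPC: exp of the segment-length-weighted mean slope, minus 1, in percent.
    lengths = np.array([tau - t[0], t[-1] - tau])
    aapc = 100.0 * (np.exp(np.sum(lengths * slopes) / np.sum(lengths)) - 1.0)
    return float(tau), b1, b1 + b2, float(aapc)
```

The APC of each segment follows from its slope as 100 * (exp(slope) - 1).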

    How does Poisson kriging compare to the popular BYM model for mapping disease risks?

    BACKGROUND: Geostatistical techniques are now available to account for spatially varying population sizes and spatial patterns in the mapping of disease rates. At first glance, Poisson kriging represents an attractive alternative to increasingly popular Bayesian spatial models in that: 1) it is easier to implement and less CPU-intensive, and 2) it accounts for the size and shape of geographical units, avoiding the limitations of the conditional autoregressive (CAR) models commonly used in Bayesian algorithms while allowing for the creation of isopleth risk maps. The two approaches, however, had never been compared in simulation studies, and there is a need to better understand their merits in terms of the accuracy and precision of disease risk estimates. RESULTS: Besag, York and Mollié's (BYM) model and Poisson kriging (point and area-to-area implementations) were applied to age-adjusted lung and cervix cancer mortality rates recorded for white females in two contrasting county geographies: 1) the state of Indiana, which consists of 92 counties of fairly similar size and shape, and 2) four states in the Western US (Arizona, California, Nevada and Utah) forming a set of 118 counties that differ vastly in size and shape. The spatial support (i.e. point versus area) has a much smaller impact on the results than the statistical methodology (i.e. geostatistical versus Bayesian models). Differences between methods are particularly pronounced in the Western US dataset: the BYM model yields a smoother risk surface and a prediction variance that changes mainly as a function of the predicted risk, while the Poisson kriging variance increases in large, sparsely populated counties. Simulation studies showed that the geostatistical approach yields smaller prediction errors and more precise and accurate probability intervals, and allows better discrimination between counties with high and low mortality risks. 
The benefit of area-to-area Poisson kriging increases as the county geography becomes more heterogeneous and when data beyond the adjacent counties are used in the estimation. The trade-off for the easier implementation of point Poisson kriging is slightly larger kriging variances, which reduce the precision of the model of uncertainty. CONCLUSION: Bayesian spatial models are increasingly used by public health officials to map mortality risk from observed rates, a preliminary step towards the identification of areas of excess. More attention should, however, be paid to the spatial and distributional assumptions underlying the popular BYM model. Poisson kriging offers more flexibility in modeling the spatial structure of the risk and generates less smoothing, reducing the likelihood of missing areas of high risk.
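The abstract compares the accuracy and precision of probability intervals without spelling out the criterion. One common check, assumed here, is the proportion of true values that fall within symmetric p-probability intervals. The intervals are accurate if that proportion is at least p, and more precise the closer it is to p. The sketch assumes Gaussian prediction distributions built from each method's estimate and prediction variance. The function name is hypothetical.

```python
from statistics import NormalDist


def interval_coverage(truths, estimates, variances, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Fraction of true values inside symmetric p-probability intervals.

    Assumes Gaussian prediction distributions (mean = estimate,
    variance = prediction variance). Returns {p: observed fraction}.
    """
    out = {}
    for p in probs:
        z = NormalDist().inv_cdf(0.5 + p / 2.0)  # half-width in std units
        inside = sum(
            abs(t - m) <= z * v ** 0.5
            for t, m, v in zip(truths, estimates, variances)
        )
        out[p] = inside / len(truths)
    return out
```

In a simulation study, the true values would be the simulated risks. The function would be called once with the BYM output and once with the Poisson kriging output, and the two sets of fractions compared.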

    Accounting for Estimation Optimality Criteria in Simulated Annealing

    This paper presents both estimation and simulation as optimization problems that differ in their optimization criteria: minimization of a local expected loss for estimation, and reproduction of global statistics (semivariogram, histogram) for simulation. An intermediate approach is proposed whereby an initial random image is gradually modified using simulated annealing so as to better match both local and global constraints. The relative weights of the different constraints in the objective function allow the user to strike a balance between the smoothness of the estimated map and the reproduction of spatial variability by simulated maps. The procedure is illustrated using a synthetic dataset. The proposed approach is shown to enhance the influence of observations on neighboring simulated values, hence the final realizations appear to be “better conditioned” to the sample information. It also produces maps that are more accurate (smaller prediction error) than stochastic simulation ignoring local constraints, but not as accurate as E-type estimation. Flow simulation results show that accounting for local constraints yields, on average, smaller errors in production forecasts than a smooth estimated map or a simulated map that reproduces only the histogram and semivariogram. The approach thus reduces the risk associated with the use of a single realization for forecasting and planning.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/43199/1/11004_2004_Article_412233.pd
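The combined objective function can be sketched on a one-dimensional grid. None of the following assumptions comes from the paper. The local constraint is the mean squared difference from an estimated map. The global constraint is the squared misfit to a target semivariogram at a few lags. The image is perturbed by swapping two values, which leaves the histogram unchanged. The temperature decreases geometrically. Function names and default settings are hypothetical.

```python
import math
import random


def semivariogram(z, lags):
    """Experimental semivariogram of a regularly spaced 1D series."""
    return [
        0.5 * sum((z[i + h] - z[i]) ** 2 for i in range(len(z) - h)) / (len(z) - h)
        for h in lags
    ]


def objective(z, z_est, gamma_target, lags, w_local, w_global):
    """Weighted sum of a local misfit (to an estimated map) and a global
    misfit (to a target semivariogram)."""
    local = sum((a - b) ** 2 for a, b in zip(z, z_est)) / len(z)
    gam = semivariogram(z, lags)
    glob = sum((g - t) ** 2 for g, t in zip(gam, gamma_target)) / len(lags)
    return w_local * local + w_global * glob


def anneal(z_init, z_est, gamma_target, lags, w_local=1.0, w_global=1.0,
           n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Swap-based simulated annealing; swaps keep the histogram of z_init."""
    rng = random.Random(seed)
    z = list(z_init)
    cur = objective(z, z_est, gamma_target, lags, w_local, w_global)
    temp = t0
    for _ in range(n_iter):
        i, j = rng.sample(range(len(z)), 2)
        z[i], z[j] = z[j], z[i]
        new = objective(z, z_est, gamma_target, lags, w_local, w_global)
        # Accept improvements; accept deteriorations with probability
        # exp(-delta / temp) (Metropolis rule).
        if new <= cur or rng.random() < math.exp(-(new - cur) / temp):
            cur = new
        else:
            z[i], z[j] = z[j], z[i]  # reject: undo the swap
        temp = max(temp * cooling, 1e-12)  # floor avoids division by zero
    return z, cur
```

Setting w_global=0 pushes the result toward the estimated map. Setting w_local=0 gives a plain annealing simulation that only matches the semivariogram. The trade-off between these two extremes is what the weights control.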

    Impact of the simulation algorithm, magnitude of ergodic fluctuations and number of realizations on the spaces of uncertainty of flow properties

    Geostatistical simulation algorithms are routinely used to generate conditional realizations of the spatial distribution of petrophysical properties, which are then fed into complex transfer functions, e.g. a flow simulator, to yield a distribution of responses, such as the time to recover a given proportion of the oil. This latter distribution, often referred to as the space of uncertainty, cannot be defined analytically because of the complexity (non-linearity) of transfer functions, but it can be characterized algorithmically through the generation of many realizations. This paper compares the space of uncertainty generated by four of the most commonly used algorithms: sequential Gaussian simulation, sequential indicator simulation, p-field simulation and simulated annealing. Conditional on 80 sample permeability values randomly drawn from an exhaustive 40×40 image, 100 realizations of the spatial distribution of permeability values are generated using each algorithm and fed into a pressure solver and a flow simulator. Principal component analysis is used to display the sets of realizations in the joint space of uncertainty of the response variables (effective permeability, times to reach 5% and 95% water cuts and to recover 10% and 50% of the oil). The attenuation of ergodic fluctuations through a rank-preserving transform of permeability values substantially reduces the extent of the space of uncertainty for sequential indicator simulation and p-field simulation, while improving the prediction of the response variable by the mean of the output distribution. Differences between simulation algorithms are most pronounced for long-term responses (95% water cut and 50% oil recovery), with sequential Gaussian simulation yielding the most accurate prediction. 
In this example, using more than 20 realizations generally increases the size of the space of uncertainty only slightly.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/42311/1/477-13-3-161_90130161.pd
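Here is a minimal version of a rank-preserving transform. It assumes the target histogram is given as a list of values with the same length as the realization. Each simulated value is replaced by the target value of the same rank, so every realization reproduces the target histogram exactly while keeping its spatial ordering. Unlike more complete implementations, this sketch does not hold conditioning data fixed. The function name is hypothetical.

```python
def rank_preserving_transform(realization, target_values):
    """Replace each value by the target value of the same rank.

    The ranks (spatial ordering) of the realization are kept, while its
    histogram is forced to match `target_values` exactly, removing
    histogram fluctuations between realizations. Ties are broken by
    position in the list.
    """
    if len(realization) != len(target_values):
        raise ValueError("realization and target must have the same length")
    order = sorted(range(len(realization)), key=lambda i: realization[i])
    target_sorted = sorted(target_values)
    out = [0.0] * len(realization)
    for rank, idx in enumerate(order):
        out[idx] = target_sorted[rank]
    return out
```

For example, `rank_preserving_transform([0.3, 0.1, 0.2], [10, 30, 20])` returns `[30, 10, 20]`.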

    Geostatistical incorporation of spatial coordinates into supervised classification of hyperspectral data

    This paper presents a methodology to incorporate both hyperspectral properties and spatial coordinates of pixels in maximum likelihood classification. Indicator kriging of ground data is used to estimate, for each pixel, the prior probabilities of occurrence of classes, which are then combined with spectral-based probabilities within a Bayesian framework. In the case study (mapping of in-stream habitats), accounting for spatial coordinates increases the overall producer's accuracy from 85.8% to 93.8%, while the Kappa statistic rises from 0.74 to 0.88. The best results are obtained using only indicator kriging-based probabilities, with an overall accuracy of 97.2%. Significant improvements are observed for environmentally important units, such as pools (Kappa: 0.17 to 0.74) and eddy drop zones (Kappa: 0.65 to 0.87). The lack of benefit from hyperspectral information in the present study can be explained by the dense network of ground observations and the high spatial continuity of the field classification, which might be spurious.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/42347/1/10109-4-1-99_20040099.pd
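The Bayesian combination described above amounts to two steps. For each pixel, the indicator kriging prior of each class is multiplied by the spectral likelihood of that class, and the products are renormalized. The Kappa statistic then compares classified and reference labels. The sketch assumes that the priors and likelihoods are already computed: it does not perform indicator kriging or estimate the spectral densities. The function names are hypothetical.

```python
def combine_probabilities(prior, likelihood):
    """Posterior class probabilities from a prior (e.g. indicator-kriging
    probabilities at a pixel) and spectral class likelihoods p(x | k)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # overall accuracy
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (observed - expected) / (1.0 - expected)
```

Each pixel is assigned to the class with the largest posterior. With uniform priors, this reduces to plain maximum likelihood classification.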

    Ordinary Cokriging Revisited

    This paper establishes the relations between simple cokriging and ordinary cokriging with one or several unbiasedness constraints. Differences between cokriging variants are related to differences between the models adopted for the means of the primary and secondary variables. Because the secondary data weights need not sum to zero, ordinary cokriging with a single unbiasedness constraint gives a larger weight to the secondary information while reducing the occurrence of negative weights. Also, the weights provided by such cokriging systems written in terms of covariances or correlograms are not linearly related, hence the estimates differ. The prediction performance of cokriging estimators is assessed using an environmental dataset that includes concentrations of five heavy metals at 359 locations. Analysis of reestimation scores at 100 test locations shows that kriging and cokriging perform equally when the primary and secondary variables are sampled at the same locations. When the secondary information is available at the estimated location, little is gained by retaining other, more distant secondary data in the estimation.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/43198/1/11004_2004_Article_412218.pd
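The two ordinary cokriging variants discussed above differ only in their constraint rows. This sketch assumes an intrinsic coregionalization model, in which every direct and cross covariance is a coefficient of a positive definite matrix B times the same exponential correlogram. For the single-constraint variant, it rescales the secondary data to the primary mean using sample means. Function names and parameters are hypothetical.

```python
import numpy as np


def _rho(a, b, range_):
    """Exponential correlogram between two point sets (assumed model)."""
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-3.0 * h / range_)


def cokriging_weights(p_coords, s_coords, target, B, range_, single_constraint):
    """Ordinary cokriging weights for the primary variable at `target`.

    Assumes C_kl(h) = B[k][l] * rho(h) (intrinsic coregionalization).
    single_constraint=False: primary weights sum to 1, secondary to 0.
    single_constraint=True: all weights together sum to 1 (secondary data
    must then be rescaled to the primary mean, see cokriging_estimate).
    """
    P, S = np.asarray(p_coords, float), np.asarray(s_coords, float)
    X = np.vstack([P, S])
    n1, n = len(P), len(P) + len(S)
    var = np.array([0] * n1 + [1] * (n - n1))  # variable index of each datum
    B = np.asarray(B, float)
    K = B[np.ix_(var, var)] * _rho(X, X, range_)  # data-to-data covariances
    k0 = B[var, 0] * _rho(X, np.asarray(target, float)[None, :], range_)[:, 0]
    if single_constraint:
        F = np.ones((n, 1))
        f = np.array([1.0])
    else:
        F = np.zeros((n, 2))
        F[:n1, 0] = 1.0
        F[n1:, 1] = 1.0
        f = np.array([1.0, 0.0])
    m = F.shape[1]
    lhs = np.block([[K, F], [F.T, np.zeros((m, m))]])
    sol = np.linalg.solve(lhs, np.concatenate([k0, f]))
    return sol[:n1], sol[n1:n]


def cokriging_estimate(lam, nu, z1, z2, single_constraint):
    """Combine primary (z1) and secondary (z2) data with cokriging weights."""
    z1, z2 = np.asarray(z1, float), np.asarray(z2, float)
    if single_constraint:
        # Rescale secondary data to the primary mean; sample means are used
        # here as a simplifying assumption.
        z2 = z2 - z2.mean() + z1.mean()
    return float(lam @ z1 + nu @ z2)
```

Comparing the weights returned by the two calls illustrates the point made in the abstract: freeing the secondary weights from summing to zero changes how much weight the secondary data receive.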