
    The importance of scale in spatially varying coefficient modeling

    While spatially varying coefficient (SVC) models have attracted considerable attention in applied science, they have been criticized as being unstable. The objective of this study is to show that capturing the "spatial scale" of each data relationship is crucially important for making SVC modeling more stable and, in doing so, more flexible. Here, the analytical properties of six SVC models are summarized in terms of their characterization of scale. Models are examined through a series of Monte Carlo simulation experiments to assess the extent to which spatial scale influences model stability and the accuracy of their SVC estimates. The following models are studied: (i) geographically weighted regression (GWR) with a fixed distance bandwidth, (ii) GWR with an adaptive distance bandwidth (GWRa), (iii) flexible bandwidth GWR with fixed distance bandwidths (FB-GWR), (iv) flexible bandwidth GWR with adaptive distance bandwidths (FB-GWRa), (v) eigenvector spatial filtering (ESF), and (vi) random effects ESF (RE-ESF). Results reveal that the SVC models designed to capture scale dependencies in local relationships (FB-GWR, FB-GWRa and RE-ESF) most accurately estimate the simulated SVCs, with RE-ESF being the most computationally efficient. Conversely, GWR and ESF, where SVC estimates are naively assumed to operate at the same spatial scale for each relationship, perform poorly. Results also confirm that the adaptive bandwidth GWR models (GWRa and FB-GWRa) are superior to their fixed bandwidth counterparts (GWR and FB-GWR).
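    To make the fixed versus adaptive bandwidth distinction concrete, here is a minimal sketch (synthetic data, Gaussian kernel, all names hypothetical; not the authors' implementation) that computes GWR coefficients at a single target location using either a fixed distance bandwidth or an adaptive k-nearest-neighbour bandwidth.

```python
import numpy as np

def gwr_coefficients(X, y, coords, target, bandwidth=None, k=None):
    """Local weighted least squares fit at one target location.

    Spatial kernel weights come either from a fixed Gaussian bandwidth
    (distance units) or from an adaptive bandwidth set to the distance
    of the k-th nearest neighbour. Illustrative only.
    """
    d = np.linalg.norm(coords - target, axis=1)                 # distances to target
    h = bandwidth if bandwidth is not None else np.sort(d)[k]   # adaptive: k-NN distance
    w = np.exp(-0.5 * (d / h) ** 2)                             # Gaussian kernel weights
    W = np.diag(w)
    # Weighted least squares: (X'WX)^{-1} X'Wy
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(200, 2))
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = 1.0 + 2.0 * coords[:, 0]                            # slope varies over space
y = 0.5 * X[:, 0] + beta_true * X[:, 1] + rng.normal(scale=0.1, size=200)

target = np.array([0.5, 0.5])
print(gwr_coefficients(X, y, coords, target, bandwidth=0.1))    # fixed bandwidth
print(gwr_coefficients(X, y, coords, target, k=30))             # adaptive bandwidth
```

    With the adaptive bandwidth the kernel widens automatically where observations are sparse, which is the property the abstract credits for the superiority of GWRa and FB-GWRa over their fixed bandwidth counterparts.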

    Adaptive variance function estimation in heteroscedastic nonparametric regression

    We consider a wavelet thresholding approach to adaptive variance function estimation in heteroscedastic nonparametric regression. A data-driven estimator is constructed by applying wavelet thresholding to the squared first-order differences of the observations. We show that the variance function estimator is nearly optimally adaptive to the smoothness of both the mean and variance functions. The estimator is shown to achieve the optimal adaptive rate of convergence under the pointwise squared error simultaneously over a range of smoothness classes. The estimator is also shown to be adaptive, within a logarithmic factor of the minimax risk, under the global mean integrated squared error over a collection of spatially inhomogeneous function classes. Numerical implementation and simulation results are also discussed. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/07-AOS509.
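    A minimal sketch of the construction described above, using PyWavelets with a universal soft threshold; the wavelet, decomposition level and threshold choice are illustrative stand-ins, not the paper's tuning.

```python
import numpy as np
import pywt  # PyWavelets

def variance_function_estimate(y, wavelet="db4", level=4):
    """Crude variance-function estimate: wavelet-threshold the halved squared
    first-order differences of the observations (E[d_i] ~ V(x_i) for a smooth mean)."""
    d = 0.5 * np.diff(y) ** 2
    coeffs = pywt.wavedec(d, wavelet, level=level)
    # Universal threshold, with sigma estimated from the finest-scale details
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(d)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(d)]

# Example: noise level grows along the design
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 1024)
y = np.sin(4 * np.pi * x) + (0.2 + 0.8 * x) * rng.standard_normal(x.size)
v_hat = variance_function_estimate(y)   # estimate of the variance function V(x)
```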

    Spatial aggregation of local likelihood estimates with applications to classification

    This paper presents a new method for spatially adaptive local (constant) likelihood estimation which applies to a broad class of nonparametric models, including the Gaussian, Poisson and binary response models. The main idea of the method is, given a sequence of local likelihood estimates ("weak" estimates), to construct a new aggregated estimate whose pointwise risk is of the order of the smallest risk among all "weak" estimates. We also propose a new approach toward selecting the parameters of the procedure by prescribing the behavior of the resulting estimate in the simple parametric situation. We establish a number of important theoretical results concerning the optimality of the aggregated estimate. In particular, our "oracle" result states that its risk is, up to some logarithmic multiplier, equal to the smallest risk for the given family of estimates. The performance of the procedure is illustrated by application to the classification problem. A numerical study demonstrates its reasonable performance in simulated and real-life examples. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/009053607000000271.
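    The aggregation idea can be caricatured for local constant estimates of a regression function with Gaussian noise as follows; this is a simplified Lepski-type pointwise rule with illustrative tuning constants, not the paper's exact procedure.

```python
import numpy as np

def local_means(y, bandwidths):
    """Weak estimates: local constant (moving-average) fits, one per bandwidth."""
    n = len(y)
    return np.stack([np.array([y[max(0, i - h): i + h + 1].mean() for i in range(n)])
                     for h in bandwidths])

def aggregate(y, bandwidths, sigma, z=2.0):
    """At each point, move from the smallest to the largest bandwidth and stop once
    the next weak estimate leaves a z*std band around the currently accepted one."""
    ests = local_means(y, bandwidths)
    out = ests[0].copy()
    for i in range(len(y)):
        for k in range(1, len(bandwidths)):
            sd = sigma / np.sqrt(2 * bandwidths[k - 1] + 1)   # std of the accepted estimate
            if abs(ests[k, i] - out[i]) > z * sd:
                break                                         # jump too large: keep current
            out[i] = ests[k, i]
    return out

# The rule keeps narrow windows near the jump and wide windows elsewhere.
rng = np.random.default_rng(5)
x = np.linspace(0, 1, 400)
y = np.where(x < 0.5, 0.0, 1.0) + rng.normal(scale=0.3, size=400)
f_hat = aggregate(y, bandwidths=[1, 2, 4, 8, 16, 32], sigma=0.3)
```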

    Sharp estimation in sup norm with random design

    The aim of this paper is to recover the regression function with sup norm loss. We construct an asymptotically sharp estimator which converges with the spatially dependent rate $r_{n,\mu}(x) = P \big( \log n / (n \mu(x)) \big)^{s/(2s+1)}$, where $\mu$ is the design density, $s$ the regression smoothness, $n$ the sample size, and $P$ a constant expressed in terms of a solution to a problem of optimal recovery as in Donoho (1994). We prove this result under the assumption that $\mu$ is positive and continuous. The estimator combines kernel and local polynomial methods, where the kernel is given by optimal recovery, which allows us to prove the result up to the constants for any $s > 0$. Moreover, the estimator does not depend on $\mu$. We prove that $r_{n,\mu}(x)$ is optimal in a sense which is stronger than the classical minimax lower bound. Then, an inhomogeneous confidence band is proposed. This band has a non-constant length which depends on the local amount of data.
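    The spatially dependent rate is easy to evaluate numerically; the snippet below uses an illustrative design density on [0, 1] and sets the optimal-recovery constant $P$ to 1, since its exact value is not reproduced here.

```python
import numpy as np

def local_rate(x, n, s, mu, P=1.0):
    """r_{n,mu}(x) = P * (log n / (n * mu(x)))^(s / (2s + 1)); P set to 1 for illustration."""
    return P * (np.log(n) / (n * mu(x))) ** (s / (2 * s + 1))

mu = lambda x: 0.5 + x              # a positive, continuous density on [0, 1] (illustrative)
x = np.linspace(0.05, 0.95, 5)
print(local_rate(x, n=10_000, s=2.0, mu=mu))
```

    The rate deteriorates where $\mu(x)$ is small, i.e. where the design places few observations, which is exactly what the inhomogeneous confidence band with non-constant length reflects.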

    On choosing a base coverage level for multiple peril crop insurance contracts

    For multiple peril crop insurance, the U.S. Department of Agriculture's Risk Management Agency estimates the premium rate for a base coverage level and then uses multiplicative adjustment factors to recover rates at other coverage levels. Given this methodology, accurate estimation of the rate at the base coverage level is critical. A change of the base coverage level from 65% to 50% has been considered. The purpose of this analysis was to provide some insight into whether such a change should or should not be carried out. Not surprisingly, our findings indicate that the higher coverage level should be maintained as the base.
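    The rating structure at issue can be summarized in a few lines; all numbers below are hypothetical and serve only to show how any error in the base rate propagates multiplicatively to every other coverage level.

```python
# Hypothetical base rate and adjustment factors (not RMA values).
base_level = 0.65
base_rate = 0.048                     # estimated premium rate at the base coverage level
adjustment = {0.50: 0.55, 0.55: 0.67, 0.60: 0.82, 0.65: 1.00, 0.70: 1.22, 0.75: 1.50}

# Rates at all coverage levels are recovered from the single base-level estimate.
rates = {level: base_rate * f for level, f in adjustment.items()}
print(rates)
```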

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. It begins with an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: (1) functional predictor regression (scalar-on-function), (2) functional response regression (function-on-scalar) and (3) function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article closes with a brief discussion of potential areas of future development in this field.
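    As a concrete (if heavily simplified) instance of scalar-on-function regression, the sketch below expands the coefficient function in a small Fourier basis and fits it by least squares; the basis, truncation level and data are illustrative, and the truncation itself plays the role of regularization.

```python
import numpy as np

# Model: y_i = integral of X_i(t) * beta(t) dt + noise, observed on a common grid.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)                        # sampling grid on [0, 1]
n, K = 200, 5
X = rng.normal(size=(n, len(t))).cumsum(axis=1)   # rough functional predictors
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true / len(t) + rng.normal(scale=0.1, size=n)

# Basis matrix Phi (grid points x K) and induced scalar features Z_ik = integral X_i * phi_k
Phi = np.column_stack([np.ones_like(t)] + [f(2 * np.pi * (j + 1) * t)
                                           for j in range(K // 2) for f in (np.sin, np.cos)])
Z = X @ Phi / len(t)
b_hat = np.linalg.lstsq(Z, y, rcond=None)[0]      # basis coefficients of beta
beta_hat = Phi @ b_hat                            # estimated coefficient function on the grid
```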

    Comparison of data-driven uncertainty quantification methods for a carbon dioxide storage benchmark scenario

    A variety of methods is available to quantify uncertainties arising within the modeling of flow and transport in carbon dioxide storage, but there is a lack of thorough comparisons. Usually, raw data from such storage sites can hardly be described by theoretical statistical distributions since only very limited data are available. Hence, exact information on distribution shapes for all uncertain parameters is very rare in realistic applications. We discuss and compare four different methods tested for data-driven uncertainty quantification based on a benchmark scenario of carbon dioxide storage. In the benchmark, for which we provide data and code, carbon dioxide is injected into a saline aquifer modeled by the nonlinear capillarity-free fractional flow formulation for two incompressible fluid phases, namely carbon dioxide and brine. To cover different aspects of uncertainty quantification, we incorporate various sources of uncertainty such as uncertainty of boundary conditions, of conceptual model definitions and of material properties. We consider recent versions of the following non-intrusive and intrusive uncertainty quantification methods: arbitrary polynomial chaos, spatially adaptive sparse grids, kernel-based greedy interpolation and hybrid stochastic Galerkin. The performance of each approach is demonstrated by assessing the expectation value and standard deviation of the carbon dioxide saturation against a reference statistic based on Monte Carlo sampling. We compare the convergence of all methods, reporting accuracy with respect to the number of model runs and resolution. Finally, we offer suggestions about the methods' advantages and disadvantages that can guide the modeler for uncertainty quantification in carbon dioxide storage and beyond.
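    The Monte Carlo reference statistic mentioned above amounts to the following pattern, shown here with a toy stand-in for the storage simulator and hypothetical input distributions; the real benchmark model and its uncertain parameters are far more involved.

```python
import numpy as np

def saturation_model(params):
    """Toy stand-in for a CO2 storage simulator returning a saturation value."""
    porosity, injection_rate = params
    return 1.0 - np.exp(-injection_rate / porosity)

# Reference statistic: sample the uncertain inputs, run the model, and report the
# expectation and standard deviation of the output saturation.
rng = np.random.default_rng(3)
samples = np.column_stack([rng.uniform(0.1, 0.3, 10_000),    # porosity (hypothetical range)
                           rng.uniform(0.5, 1.5, 10_000)])   # injection rate (scaled)
sat = np.array([saturation_model(p) for p in samples])
print(sat.mean(), sat.std())
```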

    Bandwidth selection in kernel empirical risk minimization via the gradient

    In this paper, we deal with the data-driven selection of multidimensional and possibly anisotropic bandwidths in the general framework of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models such as nonparametric robust regression and statistical learning with errors in variables. These results are stated in the context of smooth loss functions, where the gradient of the risk appears as a good criterion to measure the performance of our estimators. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a nontrivial extension of the so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one main advantage of our selection rule is that it does not depend on the Hessian matrix of the risk, which is usually involved in standard adaptive procedures. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/15-AOS1318.
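    The comparison principle behind such selection rules is easiest to see in the classical Goldenshluger-Lepski setting. The sketch below applies it to a Gaussian kernel density estimate rather than to gradient empirical risks, so it illustrates the device, not the paper's actual rule; the penalty constant and bandwidth grid are arbitrary.

```python
import numpy as np

def kde(data, grid, h):
    """Gaussian kernel density estimate evaluated on a grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def gl_bandwidth(data, grid, bandwidths, c=1.0):
    """Goldenshluger-Lepski selection: compare doubly smoothed estimates with singly
    smoothed ones and penalize; for Gaussian kernels K_h * K_h' has bandwidth
    sqrt(h^2 + h'^2). Illustrative constants only."""
    n = len(data)
    pen = lambda h: c * np.sqrt(np.log(n) / (n * h))
    f = {h: kde(data, grid, h) for h in bandwidths}
    crit = []
    for h in bandwidths:
        A = max(np.max(np.abs(kde(data, grid, np.hypot(h, hp)) - f[hp])) - pen(hp)
                for hp in bandwidths)
        crit.append(max(A, 0.0) + pen(h))
    return bandwidths[int(np.argmin(crit))]

data = np.random.default_rng(4).normal(size=500)
grid = np.linspace(-4, 4, 200)
print(gl_bandwidth(data, grid, bandwidths=np.geomspace(0.05, 1.0, 15)))
```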