    Unbiased estimators for random design regression

    In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over d-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the sample is drawn i.i.d. from the input distribution, the least squares solution for that sample can be viewed as the natural estimator of the optimum. Unfortunately, this estimator almost always incurs an undesirable bias coming from the randomness of the input points. In this paper we show that it is possible to draw a non-i.i.d. sample of input points such that, regardless of the response model, the least squares solution is an unbiased estimator of the optimum. Moreover, this sample can be produced efficiently by augmenting a previously drawn i.i.d. sample with an additional set of d points drawn jointly from the input distribution rescaled by the squared volume spanned by the points. Motivated by this, we develop a theoretical framework for studying volume-rescaled sampling, and in the process prove a number of new matrix expectation identities. We use them to show that for any input distribution and $\epsilon > 0$ there is a random design consisting of $O(d \log d + d/\epsilon)$ points from which an unbiased estimator can be constructed whose square loss over the entire distribution is with high probability bounded by $1+\epsilon$ times the loss of the optimum. We provide efficient algorithms for generating such unbiased estimators in a number of practical settings and support our claims experimentally.
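
The unbiasedness claim in the abstract above can be checked exactly in a toy case. The following is a minimal sketch, not the paper's algorithm: it assumes d = 1 (a slope through the origin, no intercept), a hypothetical three-point population with equal probabilities, and a sample of size two. The i.i.d. design draws both points uniformly; the augmented design draws one point uniformly and one with probability proportional to x², which is the squared "volume" in one dimension. All names (`points`, `slope`, `w_iid`, `w_vol`) are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population of equally likely (x, y) pairs, with d = 1.
points = [(Fraction(1), Fraction(3)),
          (Fraction(2), Fraction(1)),
          (Fraction(4), Fraction(6))]
n = len(points)

# Optimum least squares slope through the origin: sum(x*y) / sum(x^2).
sum_xx = sum(x * x for x, _ in points)
w_opt = sum(x * y for x, y in points) / sum_xx


def slope(sample):
    """Least squares slope through the origin fitted to a list of (x, y)."""
    return sum(x * y for x, y in sample) / sum(x * x for x, _ in sample)


# Expected estimate under an i.i.d. design: both points drawn uniformly.
w_iid = sum(Fraction(1, n * n) * slope([a, b])
            for a, b in product(points, repeat=2))

# Expected estimate under the augmented design: point a is drawn uniformly,
# point b is drawn with probability proportional to x_b^2.
w_vol = sum(Fraction(1, n) * (b[0] ** 2 / sum_xx) * slope([a, b])
            for a, b in product(points, repeat=2))

print("optimum          :", w_opt)
print("i.i.d. design    :", w_iid)
print("augmented design :", w_vol)
```

Because the expectations are enumerated with exact fractions, the augmented design reproduces the optimum slope exactly, while the i.i.d. design does not for this population.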

    Error-free milestones in error prone measurements

    A predictor variable or dose that is measured with substantial error may possess an error-free milestone, such that it is known with negligible error whether the value of the variable is to the left or right of the milestone. Such a milestone provides a basis for estimating a linear relationship between the true but unknown value of the error-free predictor and an outcome, because the milestone creates a strong and valid instrumental variable. The inferences are nonparametric and robust, and in the simplest cases, they are exact and distribution free. We also consider multiple milestones for a single predictor and milestones for several predictors whose partial slopes are estimated simultaneously. Examples are drawn from the Wisconsin Longitudinal Study, in which a BA degree acts as a milestone for sixteen years of education, and the binary indicator of military service acts as a milestone for years of service. Comment: Published at http://dx.doi.org/10.1214/08-AOAS233 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
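
To illustrate why a milestone can serve as an instrument, here is a minimal simulation sketch. It uses the classical Wald (grouped-means) instrumental-variable estimator, which is a stand-in and not the nonparametric, exact procedure the abstract describes. Every number below (the uniform range of true education, the error standard deviations, the intercept and slope) is a made-up assumption for illustration.

```python
import random

random.seed(0)

TRUE_SLOPE = 2.0   # assumed effect of one true year of education
MILESTONE = 16.0   # milestone: at least sixteen years (for example, a BA)

rows = []
for _ in range(20000):
    x = random.uniform(8.0, 20.0)            # true years (never observed)
    z = 1 if x >= MILESTONE else 0           # milestone, observed without error
    w = x + random.gauss(0.0, 3.0)           # error-prone measurement of x
    y = 5.0 + TRUE_SLOPE * x + random.gauss(0.0, 1.0)  # outcome
    rows.append((z, w, y))


def mean(values):
    return sum(values) / len(values)


# Wald estimator: difference in mean outcome across the milestone divided
# by the difference in mean measured predictor across the milestone.
w1 = [w for z, w, _ in rows if z == 1]
w0 = [w for z, w, _ in rows if z == 0]
y1 = [y for z, _, y in rows if z == 1]
y0 = [y for z, _, y in rows if z == 0]
wald = (mean(y1) - mean(y0)) / (mean(w1) - mean(w0))

# Naive least squares slope of y on the error-prone w, for comparison.
mw = mean([w for _, w, _ in rows])
my = mean([y for _, _, y in rows])
cov_wy = sum((w - mw) * (y - my) for _, w, y in rows)
var_w = sum((w - mw) ** 2 for _, w, _ in rows)
naive = cov_wy / var_w

print("Wald (milestone as instrument):", round(wald, 3))
print("Naive regression on measured w:", round(naive, 3))
```

In this setup the measurement error averages out within each milestone group, so the Wald ratio stays close to the assumed slope, whereas the naive regression is attenuated toward zero.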

    A generalized nonlinear mixed-effects height-diameter model for Eucalyptus globulus L. in northwestern Spain

    A generalized height–diameter model was developed for Eucalyptus globulus Labill. stands in Galicia (northwestern Spain). The study involved a variety of pure stands ranging from even-aged to uneven-aged. Data were obtained from permanent circular sample plots in which trees were sampled within different radii according to their diameter at breast height. A combination of weighted regression, to take into account the unequal selection probabilities of such an inventory design, and mixed model techniques, to accommodate local random fluctuations in the height–diameter relationship, was applied to estimate fixed and random parameters for several models reported in the relevant literature. The models that provided the best results included dominant height and dominant diameter as fixed effects. These models explained more than 83% of the observed variability, with mean errors of less than 2.5 m. Random parameters for particular plots were estimated with different tree selection options. Height–diameter relationships tailored to individual plots can be obtained by calibration of the height measurements of the three smallest trees in a plot. An independent dataset was used to test the performance of the model with data not used in the fitting process, and to demonstrate the advantages of calibrating the mixed-effects model.

    Statistical Estimation and Inference Improvements for Exoplanet Discovery

    The radial velocity method has been widely used by astronomers since the 1990s for discovering extra-solar planets, often referred to as simply exoplanets. This method involves estimating the radial velocity of a distant star over time using the stellar light, followed by modeling such radial velocity estimates as a function of time using Keplerian-orbital equations with parameters that describe the exoplanet. While a number of approaches exist for estimating the radial velocity from the stellar light, we introduce a new approach for this that uses Hermite-Gaussian functions to reduce the estimation to linear least-squares regression. Furthermore, we demonstrate that this new approach, compared to the commonly used cross-correlation approach, provides an approximate 21% reduction of statistical risk in simulation studies as well as in applications to recently collected data. We then extend this linear model to include additional terms that represent the effect of stellar activity on the observed light, an effect known to both hide and imitate the signal of exoplanets. The F-statistic for the fitted coefficients of these additional terms is found to have higher statistical power than many traditional stellar activity indicators at detecting the presence of stellar activity. Finally, we also use the linear model in a Bayesian framework to merge both traditional steps of the radial velocity method into one that estimates the exoplanet's orbital parameters directly from the time series of observed stellar light.
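
The reduction to linear least squares mentioned in the abstract above can be sketched on a single idealized absorption line. This is an assumption-laden toy, not the authors' pipeline: it assumes a noiseless Gaussian line of known depth and width, a small shift expressed in arbitrary wavelength units instead of a velocity, and a single regressor. For a small shift s, the difference between the shifted and the template line is approximately proportional to (x/σ)·exp(−x²/(2σ²)), which is the first-degree Hermite-Gaussian shape, so the shift follows from one regression coefficient. The names `template`, `psi`, and `shift_est` are illustrative.

```python
import math

DEPTH = 0.6        # assumed line depth (fraction of the continuum)
SIGMA = 1.0        # assumed line width
TRUE_SHIFT = 0.01  # small shift to recover, in the same units as x


def template(x):
    """Idealized absorption line: continuum 1 minus a Gaussian dip."""
    return 1.0 - DEPTH * math.exp(-x * x / (2.0 * SIGMA ** 2))


def psi(x):
    """First-degree Hermite-Gaussian shape (up to a constant factor)."""
    return (x / SIGMA) * math.exp(-x * x / (2.0 * SIGMA ** 2))


# Symmetric wavelength grid around the line centre.
grid = [-5.0 + 0.05 * i for i in range(201)]

# "Observed" spectrum: the template shifted by TRUE_SHIFT, without noise.
observed = [template(x - TRUE_SHIFT) for x in grid]

# First-order expansion: observed - template ~ -(DEPTH * s / SIGMA) * psi(x).
diff = [o - template(x) for o, x in zip(observed, grid)]

# One-regressor least squares through the origin for the coefficient of psi.
coef = (sum(d * psi(x) for d, x in zip(diff, grid))
        / sum(psi(x) ** 2 for x in grid))

# Invert the first-order relation to recover the shift.
shift_est = -coef * SIGMA / DEPTH

print("true shift     :", TRUE_SHIFT)
print("estimated shift:", shift_est)
```

The estimate relies on the first-order approximation, so it is accurate only when the shift is small relative to the line width; with noise, the same regression would give a least-squares estimate with the usual standard error.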