Unbiased estimators for random design regression
In linear regression we wish to estimate the optimum linear least squares
predictor for a distribution over d-dimensional input points and real-valued
responses, based on a small sample. Under standard random design analysis,
where the sample is drawn i.i.d. from the input distribution, the least squares
solution for that sample can be viewed as the natural estimator of the optimum.
Unfortunately, this estimator almost always incurs an undesirable bias coming
from the randomness of the input points. In this paper we show that it is
possible to draw a non-i.i.d. sample of input points such that, regardless of
the response model, the least squares solution is an unbiased estimator of the
optimum. Moreover, this sample can be produced efficiently by augmenting a
previously drawn i.i.d. sample with an additional set of d points drawn jointly
from the input distribution rescaled by the squared volume spanned by the
points. Motivated by this, we develop a theoretical framework for studying
volume-rescaled sampling, and in the process prove a number of new matrix
expectation identities. We use them to show that for any input distribution and
$\epsilon > 0$ there is a random design consisting of $O(d \log d + d/\epsilon)$
points from which an unbiased estimator can be constructed whose square loss
over the entire distribution is with high probability bounded by $1 + \epsilon$
times the loss of the optimum. We provide efficient algorithms for generating
such unbiased estimators in a number of practical settings and support our
claims experimentally.
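The unbiasedness claim in this abstract can be checked exactly in a toy case. The sketch below is only an illustration under stated assumptions: d = 1 (so the squared volume spanned by a single point x is x^2), a hypothetical three-point distribution, no intercept, and a sample made of one volume-rescaled point plus one i.i.d. point. It enumerates every possible sample with exact rational arithmetic; none of the names or numbers come from the paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy distribution: three (x, y) pairs, each with probability 1/3.
POINTS = [(1, 1), (2, 1), (3, 5)]
P_IID = [Fraction(1, 3)] * len(POINTS)


def least_squares(sample):
    """No-intercept least squares slope for d = 1: sum(x*y) / sum(x*x)."""
    return Fraction(sum(x * y for x, y in sample), sum(x * x for x, _ in sample))


def optimum():
    """Population optimum w* = E[xy] / E[x^2]."""
    exy = sum(p * x * y for p, (x, y) in zip(P_IID, POINTS))
    exx = sum(p * x * x for p, (x, _) in zip(P_IID, POINTS))
    return exy / exx


def expected_estimate(first_point_probs):
    """Exact E[slope] when point 1 follows first_point_probs and point 2 is i.i.d."""
    total = Fraction(0)
    for i, j in product(range(len(POINTS)), repeat=2):
        weight = first_point_probs[i] * P_IID[j]
        total += weight * least_squares([POINTS[i], POINTS[j]])
    return total


# Volume-rescaled law for one point in d = 1: probability proportional to p(x) * x^2.
norm = sum(p * x * x for p, (x, _) in zip(P_IID, POINTS))
P_VOL = [p * x * x / norm for p, (x, _) in zip(P_IID, POINTS)]

print("optimum                :", optimum())
print("i.i.d. sample mean     :", expected_estimate(P_IID))
print("volume-rescaled mean   :", expected_estimate(P_VOL))
```

In this toy setting the i.i.d. expectation differs from the optimum while the augmented sample matches it exactly, which is the d = 1 instance of the behaviour the abstract describes.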
Error-free milestones in error prone measurements
A predictor variable or dose that is measured with substantial error may
possess an error-free milestone, such that it is known with negligible error
whether the value of the variable is to the left or right of the milestone.
Such a milestone provides a basis for estimating a linear relationship between
the true but unknown value of the error-free predictor and an outcome, because
the milestone creates a strong and valid instrumental variable. The inferences
are nonparametric and robust, and in the simplest cases, they are exact and
distribution free. We also consider multiple milestones for a single predictor
and milestones for several predictors whose partial slopes are estimated
simultaneously. Examples are drawn from the Wisconsin Longitudinal Study, in
which a BA degree acts as a milestone for sixteen years of education, and the
binary indicator of military service acts as a milestone for years of service.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS233 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
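A small simulation can illustrate why an error-free milestone helps. The paper's own inferences are nonparametric and exact; the sketch below swaps in the much simpler Wald ratio estimator for a binary instrument, and every name, parameter, and distribution in it is a hypothetical choice. It assumes classical measurement error that is independent of the true predictor and of the outcome noise.

```python
import random


def simulate(n=20000, slope=2.0, seed=0):
    """Draw (observed predictor, milestone indicator, outcome) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x_true = rng.gauss(0.0, 1.0)           # true predictor, never observed
        x_obs = x_true + rng.gauss(0.0, 1.0)   # error-prone measurement
        z = 1 if x_true >= 0.0 else 0          # milestone, known without error
        y = 1.0 + slope * x_true + rng.gauss(0.0, 1.0)
        rows.append((x_obs, z, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def ols_slope(rows):
    """Naive regression of y on the error-prone x (attenuated toward zero)."""
    mx = mean(r[0] for r in rows)
    my = mean(r[2] for r in rows)
    cov = sum((r[0] - mx) * (r[2] - my) for r in rows)
    var = sum((r[0] - mx) ** 2 for r in rows)
    return cov / var


def wald_slope(rows):
    """Mean outcome gap across the milestone divided by the mean x_obs gap."""
    above = [r for r in rows if r[1] == 1]
    below = [r for r in rows if r[1] == 0]
    dy = mean(r[2] for r in above) - mean(r[2] for r in below)
    dx = mean(r[0] for r in above) - mean(r[0] for r in below)
    return dy / dx


rows = simulate()
print("naive OLS slope:", ols_slope(rows))
print("Wald IV slope  :", wald_slope(rows))
```

With equal variances for the true predictor and the measurement error, the naive slope is expected to shrink to roughly half the true value, while the milestone-based ratio stays close to it.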
A generalized nonlinear mixed-effects height-diameter model for Eucalyptus globulus L. in northwestern Spain
A generalized height–diameter model was developed for Eucalyptus globulus Labill. stands in Galicia
(northwestern Spain). The study involved a variety of pure stands ranging from even-aged to uneven-aged.
Data were obtained from permanent circular sample plots in which trees were sampled within
different radii according to their diameter at breast height. A combination of weighted regression, to take
into account the unequal selection probabilities of such an inventory design, and mixed model
techniques, to accommodate local random fluctuations in the height–diameter relationship, was
applied to estimate fixed and random parameters for several models reported in the relevant literature.
The models that provided the best results included dominant height and dominant diameter as fixed
effects. These models explained more than 83% of the observed variability, with mean errors of less than
2.5 m. Random parameters for particular plots were estimated with different tree selection options.
Height–diameter relationships tailored to individual plots can be obtained by calibration of the height
measurements of the three smallest trees in a plot. An independent dataset was used to test the
performance of the model with data not used in the fitting process, and to demonstrate the advantages of
calibrating the mixed-effects model.
Statistical Estimation and Inference Improvements for Exoplanet Discovery
The radial velocity method has been widely used by astronomers since the 1990s for discovering extra-solar planets, often referred to simply as exoplanets. This method involves estimating the radial velocity of a distant star over time using the stellar light, followed by modeling such radial velocity estimates as a function of time using Keplerian-orbital equations with parameters that describe the exoplanet. While a number of approaches exist for estimating the radial velocity from the stellar light, we introduce a new approach that uses Hermite-Gaussian functions to reduce the estimation to linear least-squares regression. Furthermore, we demonstrate that this new approach, compared to the commonly used cross-correlation approach, provides an approximately 21% reduction of statistical risk in simulation studies as well as in applications to recently collected data. We then extend this linear model to include additional terms that represent the effect of stellar activity on the observed light, an effect known to both hide and imitate the signal of exoplanets. The F-statistic for the fitted coefficients of these additional terms is found to have higher statistical power than many traditional stellar activity indicators at detecting the presence of stellar activity. Finally, we also use the linear model in a Bayesian framework to merge both traditional steps of the radial velocity method into one that estimates the exoplanet's orbital parameters directly from the time series of observed stellar light.
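The reduction to linear least squares can be illustrated with a first-order sketch. It is not the authors' implementation: it assumes a single noise-free Gaussian absorption line with a known template, a small shift expressed in the line's own coordinate (conversion to a velocity is left out), and hypothetical function names. For a small shift, the difference between the shifted line and the template is approximately the shift times a function proportional to the degree-1 Hermite-Gaussian function, so the shift is the coefficient of a no-intercept regression.

```python
import math


def line_profile(x, depth=0.5, center=0.0, width=1.0):
    """Normalized flux with one Gaussian absorption line (hypothetical template)."""
    return 1.0 - depth * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def shift_basis(x, depth=0.5, center=0.0, width=1.0):
    """Minus the template derivative; proportional to u * exp(-u^2 / (2 width^2))."""
    u = x - center
    return -depth * u / width ** 2 * math.exp(-(u ** 2) / (2.0 * width ** 2))


def estimate_shift(grid, observed):
    """No-intercept least squares of (observed - template) on the basis function."""
    diff = [obs - line_profile(x) for x, obs in zip(grid, observed)]
    basis = [shift_basis(x) for x in grid]
    return sum(d * b for d, b in zip(diff, basis)) / sum(b * b for b in basis)


# Symmetric grid around the line center, then a spectrum shifted by a small amount.
grid = [-6.0 + 0.05 * i for i in range(241)]
true_shift = 0.01
observed = [line_profile(x - true_shift) for x in grid]

print("true shift     :", true_shift)
print("estimated shift:", estimate_shift(grid, observed))
```

The approximation holds only when the shift is small relative to the line width; larger shifts would need higher-order terms.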