Optimal linear estimation under unknown nonlinear transform
Linear regression studies the problem of estimating a model parameter
$\beta^* \in \mathbb{R}^p$ from $n$ observations $\{(y_i, \mathbf{x}_i)\}_{i=1}^n$
from the linear model $y_i = \langle \mathbf{x}_i, \beta^* \rangle + \epsilon_i$. We consider a significant
generalization in which the relationship between $\langle \mathbf{x}_i, \beta^* \rangle$ and $y_i$ is noisy, quantized to a single bit, potentially nonlinear,
noninvertible, as well as unknown. This model is known as the single-index
model in statistics, and, among other things, it represents a significant
generalization of one-bit compressed sensing. We propose a novel spectral-based
estimation procedure and show that we can recover $\beta^*$ in settings (i.e.,
classes of link function $f$) where previous algorithms fail. In general, our
algorithm requires only very mild restrictions on the (unknown) functional
relationship between $y_i$ and $\langle \mathbf{x}_i, \beta^* \rangle$. We also
consider the high dimensional setting where $\beta^*$ is sparse, and introduce
a two-stage nonconvex framework that addresses estimation challenges in high
dimensional regimes where $p \gg n$. For a broad class of link functions
between $\langle \mathbf{x}_i, \beta^* \rangle$ and $y_i$, we establish minimax
lower bounds that demonstrate the optimality of our estimators in both the
classical and high dimensional regimes. Comment: 25 pages, 3 figures
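The abstract does not spell out the spectral procedure, so the following is only a generic second-moment spectral estimator of the kind the phrase suggests, not necessarily the authors' method. Writing the model as y_i = f(<x_i, β*>) with unit-norm β* and assuming Gaussian covariates x_i ~ N(0, I_p), Stein's identity gives E[y (x x^T - I)] = c β* β*^T with c = E[f(z)(z^2 - 1)], z ~ N(0, 1). The top eigenvector of the sample version therefore recovers β* up to sign, even for an even one-bit link such as the hypothetical y = sign(|z| - 1), where the first-moment estimate (1/n) Σ y_i x_i has expectation zero. The function name `spectral_single_index` and all simulation settings are illustrative assumptions.

```python
import numpy as np


def spectral_single_index(X, y):
    """Direction estimate (up to sign) for a single-index model, assuming x_i ~ N(0, I_p).

    Uses the top-magnitude eigenvector of M_hat = (1/n) sum_i y_i (x_i x_i^T - I),
    whose expectation is c * beta beta^T by Stein's identity.
    """
    n, p = X.shape
    # (X.T * y) @ X equals sum_i y_i x_i x_i^T without forming n outer products.
    M_hat = (X.T * y) @ X / n - np.mean(y) * np.eye(p)
    eigvals, eigvecs = np.linalg.eigh(M_hat)  # M_hat is symmetric
    # The sign of c depends on the link, so pick the eigenvalue of largest magnitude.
    k = np.argmax(np.abs(eigvals))
    return eigvecs[:, k]


rng = np.random.default_rng(0)
n, p = 20000, 10
beta_star = rng.standard_normal(p)
beta_star /= np.linalg.norm(beta_star)

X = rng.standard_normal((n, p))
z = X @ beta_star
# Hypothetical even one-bit link: y = sign(|z| - 1). Since y is even in z,
# E[y x] = 0 and the first-moment estimate carries no signal.
y = np.sign(np.abs(z) - 1.0)

beta_hat = spectral_single_index(X, y)
first_moment = X.T @ y / n
print("spectral |cos|:", abs(beta_hat @ beta_star))
print("first-moment |cos|:", abs(first_moment @ beta_star) / np.linalg.norm(first_moment))
```

For the sparse, high dimensional regime the abstract also mentions, a plain eigendecomposition like this one would need an additional sparsity-promoting step, which the sketch does not attempt.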
Have Econometric Analyses of Happiness Data Been Futile? A Simple Truth About Happiness Scales
Econometric analyses in the happiness literature typically use subjective
well-being (SWB) data to compare the mean of observed or latent happiness
across samples. Recent critiques show that comparing the mean of ordinal data
is only valid under strong assumptions that are usually rejected by SWB data.
This raises the open question of whether many of the empirical studies in the
economics of happiness literature have been futile. In order to salvage some of
the prior results and avoid future issues, we suggest that regression analysis of
SWB (and other ordinal data) should focus on the median rather than the mean.
Median comparisons using parametric models such as the ordered probit and logit
can be readily carried out using familiar statistical software such as Stata. We
also show that estimating a semiparametric median ordered-response model, a task
previously assumed to be impractical, is possible using a novel constrained
mixed integer optimization technique. We use GSS data to show that the famous
Easterlin Paradox from the happiness literature holds for the US independent of
any parametric assumption.
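Why medians are well behaved for ordinal outcomes can be seen in a small sketch. Under an ordered probit with latent y* = x'β + e, e ~ N(0, 1), and observed category j when κ_{j-1} < y* ≤ κ_j, the median of the observed category is the smallest j with Φ(κ_j - x'β) ≥ 0.5, which is simply the category whose interval contains x'β. The median therefore depends only on where x'β falls relative to the cutpoints, not on how the scale points are spaced. The cutpoints and linear-predictor values below are hypothetical, and the two functions are illustrative helpers, not Stata commands.

```python
import bisect
from statistics import NormalDist


def median_category_from_cdf(xb, cutpoints):
    """Median of Y under an ordered probit: the smallest category j with
    P(Y <= j | x) = Phi(cut_j - xb) >= 0.5 (categories numbered 0..J)."""
    Phi = NormalDist().cdf
    for j, cut in enumerate(cutpoints):
        if Phi(cut - xb) >= 0.5:
            return j
    return len(cutpoints)  # xb lies above every cutpoint: top category


def median_category_from_index(xb, cutpoints):
    """Same median, read off directly: the category whose interval contains xb."""
    return bisect.bisect_left(cutpoints, xb)


cutpoints = [-1.0, 0.0, 1.5]  # hypothetical thresholds for a 4-point happiness scale
for xb in [-2.3, -0.4, 0.7, 3.1]:
    # Both columns agree: categories 0, 1, 2, 3 for these four values of xb.
    print(xb, median_category_from_cdf(xb, cutpoints), median_category_from_index(xb, cutpoints))
```

Comparing medians across groups thus reduces to comparing estimated linear predictors against estimated cutpoints, which is the kind of comparison the abstract says standard ordered probit or logit output supports.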
Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models
Penalization of the likelihood by Jeffreys' invariant prior, or by a positive
power thereof, is shown to produce finite-valued maximum penalized likelihood
estimates in a broad class of binomial generalized linear models. The class of
models includes logistic regression, where the Jeffreys-prior penalty is known
additionally to reduce the asymptotic bias of the maximum likelihood estimator;
and also models with other commonly used link functions such as probit and
log-log. Shrinkage towards equiprobability across observations, relative to the
maximum likelihood estimator, is established theoretically and is studied
through illustrative examples. Some implications of finiteness and shrinkage
for inference are discussed, particularly when inference is based on Wald-type
procedures. A widely applicable procedure is developed for computation of
maximum penalized likelihood estimates, by using repeated maximum likelihood
fits with iteratively adjusted binomial responses and totals. These theoretical
results and methods underpin the increasingly widespread use of reduced-bias
and similarly penalized binomial regression models in many applied fields.
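The computation by repeated maximum likelihood fits can be sketched for the logistic link, where the Jeffreys-prior penalized score is known to take the form Σ_i (y_i + h_i/2 - (m_i + h_i) π_i) x_i, with h_i the leverages (hat values) at the current estimate. Each outer step therefore refits an ordinary binomial logistic regression to adjusted responses y_i + h_i/2 and totals m_i + h_i. This is a minimal NumPy sketch under that assumption only: for probit, log-log and other links the adjustments take a different form that is not reproduced here, and the helper names `logistic_ml_fit`, `leverages` and `jeffreys_penalized_logistic` are hypothetical.

```python
import numpy as np


def logistic_ml_fit(X, y, m, beta, tol=1e-10, max_iter=100):
    """Binomial logit ML fit by Newton-Raphson; y and m may be non-integer."""
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - m * p)
        info = X.T @ ((m * p * (1.0 - p))[:, None] * X)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def leverages(X, m, beta):
    """Diagonal of W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}, W = diag(m p (1 - p))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = m * p * (1.0 - p)
    info_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    return w * np.einsum("ij,jk,ik->i", X, info_inv, X)


def jeffreys_penalized_logistic(X, y, m, tol=1e-10, max_iter=500):
    """Repeat ML fits on adjusted responses y + h/2 and totals m + h,
    recomputing the leverages h at the current estimate each time."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        h = leverages(X, m, beta)
        beta_new = logistic_ml_fit(X, y + h / 2.0, m + h, beta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Completely separated data: the ordinary ML slope diverges to +infinity.
x = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 1.0])
m = np.ones_like(y)

beta_pen = jeffreys_penalized_logistic(X, y, m)
print(beta_pen)  # a finite intercept and slope
```

The separated toy data make the finiteness result concrete: since every leverage is positive, each adjusted response lies strictly between 0 and its adjusted total, so every inner ML fit has a finite solution.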
Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology
Presence-only data, point locations where a species has been recorded as
being present, are often used in modeling the distribution of a species as a
function of a set of explanatory variables: whether to map species occurrence,
to understand its association with the environment, or to predict its response
to environmental change. Currently, ecologists most commonly analyze
presence-only data by adding randomly chosen "pseudo-absences" to the data so
that they can be analyzed using logistic regression, an approach that has
weaknesses in model specification, in interpretation, and in implementation. To
address these issues, we propose Poisson point process modeling of the
intensity of presences. We also derive a link between the proposed approach and
logistic regression; specifically, we show that as the number of
pseudo-absences increases (in a regular or uniform random arrangement),
logistic regression slope parameters and their standard errors converge to
those of the corresponding Poisson point process model. We discuss the
practical implications of these results. In particular, point process modeling
offers a framework for choice of the number and location of pseudo-absences,
both of which are currently chosen by ad hoc and sometimes ineffective methods
in ecology, a point which we illustrate by example. Comment: Published in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS331 by the Institute of
Mathematical Statistics (http://www.imstat.org).
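The link between the two approaches can be checked numerically. The sketch below uses loudly hypothetical settings (a unit-square region, one covariate equal to the x-coordinate, true intensity exp(4 + x), and a regular 200 by 200 grid). It fits a Poisson point process model by maximizing Σ_i log λ(s_i) - ∫ λ(s) ds with the integral approximated by equal-weight grid quadrature, then fits logistic regression to the presences against the same grid points used as pseudo-absences. The simple grid quadrature is an assumption of this sketch, not necessarily the scheme used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: unit-square region, covariate = x-coordinate,
# true intensity lambda(s) = exp(4 + 1 * x(s)).
b_true = np.array([4.0, 1.0])
lam_max = np.exp(b_true[0] + b_true[1])  # largest intensity, reached at x = 1

# Simulate presences from the inhomogeneous Poisson process by thinning.
n_cand = rng.poisson(lam_max)  # region area is 1
cand = rng.uniform(size=(n_cand, 2))
keep_prob = np.exp(b_true[0] + b_true[1] * cand[:, 0]) / lam_max
pres = cand[rng.uniform(size=n_cand) < keep_prob]


def design(pts):
    """Intercept column plus the x-coordinate covariate."""
    return np.column_stack([np.ones(len(pts)), pts[:, 0]])


# Regular grid: quadrature nodes for the PPM, pseudo-absences for logistic regression.
g = 200
centers = (np.arange(g) + 0.5) / g
gx, gy = np.meshgrid(centers, centers)
quad = np.column_stack([gx.ravel(), gy.ravel()])
M = len(quad)
w = 1.0 / M  # each grid point represents an equal share of the unit area
X_pres, X_quad = design(pres), design(quad)
X_all = np.vstack([X_pres, X_quad])
y_all = np.concatenate([np.ones(len(pres)), np.zeros(M)])


def newton(score_and_info, beta, tol=1e-10, max_iter=100):
    """Newton-Raphson for a concave log-likelihood given its score and information."""
    for _ in range(max_iter):
        score, info = score_and_info(beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def ppm_score_info(beta):
    # log L = sum_i log lambda(s_i) - integral of lambda, integral ~ sum_j w * lambda(q_j)
    wlam = w * np.exp(X_quad @ beta)
    return X_pres.sum(axis=0) - X_quad.T @ wlam, X_quad.T @ (wlam[:, None] * X_quad)


def logit_score_info(beta):
    # Presences coded 1, grid pseudo-absences coded 0.
    prob = 1.0 / (1.0 + np.exp(-(X_all @ beta)))
    return X_all.T @ (y_all - prob), X_all.T @ ((prob * (1.0 - prob))[:, None] * X_all)


n = len(pres)
beta_ppm = newton(ppm_score_info, np.array([np.log(n), 0.0]))
beta_logit = newton(logit_score_info, np.array([np.log(n / M), 0.0]))
print("slopes (PPM, logistic):", beta_ppm[1], beta_logit[1])
print("intercepts (PPM, logistic + log M):", beta_ppm[0], beta_logit[0] + np.log(M))
```

With this many pseudo-absences the two slope estimates should nearly coincide, while the logistic intercept is shifted by roughly -log M because each grid point carries weight 1/M of the unit area. Reducing `g` is a simple way to watch the agreement weaken.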
On the Properties of Simulation-based Estimators in High Dimensions
Considering the increasing size of available data, the need for statistical
methods that control finite-sample bias is growing. This is mainly due to
the frequent settings where the number of variables is large and allowed to
increase with the sample size, which causes standard inferential procedures to
suffer a significant loss in performance. Moreover, the complexity of
statistical models is also increasing, thereby entailing important computational
challenges in constructing new estimators or in implementing classical ones. A
trade-off between numerical complexity and statistical properties is often
accepted. However, numerically efficient estimators that are altogether
unbiased, consistent and asymptotically normal in high dimensional problems
would generally be ideal. In this paper, we set out a general framework from which
such estimators can easily be derived for wide classes of models. This
framework is based on the concepts that underlie simulation-based estimation
methods such as indirect inference. The approach extends previous results in
several ways, as it accommodates possibly inconsistent
estimators and is applicable to discrete models and/or models with a large
number of parameters. We consider an algorithm, namely the Iterative Bootstrap
(IB), to efficiently compute simulation-based estimators, and we establish its
convergence properties. Within this framework we also prove the properties of
simulation-based estimators, more specifically their unbiasedness, consistency
and asymptotic normality when the number of parameters is allowed to increase
with the sample size. Therefore, an important implication of the proposed
approach is that it allows one to obtain unbiased estimators in finite samples.
Finally, we study this approach when applied to three common models, namely
logistic regression, negative binomial regression and lasso regression.
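The Iterative Bootstrap update can be illustrated on a toy problem that is not one of the paper's three applications. Given an auxiliary estimator π̂ (possibly biased) and the ability to simulate data from the model at θ, the IB iterates θ^(k+1) = θ^(k) + [π̂(observed data) - (1/H) Σ_h π̂(data simulated at θ^(k) with seed h)], reusing the same H seeds at every step. In this sketch the auxiliary estimator is the ML variance of a normal sample, which divides by n and is biased downward; the names `iterative_bootstrap`, `simulate` and `aux`, and the settings n = 20 and H = 500, are hypothetical.

```python
import numpy as np


def iterative_bootstrap(pi_obs, simulate, aux, theta0, H=500, tol=1e-10, max_iter=200, seed=0):
    """theta <- theta + (pi_obs - mean_h aux(simulate(theta, rng_h))).
    The same H seeds are reused at every iteration (common random numbers)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        pi_sim = np.mean(
            [aux(simulate(theta, np.random.default_rng(seed + h))) for h in range(H)], axis=0
        )
        step = pi_obs - pi_sim
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


n = 20
x = np.random.default_rng(42).normal(loc=3.0, scale=2.0, size=n)


def aux(sample):
    # Biased auxiliary estimator: the ML variance, which divides by n rather than n - 1.
    return np.array([np.var(sample)])


def simulate(theta, gen):
    # A sample of size n from N(0, theta[0]); the mean does not affect aux.
    return np.sqrt(theta[0]) * gen.standard_normal(n)


pi_obs = aux(x)
theta_ib = iterative_bootstrap(pi_obs, simulate, aux, theta0=pi_obs)
print("ML variance:", pi_obs[0], " IB estimate:", theta_ib[0], " ddof=1 variance:", np.var(x, ddof=1))
```

Because the simulated ML variances scale linearly in θ, the fixed point here is π̂ divided by the average ML variance of the H unit-variance samples, which approaches n/(n - 1) · π̂, the usual unbiased sample variance, as H grows.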
- …