    Optimal linear estimation under unknown nonlinear transform

    Linear regression studies the problem of estimating a model parameter $\beta^* \in \mathbb{R}^p$ from $n$ observations $\{(y_i, \mathbf{x}_i)\}_{i=1}^n$ drawn from the linear model $y_i = \langle \mathbf{x}_i, \beta^* \rangle + \epsilon_i$. We consider a significant generalization in which the relationship between $\langle \mathbf{x}_i, \beta^* \rangle$ and $y_i$ is noisy, quantized to a single bit, potentially nonlinear, noninvertible, and unknown. This model is known as the single-index model in statistics and, among other things, it represents a significant generalization of one-bit compressed sensing. We propose a novel spectral-based estimation procedure and show that we can recover $\beta^*$ in settings (i.e., classes of link function $f$) where previous algorithms fail. In general, our algorithm requires only very mild restrictions on the (unknown) functional relationship between $y_i$ and $\langle \mathbf{x}_i, \beta^* \rangle$. We also consider the high-dimensional setting where $\beta^*$ is sparse, and introduce a two-stage nonconvex framework that addresses estimation challenges in high-dimensional regimes where $p \gg n$. For a broad class of link functions between $\langle \mathbf{x}_i, \beta^* \rangle$ and $y_i$, we establish minimax lower bounds that demonstrate the optimality of our estimators in both the classical and high-dimensional regimes.
    Comment: 25 pages, 3 figures
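
    A minimal numpy sketch of the second-moment spectral idea (an illustration in the spirit of the abstract, not the paper's exact procedure): for a symmetric one-bit link the first-moment average $\frac{1}{n}\sum_i y_i \mathbf{x}_i$ is uninformative, yet the leading eigenvector (by absolute eigenvalue) of $\frac{1}{n}\sum_i y_i \mathbf{x}_i \mathbf{x}_i^\top$ still recovers the direction of $\beta^*$. The link and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 20
beta = np.zeros(p); beta[0] = 1.0            # true (unit-norm) direction
X = rng.standard_normal((n, p))
# even, noninvertible one-bit link: sign(|<x, beta>| - median of |N(0,1)|)
y = np.sign(np.abs(X @ beta) - 0.6745)

# the first-moment estimator vanishes for this link ...
print(np.linalg.norm(X.T @ y / n))           # ~ 0
# ... but the top eigenvector of (1/n) sum_i y_i x_i x_i^T recovers beta
M = (X * y[:, None]).T @ X / n               # symmetric p x p matrix
w, V = np.linalg.eigh(M)
v = V[:, np.argmax(np.abs(w))]
print(abs(v @ beta))                         # close to 1 (up to sign)
```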

    Have Econometric Analyses of Happiness Data Been Futile? A Simple Truth About Happiness Scales

    Econometric analyses in the happiness literature typically use subjective well-being (SWB) data to compare the mean of observed or latent happiness across samples. Recent critiques show that comparing the means of ordinal data is valid only under strong assumptions that are usually rejected by SWB data. This raises the open question of whether much of the empirical work in the economics-of-happiness literature has been futile. In order to salvage some of the prior results and avoid future issues, we suggest that regression analyses of SWB (and other ordinal data) focus on the median rather than the mean. Median comparisons using parametric models such as the ordered probit and logit can be readily carried out with familiar statistical software such as Stata. We also show that estimating a semiparametric median ordered-response model, a task previously assumed to be impractical, is possible using a novel constrained mixed-integer optimization technique. We use GSS data to show that the famous Easterlin Paradox from the happiness literature holds for the US independently of any parametric assumptions.
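
    To make the parametric route concrete, here is a minimal sketch of a median analysis via an ordered probit, assuming statsmodels' OrderedModel and simulated SWB-style data (the covariate, cutpoints, and 4-point scale are hypothetical). The model-implied median response is the smallest category whose cumulative predicted probability reaches one half, a quantity invariant to monotone rescalings of the latent happiness scale, unlike the mean.

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 2000
income = rng.standard_normal(n)                  # hypothetical regressor
latent = 0.5 * income + rng.standard_normal(n)   # latent well-being
y = np.searchsorted([-1.0, 0.0, 1.0], latent)    # 4-point SWB scale, 0..3

res = OrderedModel(y, income[:, None], distr="probit").fit(
    method="bfgs", disp=False)

# median category per observation: smallest category whose cumulative
# predicted probability reaches 1/2
probs = np.asarray(res.predict())                # n x 4 category probabilities
median_cat = (np.cumsum(probs, axis=1) >= 0.5).argmax(axis=1)
print(np.bincount(median_cat, minlength=4))
```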

    Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models

    Penalization of the likelihood by Jeffreys' invariant prior, or by a positive power thereof, is shown to produce finite-valued maximum penalized likelihood estimates in a broad class of binomial generalized linear models. The class of models includes logistic regression, where the Jeffreys-prior penalty is known additionally to reduce the asymptotic bias of the maximum likelihood estimator, as well as models with other commonly used link functions such as probit and log-log. Shrinkage towards equiprobability across observations, relative to the maximum likelihood estimator, is established theoretically and is studied through illustrative examples. Some implications of finiteness and shrinkage for inference are discussed, particularly when inference is based on Wald-type procedures. A widely applicable procedure is developed for the computation of maximum penalized likelihood estimates, using repeated maximum likelihood fits with iteratively adjusted binomial responses and totals. These theoretical results and methods underpin the increasingly widespread use of reduced-bias and similarly penalized binomial regression models in many applied fields.
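
    For the logistic case, where the Jeffreys-prior penalty coincides with Firth's bias-reducing adjustment, the repeated-ML computation is easy to sketch: each outer step refits by maximum likelihood after replacing the successes y by y + h/2 and the totals m by m + h, where h holds the leverages of the current fit. A minimal numpy sketch under these assumptions (not the paper's general algorithm, which covers other links too):

```python
import numpy as np

def ml_logistic(X, y, m, n_iter=50):
    """IRLS maximum likelihood for y successes out of m trials;
    y and m may be non-integer, as required by the adjusted fits."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        W = np.maximum(m * p * (1.0 - p), 1e-10)  # working weights
        z = eta + (y - m * p) / W                 # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

def jeffreys_logistic(X, y, m, n_outer=25):
    """Maximum penalized likelihood via repeated ML fits with
    iteratively adjusted binomial responses and totals."""
    y_adj, m_adj = y.astype(float), m.astype(float)
    for _ in range(n_outer):
        beta = ml_logistic(X, y_adj, m_adj)
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = m * p * (1.0 - p)
        XW = X * np.sqrt(W)[:, None]              # W^(1/2) X
        # leverages: diagonal of the weighted-least-squares hat matrix
        h = np.einsum("ij,ij->i", XW, XW @ np.linalg.inv(X.T @ (W[:, None] * X)))
        y_adj, m_adj = y + h / 2.0, m + h
    return beta

# small dose-response example: the penalized slope shrinks toward zero
# (equiprobability) relative to plain maximum likelihood
X = np.column_stack([np.ones(4), np.array([-1.0, -0.5, 0.5, 1.0])])
y = np.array([1.0, 2.0, 7.0, 9.0])
m = np.full(4, 10.0)
print(ml_logistic(X, y, m), jeffreys_logistic(X, y, m))
```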

    Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology

    Presence-only data, point locations where a species has been recorded as being present, are often used in modeling the distribution of a species as a function of a set of explanatory variables---whether to map species occurrence, to understand its association with the environment, or to predict its response to environmental change. Currently, ecologists most commonly analyze presence-only data by adding randomly chosen "pseudo-absences" to the data so that they can be analyzed using logistic regression, an approach with weaknesses in model specification, interpretation, and implementation. To address these issues, we propose Poisson point process modeling of the intensity of presences. We also derive a link between the proposed approach and logistic regression---specifically, we show that as the number of pseudo-absences increases (in a regular or uniform random arrangement), logistic regression slope parameters and their standard errors converge to those of the corresponding Poisson point process model. We discuss the practical implications of these results. In particular, point process modeling offers a framework for choosing the number and location of pseudo-absences, both of which are currently chosen by ad hoc and sometimes ineffective methods in ecology, a point which we illustrate by example.
    Comment: Published at http://dx.doi.org/10.1214/10-AOAS331 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
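
    The convergence result is easy to verify numerically. Below is a minimal sketch (a hypothetical one-dimensional landscape with log-linear intensity; statsmodels assumed for the logistic fits): as the number of uniform pseudo-absences grows, the fitted logistic slope settles at the Poisson point process value, while the intercept absorbs an offset that depends on the number of pseudo-absences.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# hypothetical landscape on [0, 1]; the environmental covariate is the
# coordinate s itself, with true log-intensity beta0 + beta1 * s
beta0, beta1 = 4.0, 1.5
lam_max = np.exp(beta0 + beta1)                  # intensity bound for thinning
cand = rng.uniform(size=rng.poisson(lam_max))    # homogeneous candidate points
keep = rng.uniform(size=cand.size) < np.exp(beta0 + beta1 * cand) / lam_max
presences = cand[keep]                           # inhomogeneous Poisson sample

for n_pseudo in (100, 1_000, 10_000, 100_000):
    s = np.concatenate([presences, rng.uniform(size=n_pseudo)])
    y = np.concatenate([np.ones(presences.size), np.zeros(n_pseudo)])
    fit = sm.GLM(y, sm.add_constant(s), family=sm.families.Binomial()).fit()
    print(n_pseudo, fit.params[1])               # slope -> beta1 = 1.5
```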

    On the Properties of Simulation-based Estimators in High Dimensions

    Considering the increasing size of available data, the need for statistical methods that control the finite-sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase with the sample size, causing standard inferential procedures to incur a significant loss of performance. Moreover, the complexity of statistical models is also increasing, entailing important computational challenges in constructing new estimators or in implementing classical ones. A trade-off between numerical complexity and statistical properties is often accepted. However, numerically efficient estimators that are altogether unbiased, consistent and asymptotically normal in high-dimensional problems would generally be ideal. In this paper, we set out a general framework from which such estimators can easily be derived for wide classes of models. This framework is based on the concepts that underlie simulation-based estimation methods such as indirect inference. The approach allows various extensions compared to previous results, as it accommodates possibly inconsistent estimators and is applicable to discrete models and/or models with a large number of parameters. We consider an algorithm, namely the Iterative Bootstrap (IB), to efficiently compute simulation-based estimators, and we establish its convergence properties. Within this framework, we also prove that simulation-based estimators are unbiased, consistent and asymptotically normal when the number of parameters is allowed to increase with the sample size. An important implication of the proposed approach is therefore that it yields unbiased estimators in finite samples. Finally, we study this approach when applied to three common models, namely logistic regression, negative binomial regression and lasso regression.
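
    The IB update itself is compact enough to sketch. The following toy Python implementation (our own illustration; the names and the toy model are not from the paper) repeatedly shifts the estimate by the gap between the initial estimator on the observed data and its average over H datasets simulated at the current value, so that at the fixed point the simulated average matches the observed value:

```python
import numpy as np

def iterative_bootstrap(pi_hat, simulate, data, H=200, n_iter=20, seed=0):
    """Iterative Bootstrap: theta <- theta + pi_hat(data) - mean of pi_hat
    over H datasets simulated at theta, iterated to approximate convergence."""
    rng = np.random.default_rng(seed)
    pi0 = pi_hat(data)               # initial, possibly biased, estimate
    theta = pi0
    for _ in range(n_iter):
        avg = np.mean([pi_hat(simulate(theta, rng)) for _ in range(H)])
        theta = theta + (pi0 - avg)  # IB update
    return theta

# toy example: the normal-variance MLE is biased by the factor (n - 1)/n;
# the IB drives the estimate toward the unbiased n/(n - 1)-rescaled value
n = 10
rng = np.random.default_rng(1)
data = rng.normal(scale=2.0, size=n)
pi_hat = lambda x: np.var(x)                               # biased MLE
simulate = lambda v, g: g.normal(scale=np.sqrt(v), size=n)
print(np.var(data), iterative_bootstrap(pi_hat, simulate, data))
```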