    Pivotal estimation in high-dimensional regression via linear programming

    We propose a new method of estimation in the high-dimensional linear regression model. It allows for very weak distributional assumptions, including heteroscedasticity, and does not require knowledge of the variance of the random errors. The method is based on linear programming only, so its numerical implementation is faster than for previously known techniques using conic programs, and it can handle higher-dimensional models. We provide upper bounds on the estimation and prediction errors of the proposed estimator, showing that it achieves the same rate as in the more restrictive situation of fixed design and i.i.d. Gaussian errors with known variance. Following Gautier and Tsybakov (2011), we obtain the results under sensitivity assumptions weaker than the restricted eigenvalue or assimilated conditions.
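
    As a rough illustration of how an l1-type estimator reduces to a linear program, here is a sketch of the classical Dantzig selector (not the authors' exact pivotal formulation; the tuning parameter lam and all names are ours), solved with scipy.optimize.linprog:

    ```python
    # Illustrative Dantzig-selector LP (hypothetical sketch, not the paper's
    # pivotal estimator): minimize ||beta||_1 s.t. ||X'(y - X beta)||_inf <= lam.
    import numpy as np
    from scipy.optimize import linprog

    def dantzig_selector(X, y, lam):
        n, p = X.shape
        G, g = X.T @ X, X.T @ y
        # Decision variables z = [beta (p), u (p)]; minimizing sum(u) >= ||beta||_1.
        c = np.concatenate([np.zeros(p), np.ones(p)])
        I, Z = np.eye(p), np.zeros((p, p))
        A_ub = np.block([[I, -I],      #  beta - u <= 0
                         [-I, -I],     # -beta - u <= 0
                         [G, Z],       #  X'X beta <= lam + X'y
                         [-G, Z]])     # -X'X beta <= lam - X'y
        b_ub = np.concatenate([np.zeros(2 * p), lam + g, lam - g])
        bounds = [(None, None)] * p + [(0, None)] * p
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        return res.x[:p]

    # Tiny usage example on synthetic data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 20))
    beta_true = np.zeros(20)
    beta_true[:3] = [2.0, -1.5, 1.0]
    y = X @ beta_true + 0.1 * rng.standard_normal(50)
    print(np.round(dantzig_selector(X, y, lam=1.0), 2))
    ```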

    Empirical Phi-Discrepancies and Quasi-Empirical Likelihood: Exponential Bounds

    We review some recent extensions of the so-called generalized empirical likelihood method, in which the Kullback distance is replaced by a general convex divergence. We propose to use, instead of empirical likelihood, a regularized form, or quasi-empirical likelihood method, corresponding to a convex combination of the Kullback and χ2 discrepancies. We show that for an adequate choice of the weight in this combination, the corresponding quasi-empirical likelihood is Bartlett-correctable. We also establish non-asymptotic exponential bounds for the confidence regions obtained with this method. These bounds are derived via bounds for self-normalized sums in the multivariate case, obtained in a previous work by the authors. We also show that these results may be extended to process-valued, infinite-dimensional parameters; in that case, known results about self-normalized processes may be used to control the behavior of the generalized empirical likelihood.
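
    For concreteness, a sketch of the discrepancies involved, in hypothetical notation: writing I_phi(q) for the empirical phi-discrepancy of weights q = (q_1, ..., q_n) in the simplex against the uniform weights 1/n, the quasi-empirical likelihood mixes the Kullback and χ2 generators with a fixed weight ε in [0, 1]:

    ```latex
    % Hypothetical notation; the precise weight choice in the paper may differ.
    \[
      \varphi_K(x) = x\log x - x + 1, \qquad
      \varphi_{\chi^2}(x) = \tfrac{1}{2}(x-1)^2,
    \]
    \[
      \varphi_\varepsilon(x) = \varepsilon\,\varphi_K(x) + (1-\varepsilon)\,\varphi_{\chi^2}(x),
      \qquad
      I_{\varphi_\varepsilon}(q) = \frac{1}{n}\sum_{i=1}^{n} \varphi_\varepsilon(n q_i).
    \]
    ```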

    Effects of front-of-pack labels on the nutritional quality of supermarket food purchases: evidence from a large-scale randomized controlled trial

    To examine whether four pre-selected front-of-pack nutrition labels improve food purchases in real-life grocery-shopping settings, we placed 1.9 million labels on 1,266 food products in four categories in 60 supermarkets and analyzed the nutritional quality of 1,668,301 purchases using the FSA nutrient profiling score. Effect sizes were on average 17 times smaller than those found in comparable laboratory studies. The most effective nutrition label, Nutri-Score, increased purchases of foods in the top third of their category nutrition-wise by 14%, but had no impact on purchases of foods with medium, low, or unlabeled nutritional quality. As a result, Nutri-Score improved the nutritional quality of the basket of labeled foods purchased by only 2.5% (-0.142 FSA points). Nutri-Score's performance improved with the variance (but not the mean) of the nutritional quality of the category. In-store surveys suggest that Nutri-Score's ability to attract attention and to help shoppers rank products by nutritional quality may explain its performance.

    Estimation of Piecewise-Deterministic Trajectories in a Quantum Optics Scenario

    The manipulation of individual copies of quantum systems is one of the most groundbreaking experimental achievements in quantum physics. It has been shown, both experimentally and theoretically, that the dynamics of a single copy of an open quantum system is a trajectory of a piecewise-deterministic process. To the best of our knowledge, this application field has not been explored in the applied-mathematics literature, from either a probabilistic or a statistical perspective. The objective of this chapter is to provide a self-contained presentation of this kind of model, together with its specificities in terms of the observation scheme of the system, and a first attempt to deal with a statistical issue that arises in the quantum world.
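
    As a generic illustration of the model class only (the flow, jump rate, and jump kernel below are hypothetical and unrelated to the chapter's quantum-optics setting), a piecewise-deterministic trajectory alternates deterministic flow with random jumps:

    ```python
    # Generic piecewise-deterministic trajectory (illustrative sketch only).
    import numpy as np

    def simulate_pdmp(x0, t_max, drift=-0.5, rate=1.0, rng=None):
        """Exponential decay x'(t) = drift * x between jumps; jumps arrive
        at a constant rate and kick the state upward (hypothetical kernel)."""
        rng = rng or np.random.default_rng()
        times, states = [0.0], [x0]
        t, x = 0.0, x0
        while t < t_max:
            tau = rng.exponential(1.0 / rate)      # waiting time to next jump
            t_next = min(t + tau, t_max)
            x = x * np.exp(drift * (t_next - t))   # deterministic flow
            times.append(t_next)
            states.append(x)
            if t_next >= t_max:
                break
            x = x + rng.exponential(1.0)           # random jump
            times.append(t_next)                   # duplicate time marks the jump
            states.append(x)
            t = t_next
        return np.array(times), np.array(states)

    t, x = simulate_pdmp(x0=1.0, t_max=10.0)
    print(len(t), "breakpoints; final state", round(float(x[-1]), 3))
    ```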

    Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning

    The development of cluster computing frameworks has allowed practitioners to scale out various statistical estimation and machine learning algorithms with minimal programming effort. This is especially true for machine learning problems whose objective function is nicely separable across individual data points, such as classification and regression. In contrast, statistical learning tasks involving pairs (or, more generally, tuples) of data points - such as metric learning, clustering, or ranking - do not lend themselves as easily to data-parallelism and in-memory computing. In this paper, we investigate how to balance statistical performance against computational efficiency in such distributed tuplewise statistical problems. We first propose a simple strategy based on occasionally repartitioning data across workers between parallel computation stages, where the number of repartitioning steps governs the trade-off between accuracy and runtime. We then present theoretical results highlighting the variance reduction brought by the proposed method, and extend our results to design distributed stochastic gradient descent algorithms for tuplewise empirical risk minimization. Our results are supported by numerical experiments in pairwise statistical estimation and learning on synthetic and real-world datasets. Comment: 23 pages, 6 figures, ECML 2019
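
    A toy, single-machine emulation of the repartitioning idea (names and the pairwise kernel are illustrative, not the paper's implementation): each worker averages the pairwise statistic over pairs inside its own partition, and reshuffling the data between stages lets different pairs be seen, reducing variance at the price of communication:

    ```python
    # Toy emulation of distributed tuplewise estimation with repartitioning.
    import numpy as np

    def within_block_pair_mean(blocks, kernel):
        """Average kernel(x_i, x_j) over pairs that live in the same block."""
        vals = []
        for b in blocks:
            for i in range(len(b)):
                for j in range(i + 1, len(b)):
                    vals.append(kernel(b[i], b[j]))
        return np.mean(vals)

    def repartitioned_estimate(data, n_workers, n_repartitions, kernel, rng):
        """Average the within-block estimate over random repartitions;
        more repartitions -> lower variance, higher communication cost."""
        estimates = []
        for _ in range(n_repartitions):
            perm = rng.permutation(len(data))
            blocks = np.array_split(data[perm], n_workers)
            estimates.append(within_block_pair_mean(blocks, kernel))
        return np.mean(estimates)

    rng = np.random.default_rng(1)
    data = rng.standard_normal(400)
    gini = lambda x, y: abs(x - y)  # simple pairwise kernel (Gini mean difference)
    for r in (1, 5, 20):
        print(r, "repartition(s):",
              round(repartitioned_estimate(data, 8, r, gini, rng), 4))
    ```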

    Cost-Sensitive Regularization for Diabetic Retinopathy Grading from Eye Fundus Images

    Assessing the degree of disease severity in biomedical images is a task similar to standard classification but constrained by an underlying structure in the label space, reflecting the monotonic relationship between disease grades. In this paper, we propose a straightforward approach to enforcing this constraint for the task of predicting Diabetic Retinopathy (DR) severity from eye fundus images, based on the well-known notion of cost-sensitive classification. We expand standard classification losses with an extra term that acts as a regularizer, imposing greater penalties on predicted grades the farther they are from the true grade associated with a particular image. Furthermore, we show how to adapt our method to the modeling of label noise in each of the sub-problems associated with DR grading, an approach we refer to as Atomic Sub-Task modeling. This yields models that can implicitly take into account the inherent noise present in DR grade annotations. Our experimental analysis on several public datasets reveals that, when a standard Convolutional Neural Network is trained using this simple strategy, improvements of 3-5% in quadratic-weighted kappa scores can be achieved at negligible computational cost. Code to reproduce our results is released at github.com/agaldran/cost_sensitive_loss_classification.
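
    A minimal sketch of a cost-sensitive regularizer of this kind (assuming a PyTorch setup; the absolute-distance cost matrix and the weight lam are our illustrative choices, not necessarily the loss released in the repository):

    ```python
    # Cross-entropy plus a cost-sensitive regularizer that penalizes predicted
    # probability mass placed far from the true grade (illustrative sketch).
    import torch
    import torch.nn.functional as F

    def cost_sensitive_loss(logits, target, lam=1.0, n_grades=5):
        """logits: (batch, n_grades); target: (batch,) integer DR grades."""
        grades = torch.arange(n_grades, dtype=logits.dtype, device=logits.device)
        # cost[i, j] = |i - j|: penalty for predicting grade j when truth is i.
        cost = (grades.view(-1, 1) - grades.view(1, -1)).abs()
        probs = F.softmax(logits, dim=1)
        expected_cost = (probs * cost[target]).sum(dim=1)  # E_p |grade - truth|
        return F.cross_entropy(logits, target) + lam * expected_cost.mean()

    # Usage example with random logits for a batch of four images.
    logits = torch.randn(4, 5, requires_grad=True)
    target = torch.tensor([0, 2, 4, 1])
    loss = cost_sensitive_loss(logits, target)
    loss.backward()
    print(float(loss))
    ```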

    Concentration inequalities for Harris recurrent Markov chains

    Estimation et précision dans les enquêtes répétées [Estimation and precision in repeated surveys]

    This paper presents methods for estimation and for computing precision in repeated surveys, in particular in panels (with or without rotation). After a review of the literature on the subject and a presentation of the different approaches, classical (design-based) or model-based, it is shown that in a great many situations models are well suited to accounting both for sampling error and for the dynamics of the parameters; in particular, they make it possible to study the dynamics of a parameter over a long period. A state-space formulation of the Secodip consumption panel is proposed and applied to olive oil consumption.
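
    A minimal sketch of the kind of state-space treatment described, as a generic local-level model filtered with a Kalman recursion (all variances and data below are hypothetical; this does not reproduce the Secodip application):

    ```python
    # Local-level state-space model: theta_t = theta_{t-1} + w_t (parameter
    # dynamics), y_t = theta_t + v_t (survey estimate with sampling error).
    import numpy as np

    def kalman_local_level(y, q=0.01, r=1.0, m0=0.0, p0=10.0):
        """Filter noisy repeated-survey estimates y; q and r are the
        (hypothetical) dynamics and sampling-error variances."""
        m, p = m0, p0
        filtered = []
        for obs in y:
            p = p + q              # predict: the parameter may drift
            k = p / (p + r)        # Kalman gain
            m = m + k * (obs - m)  # update with the noisy survey estimate
            p = (1 - k) * p
            filtered.append(m)
        return np.array(filtered)

    rng = np.random.default_rng(2)
    theta = np.cumsum(0.1 * rng.standard_normal(30)) + 5.0  # drifting parameter
    y = theta + rng.standard_normal(30)                     # survey estimates
    print(np.round(kalman_local_level(y)[-5:], 2))
    ```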