
Robust estimation for zero-inflated Poisson regression

Authors: Daniel B Hall, Jing Shen
Publication date: 01/01/2010

ABSTRACT. The zero-inflated Poisson regression model is a special case of finite mixture models that is useful for count data containing many zeros. Typically, maximum likelihood (ML) estimation is used for fitting such models. However, it is well known that the ML estimator is highly sensitive to the presence of outliers and can become unstable when mixture components are poorly separated. In this paper, we propose an alternative robust estimation approach, robust expectation-solution (RES) estimation. We compare the RES approach with an existing robust approach, minimum Hellinger distance (MHD) estimation. Simulation results indicate that both methods improve on ML when outliers are present and/or when the mixture components are poorly separated. However, the RES approach is more efficient in all the scenarios we considered. In addition, the RES method is shown to yield consistent and asymptotically normal estimators and, in contrast to MHD, can be applied quite generally.
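To make the model in this abstract concrete, here is a minimal sketch of the zero-inflated Poisson probability mass function and its log-likelihood. It assumes a single rate `lam` and a single zero-inflation probability `pi` with no covariates; in the regression setting both would depend on covariates through link functions. The function names are illustrative, and this is the ordinary likelihood that ML maximizes, not the RES or MHD estimators discussed in the abstract.

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson: a point mass at zero mixed with Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    structural_zero = pi if y == 0 else 0.0  # extra mass only at y = 0
    return structural_zero + (1.0 - pi) * poisson

def zip_loglik(ys, lam, pi):
    """Log-likelihood of i.i.d. counts ys under the model above (no covariates assumed)."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y in ys)
```

A single large outlying count can dominate `zip_loglik` through the Poisson term, which is one way to see the sensitivity of ML that motivates robust alternatives.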

CiteSeerX

Uncoupled isotonic regression via minimum Wasserstein deconvolution

Authors: Rigollet Philippe, Weed Jonathan
Publication date: 24/03/2019

Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown nondecreasing regression function $f$ from independent pairs $(x_i, y_i)$ where $\mathbb{E}[y_i] = f(x_i)$, $i = 1, \ldots, n$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart where one is given only the unordered sets $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moments conditions on $y_i$ and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.

Comment: To appear in Information and Inference: a Journal of the IM
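For orientation, the classical (coupled) isotonic regression problem described at the start of this abstract can be solved with the pool-adjacent-violators algorithm. The sketch below assumes the responses are already sorted by their covariate and uses an illustrative function name; it does not implement the uncoupled estimator or the optimal transport method the paper proposes.

```python
def isotonic_fit(y):
    """Least-squares nondecreasing fit to y (assumed sorted by x), via pool-adjacent-violators."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)  # every point in a block gets the block mean
    return fit

print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The coupled fit relies on knowing which response belongs to which covariate; in the uncoupled setting that pairing is unavailable, which is why a different approach is needed.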

arXiv.org e-Print Archive

DSpace@MIT

The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

Authors: Anderlucci Laura, Montanari Angela, Viroli Cinzia
Publication date: 01/01/2017

In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve, or die over time, we also develop a dynamic clustering strategy, where a group in one time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Moment-based Estimation of Mixtures of Regression Models

Authors: Ekstrøm Claus Thorn, Pipper Christian Bressen
Publication date: 01/01/2019

Finite mixtures of regression models provide a flexible modeling framework for many phenomena. Using moment-based estimation of the regression parameters, we develop unbiased estimators with a minimum of assumptions on the mixture components. In particular, only the average regression model for one of the components in the mixture model is needed, with no requirements on the distributions. The consistency and asymptotic distribution of the estimators are derived, and the proposed method is validated through a series of simulation studies and shown to be highly accurate. We illustrate the use of the moment-based mixture of regression models with an application to wine quality data.

Comment: 17 pages, 3 figure

arXiv.org e-Print Archive

Copenhagen University Research Information System

Modelling Background Noise in Finite Mixtures of Generalized Linear Regression Models

Authors: Brito Paula, Leisch Friedrich
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2008

In this paper we show how only a few outliers can completely break down EM-estimation of mixtures of regression models. A simple yet very effective way of dealing with this problem is to use a component where all regression parameters are fixed to zero to model the background noise. This noise component can be easily defined for different types of generalized linear models, has a familiar interpretation as the empty regression model, and is not very sensitive with respect to its own parameters.

Open Access LMU

Research Online

Robust EM algorithm for model-based curve clustering

Author: Chamroukhi Faicel
Publication date: 25/12/2013

Model-based clustering approaches concern the paradigm of exploratory data analysis relying on the finite mixture model to automatically find a latent structure governing observed data. They are one of the most popular and successful approaches in cluster analysis. The mixture density estimation is generally performed by maximizing the observed-data log-likelihood using the expectation-maximization (EM) algorithm. However, it is well known that the initialization of the EM algorithm is crucial. In addition, the standard EM algorithm requires the number of clusters to be known a priori. Some solutions have been provided in [31, 12] for model-based clustering with Gaussian mixture models for multivariate data. In this paper we focus on model-based curve clustering approaches, where the data are curves rather than vectorial data, based on regression mixtures. We propose a new robust EM algorithm for clustering curves. We extend the model-based clustering approach presented in [31] for Gaussian mixture models to the case of curve clustering by regression mixtures, including polynomial regression mixtures as well as spline or B-spline regression mixtures. Our approach handles both the problem of initialization and that of choosing the optimal number of clusters as the EM learning proceeds, rather than in a two-fold scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness regarding initialization and finding the actual number of clusters.

Comment: In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013, Dallas, TX, US

arXiv.org e-Print Archive

Crossref

Entropic optimal transport is maximum-likelihood deconvolution

Authors: Rigollet Philippe, Weed Jonathan
Publication date: 01/01/2018

We give a statistical interpretation of entropic optimal transport by showing that performing maximum-likelihood estimation for Gaussian deconvolution corresponds to calculating a projection with respect to the entropic optimal transport distance. This structural result gives theoretical support for the wide adoption of these tools in the machine learning community.
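As background, entropic optimal transport between two discrete distributions is commonly computed with Sinkhorn iterations. The following is a minimal sketch under assumed inputs: histograms `a` and `b`, a cost matrix `cost`, and a regularization strength `eps`, all of which are illustrative names and defaults. It computes a transport plan only and does not implement the projection or the deconvolution estimator studied in the paper.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, iters=500):
    """Entropic optimal transport plan between histograms a and b for the given cost matrix."""
    n, m = len(a), len(b)
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical example: two points on each side, cost 0 on the diagonal and 1 off it.
plan = sinkhorn([0.5, 0.5], [0.25, 0.75], [[0.0, 1.0], [1.0, 0.0]])
```

After convergence the rows of `plan` sum to `a` and the columns sum to `b`; a smaller `eps` gives a plan closer to unregularized optimal transport, at the cost of slower convergence and possible numerical underflow in this naive form.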

arXiv.org e-Print Archive

DSpace@MIT

Comptes Rendus Mathématique

Numérisation de Documents Anciens Mathématiques