1,248 research outputs found

    Hypothesis test for normal mixture models: The EM approach

    Normal mixture distributions are arguably the most important mixture models, and also the most technically challenging. The likelihood function of the normal mixture model is unbounded based on a set of random samples, unless an artificial bound is placed on its component variance parameter. Moreover, the model is not strongly identifiable, so it is hard to differentiate between overdispersion caused by the presence of a mixture and that caused by a large variance, and it has infinite Fisher information with respect to mixing proportions. There has been extensive research on finite normal mixture models, but much of it addresses merely consistency of the point estimation or useful practical procedures, and many results require undesirable restrictions on the parameter space. We show that an EM-test for homogeneity is effective at overcoming many challenges in the context of finite normal mixtures. We find that the limiting distribution of the EM-test is a simple function of the $0.5\chi^2_0+0.5\chi^2_1$ and $\chi^2_1$ distributions when the mixing variances are equal but unknown and the $\chi^2_2$ when variances are unequal and unknown. Simulations show that the limiting distributions approximate the finite sample distribution satisfactorily. Two genetic examples are used to illustrate the application of the EM-test. Comment: Published at http://dx.doi.org/10.1214/08-AOS651 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. Comment: 61 page

    Likelihood Asymptotics in Nonregular Settings: A Review with Emphasis on the Likelihood Ratio

    This paper reviews the most common situations in which one or more of the regularity conditions that underlie classical likelihood-based parametric inference fail. We identify three main classes of problems: boundary problems, indeterminate parameter problems (which include non-identifiable parameters and singular information matrices), and change-point problems. The review focuses on the large-sample properties of the likelihood ratio statistic. We emphasize analytical solutions and acknowledge software implementations where available. We furthermore give a brief overview of the tools that can be used to derive the key results. Other approaches to hypothesis testing and connections to estimation are listed in the annotated bibliography of the Supplementary Material

    Development in Normal Mixture and Mixture of Experts Modeling

    In this dissertation, we first consider the problem of testing homogeneity and order in a contaminated normal model when the data are correlated under some known covariance structure. To address this problem, we develop a moment-based test for homogeneity and order, and design weights for the test statistics to increase the power of the homogeneity test. We apply our test to microarray data on Down’s syndrome. This dissertation also studies a singular Bayesian information criterion (sBIC) for a bivariate hierarchical mixture model with varying weights, and develops a new data-dependent information criterion (sFLIC). We apply our model and criteria to birth-weight and gestational age data, with the aim of selecting model complexity from the data

    Inference and Application of Likelihood Based Methods for Hidden Markov Models

    The thesis consists of three papers. In the paper “Testing for the number of states in hidden Markov models” we generalize existing testing procedures for i.i.d. mixture models to hidden Markov models by considering penalized quasi-likelihood ratio tests. They can be applied in order to assess the number of states k of a hidden Markov model with univariate state-dependent distribution fulfilling certain regularity conditions. In the paper “Hidden Markov Models with state-dependent mixtures” we analyze the dependence structure of hidden Markov models with state-dependent finite mixtures. Our results have applications to model selection as well as to model-based clustering. We propose algorithms for both purposes. In the paper “Peaks vs Components” we analyze welfare groups of countries all over the world by applying finite mixture models to the GDP per capita of 190 countries from 1970 to 2009

    Bivariate modelling of precipitation and temperature using a non-homogeneous hidden Markov model

    Aiming to generate realistic synthetic time series of the bivariate process of daily mean temperature and precipitation, we introduce a non-homogeneous hidden Markov model. The non-homogeneity lies in periodic transition probabilities between the hidden states, and time-dependent emission distributions. This enables the model to account for the non-stationary behaviour of weather variables. By carefully choosing the emission distributions, it is also possible to model the dependence structure between the two variables. The model is applied to several weather stations in Europe with various climates, and we show that it is able to simulate realistic bivariate time series

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research

    Tests of homogeneity of several location and scale populations, and analysis of paired count data with zero-inflation and over-dispersion.

    This thesis consists of two parts, referred to as Part I and Part II. Part I. Testing homogeneity of several location-scale populations. The widely used method for testing homogeneity of several normal populations is to test the equality of means under the assumption that the variances of the different groups are the same. In practice, however, we often encounter data that differ not only in means but also in variances. Singh (1986) tests the homogeneity of several normal populations simultaneously with respect to commonality of means and variances, based on a method by Fisher (1950). However, this problem arises not only in normal populations but also in other populations. In this thesis, I extend Fisher's method to location-scale models in general. The location-scale models encompass all two-parameter mean-variance models, such as the normal, negative binomial and beta-binomial models. Two test statistics are developed, one based on the combination of two likelihood ratio statistics and the other based on the combination of two score test statistics. Theoretical and empirical properties of these procedures are studied and applied to real-life data analysis problems. Part II. Analysis of paired count data with zero-inflation and over-dispersion. Data in the form of paired counts (pre-treatment and post-treatment counts) arise in many fields, such as biomedicine, toxicology and epidemiology. Poisson and binomial models are the most widely used models for these data. Frequently encountered problems in these data are the presence of extra zeros and extra dispersion, and the possible correlation between the pre-treatment and post-treatment counts. In this thesis I develop methods of analysis for two different sets of paired count data: one data set is obtained from an experiment on premature ventricular contractions (PVC) (Berry, 1987), and the other is a dental epidemiology data set representing the decayed, missing and filled teeth (DMFT) index (Bohning, Dietz, Schlattmann, Mendonca and Kirchner, 1999). I then study properties of these methods and analyse the PVC data and the DMFT index data. Dept. of Mathematics and Statistics. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .J53. Source: Dissertation Abstracts International, Volume: 65-10, Section: B, page: 5219. Adviser: S. R. Paul. Thesis (Ph.D.)--University of Windsor (Canada), 2004