
    Nonparametric Identification of Multivariate Mixtures

    This article analyzes the identifiability of k-variate, M-component finite mixture models in which each component distribution has independent marginals, including models in latent class analysis. Without making parametric assumptions on the component distributions, we investigate how one can identify the number of components and the component distributions from the distribution function of the observed data. We reveal an important link between the number of variables (k), the number of values each variable can take, and the number of identifiable components. A lower bound on the number of components (M) is nonparametrically identifiable if k >= 2, and the maximum identifiable number of components is determined by the number of different values each variable takes. When M is known, the mixing proportions and the component distributions are nonparametrically identified from matrices constructed from the distribution function of the data if (i) k >= 3, (ii) two of the k variables take at least M different values, and (iii) these matrices satisfy some rank and eigenvalue conditions. When M is unknown, we propose an algorithm that, under certain conditions, identifies M and the component distributions from the data. We discuss a condition for nonparametric identification and its observable implications. When M cannot be identified, we use our identification condition to develop a procedure that consistently estimates a lower bound on the number of components by estimating the rank of a matrix constructed from the distribution function of the observed variables.
    Keywords: finite mixture, latent class analysis, latent class model, model selection, number of components, rank estimation
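    The rank-based lower bound can be illustrated with a small numerical sketch. The snippet below is illustrative only and not the paper's estimation procedure (which uses a formal statistical rank test); the function name lower_bound_components and the crude singular-value threshold tol are hypothetical choices. It builds the matrix of joint frequencies of two discrete variables and reports its numerical rank, which bounds the number of components from below when the variables are independent within each component.

```python
import numpy as np

def lower_bound_components(x1, x2, tol=1e-2):
    """Illustrative lower bound on the number of mixture components.

    Builds the matrix P with entries P[i, j] ~ Pr(X1 = i, X2 = j) from two
    discrete variables and returns its numerical rank.  Under within-component
    independence, P is a sum of M rank-one matrices, so rank(P) <= M.
    """
    _, idx1 = np.unique(x1, return_inverse=True)
    _, idx2 = np.unique(x2, return_inverse=True)
    P = np.zeros((idx1.max() + 1, idx2.max() + 1))
    np.add.at(P, (idx1, idx2), 1.0)      # joint counts
    P /= P.sum()                         # joint relative frequencies
    s = np.linalg.svd(P, compute_uv=False)
    return int(np.sum(s > tol * s[0]))   # singular values above a crude cutoff

# Example: two binary indicators from a 2-component latent class model
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=5000)                    # latent class label
p1, p2 = np.array([0.2, 0.8]), np.array([0.3, 0.9])  # class-specific success probabilities
x1 = rng.random(5000) < p1[z]
x2 = rng.random(5000) < p2[z]
print(lower_bound_components(x1, x2))                # typically prints 2
```

    In practice a consistent procedure, as in the paper, replaces the ad hoc cutoff with a rank test whose critical value shrinks at an appropriate rate with the sample size.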

    Nonparametric Identification and Estimation of Multivariate Mixtures

    We study the nonparametric identifiability of finite mixture models of k-variate data with M subpopulations, in which the components of the data vector are independent conditional on belonging to a subpopulation. We provide a sufficient condition for nonparametrically identifying M subpopulations when k >= 3. Our focus is on the relationship between the number of values the components of the data vector can take and the number of identifiable subpopulations. Intuition suggests that if the data vector can take many different values, then combining information from these different values helps identification. Hall and Zhou (2003) show, however, that when k = 2, two-component finite mixture models are not nonparametrically identifiable regardless of the number of values the data vector can take. When k >= 3, a link emerges between the variation in the data vector and the number of identifiable subpopulations: the number of identifiable subpopulations increases as the data vector takes on additional (different) values. This points to the possibility of identifying many components even when k = 3, provided the data vector has a continuously distributed element. Our identification method is constructive and leads to an estimation strategy. It is not as efficient as the MLE, but it can be used as an initial value for the optimization algorithm that computes the MLE. We also provide a sufficient condition for identifying the number of nonparametrically identifiable components, and we develop a method for statistically testing, and consistently estimating, the number of nonparametrically identifiable components. We extend these procedures to develop a test for the number of components in binomial mixtures.
    Keywords: finite mixture, binomial mixture, model selection, number of components, rank estimation
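    The constructive identification step for the k >= 3 case can be sketched as an eigendecomposition of empirical frequency matrices. The snippet below is a minimal sketch, not the authors' estimator; the function name spectral_mixture_sketch is hypothetical, and it assumes X1 and X2 each take exactly M values and that the relevant full-rank conditions hold.

```python
import numpy as np

def spectral_mixture_sketch(x1, x2, x3, x3_value):
    """Sketch of eigendecomposition-based recovery for a 3-variate mixture.

    With P[i, j] = Pr(X1 = i, X2 = j) and Q[i, j] = Pr(X1 = i, X2 = j, X3 = x3_value),
    within-component independence gives P = A diag(pi) B' and Q = A diag(pi * c) B',
    where column m of A (resp. B) is the distribution of X1 (resp. X2) in component m,
    pi are the mixing proportions, and c[m] = Pr(X3 = x3_value | component m).
    If A, B, and diag(pi) are full rank and square, Q @ inv(P) = A diag(c) inv(A),
    so its eigenvalues recover c and its eigenvectors recover A up to scale and order.
    """
    _, i1 = np.unique(x1, return_inverse=True)
    _, i2 = np.unique(x2, return_inverse=True)
    n = x1.size
    P = np.zeros((i1.max() + 1, i2.max() + 1))
    Q = np.zeros_like(P)
    np.add.at(P, (i1, i2), 1.0 / n)                       # joint frequencies of (X1, X2)
    mask = (x3 == x3_value)
    np.add.at(Q, (i1[mask], i2[mask]), 1.0 / n)           # frequencies with X3 fixed
    eigvals, eigvecs = np.linalg.eig(Q @ np.linalg.inv(P))
    c_hat = np.real(eigvals)                              # Pr(X3 = x3_value | component m)
    A_hat = np.real(eigvecs)
    A_hat /= A_hat.sum(axis=0, keepdims=True)             # rescale columns to probabilities
    return c_hat, A_hat
```

    In a finite sample one would replace the plain inverse with a regularized or least-squares solve and match the eigenvectors obtained across different conditioning values of X3; the full-rank requirements on A, B, and diag(pi) correspond to the rank conditions the abstracts refer to.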