Generalized Identifiability Bounds for Mixture Models with Grouped Samples

Abstract

Recent work has shown that finite mixture models with $m$ components are identifiable, with no assumptions on the mixture components, so long as one has access to groups of samples of size $2m-1$ that are known to come from the same mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model is linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result for a stronger form of identifiability known as "determinedness," along with a corresponding lower bound. This independence assumption almost surely holds if mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.
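
As a rough illustration of how the bound scales, the arithmetic below evaluates it for a few values of $k$. The symbol $n$ for the group size is introduced here only for illustration, and reading the bound as the integer condition $n(k-1) \ge 2m-1$ is an assumption rather than something stated above.

```latex
% Illustrative arithmetic only. The group size n and the integer
% reading n(k-1) >= 2m-1 are assumptions, not taken from the abstract.
\begin{align*}
k = 2 &: \quad \frac{2m-1}{k-1} = 2m-1
  && \text{(matches the earlier assumption-free group size)} \\
k = 3,\ m = 3 &: \quad \frac{2\cdot 3-1}{3-1} = \frac{5}{2}
  && \text{(so } n = 3 \text{ under the integer reading)} \\
k = m &: \quad \frac{2m-1}{m-1} = 2 + \frac{1}{m-1}
  && \text{(so } n = 3 \text{ for } m \ge 3 \text{ under the integer reading)}
\end{align*}
```

The case $k = 2$ needs no extra assumption beyond distinct components: if $a\,p + b\,q = 0$ for probability measures $p \ne q$, evaluating both sides on the whole space gives $a + b = 0$, hence $a(p - q) = 0$ and $a = b = 0$.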
