Using Multiple Samples to Learn Mixture Models
In the mixture models problem it is assumed that there are $K$ distributions
$\theta_1, \ldots, \theta_K$ and one gets to observe a sample from a mixture
of these distributions with unknown coefficients. The goal is to associate
instances with their generating distributions, or to identify the parameters of
the hidden distributions. In this work we make the assumption that we have
access to several samples drawn from the same underlying distributions, but
with different mixing weights. As with topic modeling, having multiple samples
is often a reasonable assumption. Instead of pooling the data into one sample,
we prove that it is possible to use the differences between the samples to
better recover the underlying structure. We present algorithms that recover the
underlying structure under milder assumptions than the current state of the art
when either the dimensionality or the separation is high. The methods, when
applied to topic modeling, allow generalization to words not present in the
training data.

Comment: Published in Neural Information Processing Systems (NIPS) 2013.
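To make the setup described in the abstract concrete, here is a minimal sketch (an illustration of the data model only, not the paper's algorithm): two samples are drawn from the same hidden components but with different mixing weights. The helper name `sample_mixture` and all concrete parameter values are hypothetical.

```python
# Illustration of the multi-sample mixture setup: two samples share the
# same underlying (hidden) Gaussian components but are drawn with
# different, unknown mixing weights.
import numpy as np

rng = np.random.default_rng(0)

# Shared hidden components: means and a common standard deviation.
# These concrete values are hypothetical, chosen only for illustration.
means = np.array([-2.0, 0.0, 3.0])
sigma = 0.5

def sample_mixture(weights, n):
    """Draw n points from the mixture with the given mixing weights."""
    ks = rng.choice(len(means), size=n, p=weights)  # component labels
    return means[ks] + sigma * rng.standard_normal(n)

# Two samples from the SAME components, with DIFFERENT mixing weights.
sample_a = sample_mixture(np.array([0.6, 0.3, 0.1]), n=1000)
sample_b = sample_mixture(np.array([0.1, 0.3, 0.6]), n=1000)

# Pooling sample_a and sample_b into one dataset discards the information
# carried by the differing weights; the paper's point is that contrasting
# the two samples can help recover the hidden structure.
```

In this toy setting, pooling the two samples yields one mixture with averaged weights, whereas comparing the empirical distributions of `sample_a` and `sample_b` exposes differences that are informative about the individual components.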