Streaming Coresets for Symmetric Tensor Factorization
Factorizing tensors has recently become an important optimization module in a
number of machine learning pipelines, especially in latent variable models. We
show how to do this efficiently in the streaming setting. Given a set of $n$
vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear
number of these vectors as a coreset, while guaranteeing that the CP
decomposition of the $p$-moment tensor of the coreset approximates the
corresponding decomposition of the $p$-moment tensor computed from the full
data. We introduce two novel algorithmic techniques: online filtering and
kernelization. Using these two, we present six algorithms that achieve
different tradeoffs of coreset size, update time and working space, beating or
matching various state-of-the-art algorithms. In the case of matrices
(2-ordered tensor), our online row sampling algorithm guarantees $(1 \pm
\epsilon)$ relative error spectral approximation. We show applications of our
algorithms in learning single topic models.
Comment: Accepted at ICML 2020. Included algorithm with improved update time
and fixed minor bug.
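
As a reading aid for the abstract's central object, here is a minimal sketch of a moment tensor built from the rows of a data matrix, for the order-3 case. It assumes the moment tensor is the sum of the outer products a_i ⊗ a_i ⊗ a_i over the data vectors a_i. The names `moment_tensor_3` and `contract`, the data sizes, and the uniformly sampled, reweighted subset are all illustrative assumptions; the uniform sample is only a naive baseline and is not one of the paper's algorithms.

```python
import numpy as np

def moment_tensor_3(A, weights=None):
    """Weighted 3-moment tensor: sum_i w_i * (a_i outer a_i outer a_i) over rows a_i of A."""
    A = np.asarray(A, dtype=float)
    if weights is None:
        w = np.ones(A.shape[0])
    else:
        w = np.asarray(weights, dtype=float)
    return np.einsum("i,ij,ik,il->jkl", w, A, A, A)

def contract(T, x):
    """Evaluate T(x, x, x), i.e. contract every mode of T with the vector x."""
    return np.einsum("jkl,j,k,l->", T, x, x, x)

rng = np.random.default_rng(0)
n, d, m = 200, 4, 50              # hypothetical sizes: n vectors in R^d, m kept
A = rng.standard_normal((n, d))
x = rng.standard_normal(d)

# Full-data tensor: contracting with x equals sum_i (a_i . x)^3.
T_full = moment_tensor_3(A)
full_value = contract(T_full, x)
direct_value = np.sum((A @ x) ** 3)

# Naive baseline (NOT the paper's method): uniform sample, each row reweighted by n/m.
idx = rng.choice(n, size=m, replace=False)
T_sub = moment_tensor_3(A[idx], weights=np.full(m, n / m))
sub_value = contract(T_sub, x)
```

Comparing `sub_value` with `full_value` for several query vectors `x` shows the kind of approximation a coreset is meant to control; the uniform sample above comes with no such guarantee.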