7,862 research outputs found

    Recovering Structured Probability Matrices

    Get PDF
    We consider the problem of accurately recovering a matrix B of size M by M , which represents a probability distribution over M2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Our results apply to the setting where B has a low rank structure. For this setting, we propose an efficient algorithm that accurately recovers the underlying M by M matrix using Theta(M) samples. This result easily translates to Theta(M) sample algorithms for learning topic models and learning hidden Markov Models. These linear sample complexities are optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Omega(M) samples. We provide an even stronger lower bound where distinguishing whether a sequence of observations were drawn from the uniform distribution over M observations versus being generated by an HMM with two hidden states requires Omega(M) observations. This precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear sample estimators for quantities such as the entropy rate of HMMs

    Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

    Full text link
    Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets to learn better acoustic models. However, DNN outputs bear inaccuracies which are exhibited as high dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principle component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft-targets for DNN acoustic modeling, that also enables training with untranscribed data. Experiments conducted on AMI corpus shows 4.6% relative reduction in word error rate

    Quadratically-Regularized Optimal Transport on Graphs

    Full text link
    Optimal transportation provides a means of lifting distances between points on a geometric domain to distances between signals over the domain, expressed as probability distributions. On a graph, transportation problems can be used to express challenging tasks involving matching supply to demand with minimal shipment expense; in discrete language, these become minimum-cost network flow problems. Regularization typically is needed to ensure uniqueness for the linear ground distance case and to improve optimization convergence; state-of-the-art techniques employ entropic regularization on the transportation matrix. In this paper, we explore a quadratic alternative to entropic regularization for transport over a graph. We theoretically analyze the behavior of quadratically-regularized graph transport, characterizing how regularization affects the structure of flows in the regime of small but nonzero regularization. We further exploit elegant second-order structure in the dual of this problem to derive an easily-implemented Newton-type optimization algorithm.Comment: 27 page

    Regularization and Bayesian Learning in Dynamical Systems: Past, Present and Future

    Full text link
    Regularization and Bayesian methods for system identification have been repopularized in the recent years, and proved to be competitive w.r.t. classical parametric approaches. In this paper we shall make an attempt to illustrate how the use of regularization in system identification has evolved over the years, starting from the early contributions both in the Automatic Control as well as Econometrics and Statistics literature. In particular we shall discuss some fundamental issues such as compound estimation problems and exchangeability which play and important role in regularization and Bayesian approaches, as also illustrated in early publications in Statistics. The historical and foundational issues will be given more emphasis (and space), at the expense of the more recent developments which are only briefly discussed. The main reason for such a choice is that, while the recent literature is readily available, and surveys have already been published on the subject, in the author's opinion a clear link with past work had not been completely clarified.Comment: Plenary Presentation at the IFAC SYSID 2015. Submitted to Annual Reviews in Contro

    ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly

    Full text link
    Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model learned, significantly beats all factorization approaches to matrix approximation. Even more surprisingly, we find that summing over small co-clusterings is more effective in modeling matrices than classic co-clustering, which uses just one large partitioning of the matrix. Following Occam's razor principle suggests that the simple structure induced by our model better captures the latent preferences and decision making processes present in the real world than classic co-clustering or matrix factorization. We provide an iterative minimization algorithm, a collapsed Gibbs sampler, theoretical guarantees for matrix approximation, and excellent empirical evidence for the efficacy of our approach. We achieve state-of-the-art results on the Netflix problem with a fraction of the model complexity.Comment: 22 pages, under review for conference publicatio
    • …
    corecore