Fast and Guaranteed Tensor Decomposition via Sketching
Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in
statistical learning of latent variable models and in data mining. In this
paper, we propose fast and randomized tensor CP decomposition algorithms based
on sketching. We build on the idea of count sketches, but introduce many novel
ideas which are unique to tensors. We develop novel methods for randomized
computation of tensor contractions via FFTs, without explicitly forming the
tensors. Such tensor contractions are encountered in decomposition methods such
as tensor power iterations and alternating least squares. We also design novel
colliding hashes for symmetric tensors to further save time in computing the
sketches. We then combine these sketching ideas with existing whitening and
tensor power iterative techniques to obtain the fastest algorithm on both
sparse and dense tensors. The quality of approximation under our method does
not depend on properties such as sparsity, uniformity of elements, etc. We
apply the method for topic modeling and obtain competitive results.
Comment: 29 pages. Appeared in Proceedings of Advances in Neural Information Processing Systems (NIPS), held in Montreal, Canada in 2015.
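The core trick described above, sketching a tensor contraction via FFTs without ever materializing the tensor, can be illustrated with a toy count sketch of a rank-1 tensor. This is a minimal sketch with hypothetical sizes, not the paper's full algorithm (no whitening or power iterations):

```python
import numpy as np

rng = np.random.default_rng(0)
d, b = 40, 128  # dimension per mode and sketch length (toy values)

# One independent hash/sign pair per tensor mode.
h = [rng.integers(0, b, size=d) for _ in range(3)]
s = [rng.choice([-1.0, 1.0], size=d) for _ in range(3)]

def count_sketch(x, hk, sk):
    """Count sketch of a vector: signed entries accumulated into hashed buckets."""
    out = np.zeros(b)
    np.add.at(out, hk, sk * x)
    return out

u, v, w = rng.standard_normal((3, d))

# Sketch of the rank-1 tensor u (x) v (x) w, obtained as a circular
# convolution of the three mode sketches via FFT; the d**3-entry tensor
# itself is never formed.
fft = np.fft.fft
tensor_sketch = np.fft.ifft(
    fft(count_sketch(u, h[0], s[0]))
    * fft(count_sketch(v, h[1], s[1]))
    * fft(count_sketch(w, h[2], s[2]))
).real
```

The identity being used is that hashing a tensor entry by the sum (mod b) of the per-mode hashes, with the product of the per-mode signs, turns the sketch of an outer product into a circular convolution of vector sketches, which the FFT computes in O(b log b).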
Bayesian Methods in Tensor Analysis
Tensors, also known as multidimensional arrays, are useful data structures in
machine learning and statistics. In recent years, Bayesian methods have emerged
as a popular direction for analyzing tensor-valued data since they provide a
convenient way to introduce sparsity into the model and conduct uncertainty
quantification. In this article, we provide an overview of frequentist and
Bayesian methods for solving tensor completion and regression problems, with a
focus on Bayesian methods. We review common Bayesian tensor approaches
including model formulation, prior assignment, posterior computation, and
theoretical properties. We also discuss potential future directions in this
field.
Comment: 32 pages, 8 figures, 2 tables.
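The linear-Gaussian conditional update is the basic building block behind many of the Bayesian approaches the article surveys: conditional on the other factor matrices, each factor in a Bayesian CP model reduces to a linear model with a Gaussian prior. A minimal sketch of that building block, with hypothetical sizes and prior/noise variances, is:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 5  # toy sample size and coefficient dimension

# Linear-Gaussian model: y = X b + noise, with prior b ~ N(0, tau2 * I).
X = rng.standard_normal((n, p))
b_true = rng.standard_normal(p)
y = X @ b_true + 0.1 * rng.standard_normal(n)

sigma2, tau2 = 0.1 ** 2, 1.0  # assumed known noise and prior variances

# Conjugate posterior: b | y ~ N(mean, V) with
# V = (X'X / sigma2 + I / tau2)^-1 and mean = V X'y / sigma2.
V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
mean = V @ X.T @ y / sigma2
sd = np.sqrt(np.diag(V))  # per-coefficient posterior uncertainty
```

The posterior standard deviations `sd` are the simplest instance of the uncertainty quantification that motivates the Bayesian treatment of tensor completion and regression.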
Structured Mixture Models
Finite mixture models are a staple of model-based clustering approaches for distinguishing subgroups. A common choice is the finite Gaussian mixture model, whose degrees of freedom scale quadratically with the data dimension. Methods in the literature often tackle the degrees of freedom of the Gaussian mixture model by sharing parameters in the eigendecomposition of the covariance matrices across all mixture components. We posit finite Gaussian mixture models with alternate forms of parameter sharing, imposing additional structure on the parameters: components may share parameters as a convex combination of the corresponding parent components, or through a hierarchical clustering structure in orthogonal subspaces with common parameters across levels. Estimation procedures using the Expectation-Maximization (EM) algorithm are derived throughout, with application to simulated and real-world datasets. Moreover, the proposed model structures have an interpretable meaning that can shed light on clustering analyses performed by practitioners in the context of their data.
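For reference, the unstructured baseline that the proposed parameter-sharing schemes modify is ordinary EM for a Gaussian mixture. A minimal sketch on hypothetical two-component toy data (not the structured models of the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two well-separated 2-D Gaussian clusters.
X = np.vstack([
    rng.normal([-2, 0], 0.5, size=(150, 2)),
    rng.normal([3, 1], 0.8, size=(150, 2)),
])
n, d = X.shape
K = 2

# Initialise mixing weights, means, and covariances.
pi = np.full(K, 1 / K)
mu = X[rng.choice(n, K, replace=False)]
Sigma = np.stack([np.eye(d)] * K)

def log_gauss(X, m, S):
    """Log density of N(m, S) evaluated at the rows of X."""
    diff = X - m
    sol = np.linalg.solve(S, diff.T).T
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + np.einsum('ij,ij->i', diff, sol))

for _ in range(50):
    # E-step: responsibilities (posterior component memberships).
    logr = np.log(pi) + np.column_stack(
        [log_gauss(X, mu[k], Sigma[k]) for k in range(K)])
    logr -= logr.max(axis=1, keepdims=True)
    r = np.exp(logr)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: weighted maximum-likelihood updates.
    Nk = r.sum(axis=0)
    pi = Nk / n
    mu = (r.T @ X) / Nk[:, None]
    for k in range(K):
        diff = X - mu[k]
        Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
```

Each covariance here carries its own d(d+1)/2 free parameters; the quadratic growth of that count with d is exactly what the structured sharing schemes above aim to reduce.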
The EM algorithm is a popular estimation method for tackling issues of latent data, such as in finite mixture models where component memberships are often latent. One aspect of the EM algorithm that hampers estimation is its slow rate of convergence, which affects the estimation of finite Gaussian mixture models. To explore avenues of improvement, we study the extrapolation of the sequence of conditional expectations admitting general EM procedures, with minimal modifications for many common models. In the same spirit of accelerating iterative algorithms, we also examine the use of approximate sketching methods in estimating generalized linear models via iteratively re-weighted least squares, with emphasis on practical data infrastructure constraints. We propose a sketching method that controls for both data transfer and computation costs, the former of which is often overlooked in asymptotic complexity analyses. On real-world hardware, the method achieves an approximate result in much faster wall-clock time than the exact solution, and can estimate standard errors in addition to point estimates.
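The idea of sketching inside iteratively re-weighted least squares can be illustrated on its inner step: each IRLS iteration solves a (weighted) least-squares problem, and that solve can be done on a compressed row sketch of the data. A minimal sketch with hypothetical sizes and a generic count-sketch compression, not the thesis's proposed method:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 5000, 10, 500  # rows, features, sketch size (toy values)

X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Row count sketch: each of the n rows is signed and hashed into one of
# m buckets, compressing [X | y] from n rows to m rows in one pass.
hb = rng.integers(0, m, size=n)
sgn = rng.choice([-1.0, 1.0], size=n)
SX = np.zeros((m, p))
Sy = np.zeros(m)
np.add.at(SX, hb, sgn[:, None] * X)
np.add.at(Sy, hb, sgn * y)

# One least-squares solve of the kind performed inside each IRLS
# iteration, done on the m-row sketch instead of all n rows.
beta_sketch = np.linalg.lstsq(SX, Sy, rcond=None)[0]
```

Only the m-by-(p+1) sketch needs to be moved to the solver, which is the kind of data-transfer saving (as opposed to pure flop counting) that the abstract emphasizes.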
Marriages of Mathematics and Physics: A Challenge for Biology
The human attempts to access, measure and organize physical phenomena have led to a manifold construction of mathematical and physical spaces. We will survey the evolution of geometries from Euclid to the Algebraic Geometry of the 20th century. The role of Persian/Arabic Algebra in this transition and its Western symbolic development is emphasized. In this connection, we will also discuss changes in the ontological attitudes toward mathematics and its applications. Historically, the encounter of geometric and algebraic perspectives enriched the mathematical practices and their foundations. Yet, the collapse of Euclidean certitudes, of over 2300 years, and the crisis in the mathematical analysis of the 19th century, led to the exclusion of “geometric judgments” from the foundations of Mathematics. After the success and the limits of the logico-formal analysis, it is necessary to broaden our foundational tools and re-examine the interactions with natural sciences. In particular, the way the geometric and algebraic approaches organize knowledge is analyzed as a cross-disciplinary and cross-cultural issue and will be examined in Mathematical Physics and Biology. We finally discuss how the current notions of mathematical (phase) “space” should be revisited for the purposes of life sciences.