Minimum description length as an objective function for non-negative matrix factorization
Non-negative matrix factorization (NMF) is a dimensionality reduction
technique which tends to produce a sparse representation of data. Commonly, the
error between the actual and recreated matrices is used as an objective
function, but this method may not produce the type of representation we desire
as it allows for the complexity of the model to grow, constrained only by the
size of the subspace and the non-negativity requirement. If additional
constraints, such as sparsity, are imposed, the question of parameter selection
becomes critical. Instead of adding sparsity constraints in an ad-hoc manner we
propose a novel objective function created by using the principle of minimum
description length (MDL). Our formulation, MDL-NMF, automatically trades off
between the complexity and accuracy of the model using a principled approach
with little parameter selection or the need for domain expertise. We
demonstrate that our model works effectively on three heterogeneous datasets and
on a range of semi-synthetic data, showing the broad applicability of our method.
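To make the two-part trade-off concrete, here is a toy description-length objective for an NMF model. This is a minimal sketch only: the fixed coding precision, the Gaussian residual code, and the name `mdl_objective` are assumptions of this illustration, not the paper's actual coding scheme.

```python
import numpy as np

def mdl_objective(V, W, H, precision=1e-2, sigma=0.1):
    """Toy two-part code length for an NMF model V ~ W @ H.

    L(model): bits to code the factor entries above a fixed precision,
    so sparser factors cost fewer bits. L(data | model): bits to code
    the residual under an assumed Gaussian noise model with scale sigma.
    """
    bits_per_entry = np.log2(max(V.max(), 1.0) / precision)
    nonzeros = np.count_nonzero(W > precision) + np.count_nonzero(H > precision)
    model_bits = nonzeros * bits_per_entry

    resid = V - W @ H
    data_bits = 0.5 * np.sum(resid ** 2) / (sigma ** 2 * np.log(2))

    return model_bits + data_bits  # minimizing trades complexity for accuracy

# A sparser factorization can win even with a slightly larger residual.
rng = np.random.default_rng(0)
V = rng.random((50, 40))
W, H = rng.random((50, 5)), rng.random((5, 40))
W_sparse = np.where(W > 0.5, W, 0.0)
print(mdl_objective(V, W, H), mdl_objective(V, W_sparse, H))
```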
Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey
Probabilistic graphical models combine graph theory and probability theory to
provide a framework for multivariate statistical modeling. They give a unified
description of uncertainty using probability and of complexity using the
graphical model. In particular, graphical models provide several useful
properties:
- Graphical models provide a simple and intuitive interpretation of the
structures of probabilistic models, and can conversely be used to design and
motivate new models.
- Graphical models provide additional insights into the properties of the
model, including the conditional independence properties.
- Complex computations which are required to perform inference and learning
in sophisticated models can be expressed in terms of graphical manipulations,
in which the underlying mathematical expressions are carried along implicitly.
Graphical models have been applied to a large number of fields, including
bioinformatics, social science, control theory, image processing, marketing
analysis, among others. However, structure learning for graphical models
remains an open challenge, since one must cope with a combinatorial search over
the space of all possible structures.
In this paper, we present a comprehensive survey of the existing structure
learning algorithms.
Comment: survey on structure learning
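As one concrete instance of the combinatorial search mentioned above, here is a minimal greedy score-based sketch: hill climbing over DAGs with a BIC score on discrete data. It illustrates the generic approach, not any specific algorithm from the survey; all function names are illustrative.

```python
import itertools
import numpy as np

def bic_local_score(data, child, parents):
    """BIC score of one node given its parent set, for discrete data."""
    n = data.shape[0]
    counts = {}  # parent configuration -> {child value: count}
    for row, x in zip(map(tuple, data[:, parents]), data[:, child]):
        cfg = counts.setdefault(row, {})
        cfg[x] = cfg.get(x, 0) + 1
    loglik = sum(c * np.log(c / sum(cfg.values()))
                 for cfg in counts.values() for c in cfg.values())
    r = len(set(data[:, child]))                        # child arity
    penalty = 0.5 * np.log(n) * len(counts) * (r - 1)   # BIC complexity term
    return loglik - penalty

def would_cycle(parent_sets, u, v):
    """Adding edge u -> v closes a cycle iff u is reachable from v."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            # children of `node` are the vertices listing it as a parent
            stack.extend(w for w, ps in parent_sets.items() if node in ps)
    return False

def hill_climb(data):
    """Greedily add the single edge that most improves the total BIC."""
    d = data.shape[1]
    parent_sets = {i: [] for i in range(d)}
    while True:
        best = None
        for u, v in itertools.permutations(range(d), 2):
            if u in parent_sets[v] or would_cycle(parent_sets, u, v):
                continue
            gain = (bic_local_score(data, v, parent_sets[v] + [u])
                    - bic_local_score(data, v, parent_sets[v]))
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, u, v)
        if best is None:
            return parent_sets
        parent_sets[best[2]].append(best[1])

# Toy data: X1 is a noisy copy of X0, X2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 2000)
x1 = (x0 ^ (rng.random(2000) < 0.1)).astype(int)
x2 = rng.integers(0, 2, 2000)
print(hill_climb(np.column_stack([x0, x1, x2])))  # expect an edge within {0,1}
```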
Some 0/1 polytopes need exponential size extended formulations
We prove that there are 0/1 polytopes P that do not admit a compact LP
formulation. More precisely, we show that for every n there is a set X
\subseteq \{0,1\}^n such that conv(X) must have extension complexity at least
2^{n/2 (1-o(1))}. In other words, every polyhedron Q that can be linearly
projected onto conv(X) must have exponentially many facets.
In fact, the same result also applies if conv(X) is restricted to be a
matroid polytope.
Conditioned on NP not being contained in P_{/poly}, our result rules out the
existence of any compact formulation for the TSP polytope, even if the
formulation may contain arbitrary real numbers.
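For reference, the quantity in question can be stated in standard notation (our paraphrase of the abstract, not text from the paper):

```latex
% Extension complexity: the minimum number of facets of any polyhedron Q
% that linearly projects onto P.
\[
  \operatorname{xc}(P) = \min\{\, f(Q) : Q \text{ a polyhedron},\
  \pi(Q) = P \text{ for some linear map } \pi \,\},
\]
% where f(Q) denotes the number of facets of Q. The theorem asserts the
% existence of X \subseteq \{0,1\}^n with
\[
  \operatorname{xc}(\operatorname{conv}(X)) \;\ge\; 2^{\frac{n}{2}(1-o(1))}.
\]
```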
An Asynchronous Distributed Framework for Large-scale Learning Based on Parameter Exchanges
In many distributed learning problems, the heterogeneous loading of computing
machines may harm the overall performance of synchronous strategies. In this
paper, we propose an effective asynchronous distributed framework for the
minimization of a sum of smooth functions, where each machine performs
iterations in parallel on its local function and updates a shared parameter
asynchronously. In this way, all machines can continuously work even though
they do not have the latest version of the shared parameter. We prove the
convergence and consistency of this general distributed asynchronous method
for gradient iterations, and then show its efficiency on the matrix
factorization problem for recommender systems and on binary classification.
Comment: 16 pages
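To make the scheme concrete, here is a minimal sketch of asynchronous parameter exchange with threads: each worker computes gradients of its own smooth local function against a possibly stale copy of the shared parameter. The quadratic local losses, step size, and all names are assumptions of this sketch, not the paper's experimental setup.

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, n_steps, lr = 10, 4, 2000, 0.02

# Local smooth functions f_i(x) = 0.5 * ||x - c_i||^2; the minimizer of
# their sum is the mean of the c_i, which gives a ground truth to check.
centers = [rng.normal(size=dim) for _ in range(n_workers)]

shared = np.zeros(dim)      # the asynchronously updated shared parameter
lock = threading.Lock()     # guards individual reads/writes, not whole steps

def worker(c):
    global shared
    for _ in range(n_steps):
        with lock:
            x = shared.copy()      # may be stale: no synchronization barrier
        grad = x - c               # gradient of the local quadratic at x
        with lock:
            shared = shared - lr * grad

threads = [threading.Thread(target=worker, args=(c,)) for c in centers]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to optimum:", np.linalg.norm(shared - np.mean(centers, axis=0)))
```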
Analyzing the Quantum Annealing Approach for Solving Linear Least Squares Problems
With the advent of quantum computers, researchers are exploring if quantum
mechanics can be leveraged to solve important problems in ways that may provide
advantages not possible with conventional or classical methods. A previous work
by O'Malley and Vesselinov in 2016 briefly explored using a quantum annealing
machine for solving linear least squares problems for real numbers. They
suggested that it is best suited for binary and sparse versions of the problem.
In our work, we propose a more compact way to represent variables using two's
and one's complement on a quantum annealer. We then do an in-depth theoretical
analysis of this approach, showing the conditions for which this method may be
able to outperform the traditional classical methods for solving general linear
least squares problems. Finally, based on our analysis and observations, we
discuss potentially promising areas of further research where quantum annealing
can be especially beneficial.
Comment: 16 pages, 2 appendices
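A small sketch of the representation idea: each variable is expanded in m-bit two's complement, turning the least-squares objective into a QUBO, with brute-force enumeration standing in for the annealer. This is our illustration of the encoding the abstract describes, and `build_qubo` is an assumed helper name.

```python
import numpy as np

def build_qubo(A, b, m):
    """QUBO for min ||A x - b||^2 with each x_j in m-bit two's complement."""
    n = A.shape[1]
    # Bit weights: 2^0 .. 2^(m-2), then -2^(m-1) for the sign bit.
    w = np.array([2.0**i for i in range(m - 1)] + [-2.0**(m - 1)])
    B = np.kron(np.eye(n), w)        # maps the stacked bit vector q to x = B q
    M = A @ B                        # objective becomes ||M q - b||^2
    Q = M.T @ M
    Q[np.diag_indices_from(Q)] += -2.0 * (M.T @ b)  # uses q_i^2 = q_i
    return Q, B

def brute_force_min(Q):
    """Enumerate all bit vectors: annealer stand-in, viable only for tiny Q."""
    k = Q.shape[0]
    best_q, best_e = None, np.inf
    for z in range(2**k):
        q = np.array([(z >> i) & 1 for i in range(k)], dtype=float)
        e = q @ Q @ q
        if e < best_e:
            best_q, best_e = q, e
    return best_q

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([3.0, -2.0, 1.0])
Q, B = build_qubo(A, b, m=4)      # 4 bits per variable: range -8 .. 7
x = B @ brute_force_min(Q)
print("recovered solution:", x)   # the exact minimizer here is [3, -2]
```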
ParVecMF: A Paragraph Vector-based Matrix Factorization Recommender System
Review-based recommender systems have gained noticeable ground in recent
years. In addition to the rating scores, those systems are enriched with
textual evaluations of items by the users. Neural language processing models,
on the other hand, have already found application in recommender systems,
mainly as a means of encoding user preference data, with the actual textual
description of items serving only as side information. In this paper, a novel
approach to incorporating the aforementioned models into the recommendation
process is presented. Initially, a neural language processing model and more
specifically the paragraph vector model is used to encode textual user reviews
of variable length into feature vectors of fixed length. Subsequently, this
information is fused along with the rating scores in a probabilistic matrix
factorization algorithm, based on maximum a-posteriori estimation. The
resulting system, ParVecMF, is compared to a ratings' matrix factorization
approach on a reference dataset. The preliminary results obtained on two
metrics are encouraging and may stimulate further research in this area.
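The fusion step can be sketched as a MAP-style matrix factorization whose Gaussian priors are centred on review embeddings rather than on zero. The random `doc_u`/`doc_i` placeholders stand in for paragraph-vector outputs, and the hyperparameters are assumptions of this sketch, not ParVecMF's published formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 30, 40, 8
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)  # ratings 1..5
mask = rng.random((n_users, n_items)) < 0.2                    # observed entries

# Placeholders for fixed-length paragraph-vector encodings of user and item
# review text (a real system would train these with a doc2vec-style model).
doc_u = rng.normal(scale=0.1, size=(n_users, k))
doc_i = rng.normal(scale=0.1, size=(n_items, k))

U, V = doc_u.copy(), doc_i.copy()   # initialize at the review embeddings
lr, mu = 0.01, 0.1                  # step size; strength of the embedding prior

for _ in range(300):
    E = mask * (R - U @ V.T)        # rating error on observed entries only
    # Gradient ascent on the MAP objective: squared rating error plus
    # Gaussian priors centred on the review embeddings instead of zero.
    U += lr * (E @ V - mu * (U - doc_u))
    V += lr * (E.T @ U - mu * (V - doc_i))

rmse = np.sqrt((mask * (R - U @ V.T) ** 2).sum() / mask.sum())
print("train RMSE:", round(float(rmse), 3))
```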
The Why and How of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) has become a widely used tool for the
analysis of high-dimensional data as it automatically extracts sparse and
meaningful features from a set of nonnegative data vectors. We first illustrate
this property of NMF on three applications, in image processing, text mining
and hyperspectral imaging --this is the why. Then we address the problem of
solving NMF, which is NP-hard in general. We review some standard NMF
algorithms, and also present a recent subclass of NMF problems, referred to as
near-separable NMF, that can be solved efficiently (that is, in polynomial
time), even in the presence of noise --this is the how. Finally, we briefly
describe some problems in mathematics and computer science closely related to
NMF via the nonnegative rank.
Comment: 25 pages, 5 figures. Some typos and errors corrected, Section 3.2 reorganized.
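Of the standard algorithms the survey covers, the simplest to state are the multiplicative updates of Lee and Seung for the Frobenius-norm objective. A minimal sketch follows; this is the classical baseline, not the near-separable algorithms the abstract highlights.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update preserves nonnegativity
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((60, 5)) @ rng.random((5, 80))  # exact nonnegative rank-5 data
W, H = nmf_multiplicative(V, r=5)
print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```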
Learning Topic Models - Going beyond SVD
Topic Modeling is an approach used for automatic comprehension and
classification of data in a variety of settings, and perhaps the canonical
application is in uncovering thematic structure in a corpus of documents. A
number of foundational works both in machine learning and in theory have
suggested a probabilistic model for documents, whereby documents arise as a
convex combination of (i.e. distribution on) a small number of topic vectors,
each topic vector being a distribution on words (i.e. a vector of
word-frequencies). Similar models have since been used in a variety of
application areas; the Latent Dirichlet Allocation or LDA model of Blei et al.
is especially popular.
Theoretical studies of topic modeling focus on learning the model's
parameters assuming the data is actually generated from it. Existing approaches
for the most part rely on Singular Value Decomposition (SVD), and consequently
have one of two limitations: these works need to either assume that each
document contains only one topic, or else can only recover the span of the
topic vectors instead of the topic vectors themselves.
This paper formally justifies Nonnegative Matrix Factorization (NMF) as a main
tool in this context, which is an analog of SVD where all vectors are
nonnegative. Using this tool we give the first polynomial-time algorithm for
learning topic models without the above two limitations. The algorithm uses a
fairly mild assumption about the underlying topic matrix called separability,
which is usually found to hold in real-life data. A compelling feature of our
algorithm is that it generalizes to models that incorporate topic-topic
correlations, such as the Correlated Topic Model and the Pachinko Allocation
Model.
We hope that this paper will motivate further theoretical results that use
NMF as a replacement for SVD - just as NMF has come to replace SVD in many
applications.
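As a small illustration of NMF in the topic-model role, the snippet below factors a toy term-document count matrix into document-topic and topic-word parts with scikit-learn. It demonstrates the NMF-for-topics idea only; the paper's polynomial-time guarantee relies on the separability assumption, which this generic alternating solver does not exploit.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "stock market shares trading prices",
    "market prices rise as trading volume grows",
    "team wins match with late goal",
    "goal scored in the final minute of the match",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)       # word counts (documents x words)
vocab = np.array(vectorizer.get_feature_names_out())

model = NMF(n_components=2, init="nndsvd", random_state=0, max_iter=500)
doc_topics = model.fit_transform(X)      # documents as nonnegative topic mixtures
topic_words = model.components_          # topics as nonnegative word weights

for t, row in enumerate(topic_words):
    top = vocab[np.argsort(row)[::-1][:4]]
    print(f"topic {t}:", " ".join(top))  # expect a finance and a sports topic
```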
Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
Nonnegative matrix factorization (NMF) has become a very popular technique in
machine learning because it automatically extracts meaningful features through
a sparse and part-based representation. However, NMF has the drawback of being
highly ill-posed, that is, there typically exist many different but equivalent
factorizations. In this paper, we introduce a completely new way of obtaining
more well-posed NMF problems whose solutions are sparser. Our technique is
based on the preprocessing of the nonnegative input data matrix, and relies on
the theory of M-matrices and the geometric interpretation of NMF. This approach
provably leads to optimal and sparse solutions under the separability
assumption of Donoho and Stodden (NIPS, 2003), and, for rank-three matrices,
makes the number of exact factorizations finite. We illustrate the
effectiveness of our technique on several image datasets.
Comment: 34 pages, 11 figures
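For reference, one standard statement of the separability condition of Donoho and Stodden invoked above (our paraphrase in common notation, not the paper's exact wording; orientation conventions vary across the literature):

```latex
% Separability: each of the r factors has an "anchor" feature used by it
% alone, i.e., up to permutation of rows,
\[
  X = WH, \qquad W \in \mathbb{R}^{m \times r}_{\ge 0}, \quad
  H \in \mathbb{R}^{r \times n}_{\ge 0},
\]
% and for every k \in \{1, \dots, r\} there is a row index i_k with
\[
  W(i_k, :) = \alpha_k \, e_k^{\top}, \qquad \alpha_k > 0,
\]
% so the rows X(i_k, :) are positive multiples of the rows of H.
```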
Approximating Orthogonal Matrices with Effective Givens Factorization
We analyze effective approximation of unitary matrices. In our formulation, a
unitary matrix is represented as a product of rotations in two-dimensional
subspaces, so-called Givens rotations. Instead of the quadratic dimension
dependence when applying a dense matrix, applying such an approximation scales
with the number of factors, each of which can be implemented efficiently.
Consequently, in settings where an approximation is once computed and then
applied many times, such a representation becomes advantageous. Although
effective Givens factorization is not possible for generic unitary operators,
we show that minimizing a sparsity-inducing objective with a coordinate descent
algorithm on the unitary group yields good factorizations for structured
matrices. Canonical applications of such a setup are orthogonal basis
transforms. We demonstrate numerical results of approximating the graph Fourier
transform, which is the matrix obtained when diagonalizing a graph Laplacian.
Comment: International Conference on Machine Learning (ICML 2019)
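To make the setup concrete, here is a classical relative of the method: greedy Jacobi-style Givens rotations that successively zero the largest off-diagonal entry of a graph Laplacian. The paper instead minimizes a sparsity-inducing objective by coordinate descent on the unitary group; this sketch only illustrates approximating the diagonalizing orthogonal matrix by a short product of Givens rotations.

```python
import numpy as np

def givens_approx(L, n_rotations):
    """Jacobi sweep: zero the largest off-diagonal entry of symmetric L."""
    A = L.astype(float).copy()
    n = A.shape[0]
    rotations = []
    for _ in range(n_rotations):
        off = np.abs(A - np.diag(np.diag(A)))
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] < 1e-12:
            break
        # Rotation angle that zeroes A[i, j] for symmetric A.
        theta = 0.5 * np.arctan2(2 * A[i, j], A[j, j] - A[i, i])
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = s, -s
        A = G.T @ A @ G
        rotations.append((i, j, theta))
    return rotations, A   # A approaches the diagonal matrix of eigenvalues

# Path-graph Laplacian: degree matrix minus adjacency matrix.
n = 6
adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(adj.sum(axis=1)) - adj
rots, D = givens_approx(L, n_rotations=30)
print("off-diagonal mass left:", np.abs(D - np.diag(np.diag(D))).sum())
```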