216 research outputs found

    Scalable and interpretable product recommendations via overlapping co-clustering

    We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable to very large datasets because it exhibits almost linear complexity in the number of input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive with that of state-of-the-art matrix factorization techniques. In addition, our technique has the advantage of offering recommendations that are textually and visually interpretable. Finally, we examine how to implement our technique efficiently on Graphics Processing Units (GPUs).
    Comment: In IEEE International Conference on Data Engineering (ICDE) 201
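
    The abstract does not spell out the algorithm, but the general idea of turning overlapping co-clusters into recommendations can be illustrated with a small sketch. Everything below (the toy feedback matrix, the `coclusters` list, and the counting-based `score_products` rule) is an assumption made for illustration, not the paper's method.

```python
import numpy as np

# Toy implicit-feedback matrix: rows = clients, columns = products (1 = interaction).
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])

# Hypothetical overlapping co-clusters: each is a (client set, product set) pair.
# A client or a product may belong to several co-clusters (overlap).
coclusters = [
    ({0, 1}, {0, 1, 2}),
    ({1, 2}, {2, 3}),
    ({2, 3}, {2, 3, 4}),
]

def score_products(client, R, coclusters):
    """Score each product by how many of the client's co-clusters contain it."""
    scores = np.zeros(R.shape[1])
    for clients, products in coclusters:
        if client in clients:
            for p in products:
                scores[p] += 1
    scores[R[client] > 0] = 0          # do not re-recommend known interactions
    return scores

print(score_products(1, R, coclusters))   # highest-scoring unseen products for client 1
```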

    Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization

    This paper provides theoretical support for the clustering aspect of nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker optimality conditions, we show that the NMF objective is equivalent to a graph clustering objective, so the clustering aspect of NMF has a solid justification. Unlike previous approaches, which usually discard the nonnegativity constraints, our approach guarantees that the stationary point used in deriving the equivalence lies in the feasible region of the nonnegative orthant. Additionally, since the clustering capability of a matrix decomposition technique can sometimes imply a latent semantic indexing (LSI) aspect, we also evaluate the LSI aspect of the NMF by showing its capability in solving the synonymy and polysemy problems on synthetic datasets. A more extensive evaluation is then conducted by comparing the LSI performance of the NMF with that of the singular value decomposition (SVD), the standard LSI method, on some standard datasets.
    Comment: 28 pages, 5 figures
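
    As a concrete illustration of using NMF factors for clustering and LSI-style retrieval, the following sketch applies scikit-learn's NMF to a toy term-document matrix. The matrix, the rank, and the dominant-factor cluster assignment are illustrative choices, not reproduced from the paper.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy nonnegative term-document matrix (rows = terms, columns = documents).
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 1, 2, 3],
    [0, 0, 3, 2],
], dtype=float)

# A ≈ W H with W, H >= 0; n_components is the number of clusters / latent topics.
model = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(A)        # term-topic weights
H = model.components_             # topic-document weights

# Clustering interpretation: assign each document to its dominant latent factor.
print("document clusters:", H.argmax(axis=0))

# LSI-style interpretation: compare documents in the latent (topic) space
# rather than in the raw term space, which mitigates synonymy.
doc_vectors = H.T / (np.linalg.norm(H.T, axis=1, keepdims=True) + 1e-12)
print("latent similarity of docs 0 and 1:", doc_vectors[0] @ doc_vectors[1])
```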

    Convergence of block coordinate descent with diminishing radius for nonconvex optimization

    Block coordinate descent (BCD), also known as nonlinear Gauss-Seidel, is a simple iterative algorithm for nonconvex optimization that sequentially minimizes the objective function in each block coordinate while the other coordinates are held fixed. We propose a version of BCD that is guaranteed to converge to the stationary points of block-wise convex and differentiable objective functions under constraints. Furthermore, we obtain a best-case rate of convergence of order $\log n/\sqrt{n}$, where $n$ denotes the number of iterations. A key idea is to restrict the parameter search within a diminishing radius to promote stability of iterates, and then to show that such auxiliary constraints vanish in the limit. As an application, we provide a modified alternating least squares algorithm for nonnegative CP tensor factorization that converges to the stationary points of the reconstruction error with the same bound on the best-case rate of convergence. We also experimentally validate our results with both synthetic and real-world data.
    Comment: 12 pages, 2 figures. Rate of convergence added. arXiv admin note: text overlap with arXiv:2009.0761
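
    The diminishing-radius idea can be sketched on a simple two-block problem (nonnegative low-rank factorization): each block update is pulled back into a ball of radius 1/√n around the previous iterate, and the radius shrinks with the iteration counter. The projected gradient inner step, the step size, and the radius schedule below are illustrative assumptions; the paper analyzes exact block minimization, which this simplifies.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((30, 20))
k = 4
W, H = rng.random((30, k)), rng.random((k, 20))

def limited_step(X_old, X_new, radius):
    """Pull a block update back into a ball of the given radius around X_old."""
    step = X_new - X_old
    norm = np.linalg.norm(step)
    # A convex combination of nonnegative iterates stays nonnegative.
    return X_old + (radius / norm) * step if norm > radius else X_new

for n in range(1, 201):
    radius = 1.0 / np.sqrt(n)                              # diminishing search radius
    # Block 1: projected gradient step in W with H fixed.
    W_new = np.maximum(W - 1e-2 * (W @ H - A) @ H.T, 0.0)
    W = limited_step(W, W_new, radius)
    # Block 2: projected gradient step in H with W fixed.
    H_new = np.maximum(H - 1e-2 * W.T @ (W @ H - A), 0.0)
    H = limited_step(H, H_new, radius)

print("reconstruction error:", np.linalg.norm(A - W @ H))
```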

    Nonnegative factorization and the maximum edge biclique problem

    Nonnegative matrix factorization (NMF) is a data analysis technique based on the approximation of a nonnegative matrix with a product of two nonnegative factors, which allows compression and interpretation of nonnegative data. In this paper, we study the case of rank-one factorization and show that when the matrix to be factored is not required to be nonnegative, the corresponding problem (R1NF) becomes NP-hard. This sheds new light on the complexity of NMF, since any algorithm for fixed-rank NMF must be able to solve such rank-one subproblems, at least implicitly. Our proof relies on a reduction of the maximum edge biclique problem to R1NF. We also link stationary points of R1NF to feasible solutions of the biclique problem, which allows us to design a new type of biclique-finding algorithm based on the application of a block-coordinate descent scheme to R1NF. We show that this algorithm, whose algorithmic complexity per iteration is proportional to the number of edges in the graph, is guaranteed to converge to a biclique and that it performs competitively with existing methods on random graphs and text mining datasets.
    Keywords: nonnegative matrix factorization, rank-one factorization, maximum edge biclique problem, algorithmic complexity, biclique finding algorithm
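
    The rank-one subproblem min ||M - uvᵀ||² with u, v ≥ 0 admits closed-form block updates, which is the basis of the biclique-finding scheme described above. The sketch below applies those updates to a small bipartite graph; the ±1 edge weighting, the all-ones initialization, and the final thresholding are assumptions made for illustration, not the exact reduction used in the paper (which in practice would also use several restarts).

```python
import numpy as np

# Biadjacency matrix of a small bipartite graph, shifted so that missing edges
# are penalized (+1 on edges, -1 on non-edges; an illustrative weighting).
B = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=float)
M = 2 * B - 1

u, v = np.ones(M.shape[0]), np.ones(M.shape[1])

for _ in range(100):
    # Closed-form block updates for min ||M - u v^T||_F^2 with u, v >= 0.
    u = np.maximum(M @ v, 0.0) / (v @ v)
    v = np.maximum(M.T @ u, 0.0) / (u @ u)

# Threshold the stationary point to read off a candidate biclique.
rows, cols = np.nonzero(u > 1e-8)[0], np.nonzero(v > 1e-8)[0]
print("biclique rows:", rows, "cols:", cols)
```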

    Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent

    Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: on synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.
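
    The paper's elastic-binary regularizer and its proximal operator are not reproduced here, but the overall relax-and-round strategy can be sketched with a plain projected-gradient surrogate: relax the Boolean factors to [0, 1], replace the Boolean product (OR of ANDs) by the clipped product min(UV, 1), and round at the end. All numerical choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Ground-truth Boolean factors and their Boolean product as a 0/1 matrix.
U_true = rng.random((40, 3)) < 0.3
V_true = rng.random((3, 30)) < 0.3
X = np.minimum(U_true.astype(float) @ V_true.astype(float), 1.0)

k, lr = 3, 0.01
U = rng.random((40, k))
V = rng.random((k, 30))

for _ in range(2000):
    P = U @ V
    B = np.minimum(P, 1.0)                 # continuous surrogate of the Boolean product
    R = B - X
    mask = (P < 1.0).astype(float)         # subgradient of the clipping
    U = np.clip(U - lr * ((R * mask) @ V.T), 0.0, 1.0)   # projection back onto [0, 1]
    V = np.clip(V - lr * (U.T @ (R * mask)), 0.0, 1.0)

# Round the relaxed factors back to Boolean and check the reconstruction.
U_bin, V_bin = U > 0.5, V > 0.5
X_hat = np.minimum(U_bin.astype(float) @ V_bin.astype(float), 1.0)
print("reconstruction error (Hamming):", int(np.abs(X_hat - X).sum()))
```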

    Block preconditioners for saddle point linear systems arising in the FE discretization of the Navier-Stokes equations. Application to the driven cavity problem

    In this thesis, we study a constraint preconditioner for the saddle point linear system that arises in the finite element discretization of the Navier-Stokes equations, using P2-P1 elements and Picard linearization. The system is solved using the GMRES method. Our focus is on finding scalable preconditioners for the (1,1) block and for the Schur complement: we use, respectively, a multigrid scheme with adequate flow-following smoothing and an advanced version of the BFBt preconditioner. We present results for the lid-driven cavity problem with viscosities up to 0.001.
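
    The thesis's multigrid and BFBt components are not reproduced here, but the structure of a block preconditioner for a saddle point system [[A, Bᵀ], [B, 0]] can be sketched with SciPy: GMRES preconditioned by a block-triangular operator that combines a solve with the (1,1) block and a solve with an (approximate) Schur complement. The toy matrices and the exact Schur complement below are stand-ins for the real discretized operators.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy saddle point system K [u; p] = rhs with K = [[A, B^T], [B, 0]].
n, m = 80, 20
rng = np.random.default_rng(0)
A = sp.diags(2.0 + rng.random(n)).tocsc()          # diagonal SPD stand-in for the (1,1) block
B = sp.random(m, n, density=0.2, random_state=0, format="csr")
K = sp.bmat([[A, B.T], [B, None]], format="csr")
rhs = rng.random(n + m)

# Block-triangular preconditioner: a solve with A, then with the Schur complement.
A_solve = spla.factorized(A)
S = B @ spla.spsolve(A, B.T.toarray())             # exact (toy-sized) Schur complement B A^{-1} B^T
S_inv = np.linalg.inv(S)

def apply_prec(r):
    r = np.asarray(r).ravel()
    ru, rp = r[:n], r[n:]
    u = A_solve(ru)                                # (1,1)-block solve
    p = S_inv @ (B @ u - rp)                       # Schur complement solve
    return np.concatenate([u, p])

M = spla.LinearOperator((n + m, n + m), matvec=apply_prec)
x, info = spla.gmres(K, rhs, M=M, atol=1e-10)
print("GMRES converged:", info == 0)
```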

    Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem

    In this paper, we consider the symmetric multi-type non-negative matrix tri-factorization problem (SNMTF), which attempts to factorize several symmetric non-negative matrices simultaneously. This can be considered a generalization of the classical non-negative matrix tri-factorization problem; it has a non-convex objective function, which is a multivariate sixth-degree polynomial, and a convex feasibility set. It is of special importance in data science, since it serves as a mathematical model for the fusion of different data sources in data clustering. We develop four methods to solve the SNMTF, based on four theoretical approaches known from the literature: the fixed point method (FPM), block-coordinate descent with projected gradient (BCD), the gradient method with exact line search (GM-ELS), and the adaptive moment estimation method (ADAM). For each of these methods we offer a software implementation: for the former two we use Matlab, and for the latter two Python with the TensorFlow library. We test these methods on three data-sets: one is a synthetic data-set that we generated, while the other two represent real-life similarities between different objects. Extensive numerical results show that, given sufficient computing time, all four methods perform satisfactorily, and ADAM most often yields the best mean square error (MSE). However, if computation time is limited, FPM gives the best MSE because it converges fastest at the beginning. All data-sets and codes are publicly available on our GitLab profile.
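
    A minimal sketch of one family of approaches for the model R_i ≈ G S_i Gᵀ with G, S_i ≥ 0 is given below on synthetic data, using multiplicative fixed-point-style updates in the spirit of FPM. The update rules, iteration count, and data generation are assumptions for illustration; they are not necessarily the exact rules or the datasets used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, num_sources, eps = 30, 4, 3, 1e-12

# Synthetic symmetric nonnegative matrices R_i ≈ G S_i G^T sharing one factor G.
G_true = rng.random((n, k))
R = []
for _ in range(num_sources):
    M = rng.random((k, k))
    R.append(G_true @ ((M + M.T) / 2) @ G_true.T)

# Random nonnegative starting point with symmetric middle factors S_i.
G = rng.random((n, k))
S = []
for _ in range(num_sources):
    M = rng.random((k, k))
    S.append((M + M.T) / 2)

# Multiplicative fixed-point-style updates (a common heuristic for nonnegative
# tri-factorization; not necessarily the exact FPM rules of the paper).
for _ in range(2000):
    num = sum(R[i] @ G @ S[i] for i in range(num_sources))
    den = sum(G @ S[i] @ (G.T @ G) @ S[i] for i in range(num_sources)) + eps
    G *= num / den
    for i in range(num_sources):
        S[i] *= (G.T @ R[i] @ G) / (G.T @ G @ S[i] @ (G.T @ G) + eps)

mse = np.mean([np.mean((G @ S[i] @ G.T - R[i]) ** 2) for i in range(num_sources)])
print("mean squared error:", mse)
```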