
    Sparse Matrix Factorization

    We investigate the problem of factorizing a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions. This problem can be viewed as a simplification of the deep learning problem, where finding a factorization corresponds to finding edges in different layers and values of hidden units. We prove that, under certain assumptions, for a sparse linear deep network with $n$ nodes in each layer, our algorithm is able to recover the structure of the network and the values of top-layer hidden units for depths up to $\tilde{O}(n^{1/6})$. We further discuss the relation among sparse matrix factorization, deep learning, sparse recovery, and dictionary learning. Comment: 20 pages
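
    A minimal sketch of the problem setup in Python (hedged: the alternating least-squares baseline with hard thresholding below is a naive stand-in, not the paper's recovery algorithm, and all sizes and densities are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_sparse(m, n, density=0.05):
        """Random matrix with roughly `density` fraction of nonzero entries."""
        return rng.standard_normal((m, n)) * (rng.random((m, n)) < density)

    def hard_threshold(M, k):
        """Keep the k largest-magnitude entries of M, zero out the rest."""
        cutoff = np.partition(np.abs(M).ravel(), -k)[-k]
        return M * (np.abs(M) >= cutoff)

    n, density = 100, 0.05
    Y = random_sparse(n, n, density) @ random_sparse(n, n, density)  # Y = A B

    # Naive baseline: alternate least squares on each factor, then re-sparsify.
    k = int(density * n * n)                 # target number of nonzeros per factor
    A, B = random_sparse(n, n, density), random_sparse(n, n, density)
    for _ in range(50):
        B = hard_threshold(np.linalg.lstsq(A, Y, rcond=None)[0], k)
        A = hard_threshold(np.linalg.lstsq(B.T, Y.T, rcond=None)[0].T, k)
    print("relative residual:", np.linalg.norm(A @ B - Y) / np.linalg.norm(Y))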

    On optimal tree traversals for sparse matrix factorization

    We study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. Such workflows typically arise in the multifrontal method of sparse matrix factorization. We target a classical two-level memory system, where the main memory is faster but smaller than the secondary memory. A task in the workflow can be processed if all its predecessors have been processed, and if its input and output files fit in the currently available main memory. The amount of available memory at a given time depends upon the ordering in which the tasks are executed. What is the minimum amount of main memory, over all postorder schemes, or over all possible traversals, that is needed for an in-core execution? We establish several complexity results that answer these questions and propose a new, polynomial-time, exact algorithm which runs faster than a reference algorithm. Next, we address the setting where the required memory renders a pure in-core solution infeasible. Here we ask: what is the minimum amount of I/O that must be performed between the main memory and the secondary memory? We show that this latter problem is NP-hard and propose efficient heuristics. All algorithms and heuristics are thoroughly evaluated on assembly trees arising in the context of sparse matrix factorization. Comment: 12 pages
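
    The postorder question has a classical answer under a simplified memory model. The sketch below assumes a node only needs its children's output files plus its own output resident in memory, and processes children in decreasing order of (subtree peak - output size), the standard ordering attributed to Liu; the paper's model, which also accounts for task execution files, is richer than this.

    def min_postorder_peak(children, out_size, root):
        """Minimum peak memory over postorder traversals of the subtree at `root`.

        children[i] -> list of child ids; out_size[i] -> size of i's output file.
        """
        kids = children.get(root, [])
        if not kids:
            return out_size[root]
        stats = [(min_postorder_peak(children, out_size, c), out_size[c])
                 for c in kids]
        # Process children in decreasing order of (subtree peak - output size).
        stats.sort(key=lambda pf: pf[0] - pf[1], reverse=True)
        held, peak = 0, 0
        for child_peak, child_out in stats:
            peak = max(peak, held + child_peak)  # child runs beside earlier outputs
            held += child_out                    # its output stays resident
        # The root itself runs with all children's outputs plus its own output.
        return max(peak, held + out_size[root])

    # Tiny assembly-tree example: 0 is the root, node 1 has two leaf children.
    children = {0: [1, 2], 1: [3, 4]}
    out_size = {0: 1, 1: 4, 2: 2, 3: 3, 4: 3}
    print(min_postorder_peak(children, out_size, 0))  # -> 10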

    Seismic sparse-spike deconvolution via Toeplitz-sparse matrix factorization

    We have developed a new sparse-spike deconvolution (SSD) method based on Toeplitz-sparse matrix factorization (TSMF), a bilinear decomposition of a matrix into the product of a Toeplitz matrix and a sparse matrix, to address the problems of lateral continuity, effects of noise, and wavelet estimation error in SSD. Assuming the convolution model, a constant source wavelet, and sparse reflectivity, a seismic profile can be considered as a matrix that is the product of a Toeplitz wavelet matrix and a sparse reflectivity matrix. Thus, we have developed a TSMF algorithm to simultaneously deconvolve the seismic matrix into a wavelet matrix and a reflectivity matrix by alternately solving two inversion subproblems related to the Toeplitz wavelet matrix and the sparse reflectivity matrix, respectively. Because the seismic wavelet is usually compact and smooth, the fused Lasso is used to constrain the elements in the Toeplitz wavelet matrix. Moreover, due to the limitations of computer memory, large seismic data sets are divided into blocks, and the average of the source wavelets deconvolved from these blocks via TSMF-based SSD is used as the final estimate of the source wavelet for all blocks to deconvolve the reflectivity; thus, the lateral continuity of the seismic data can be maintained. The advantages of the proposed deconvolution method include using multiple traces to reduce the effect of random noise, tolerance to errors in the initial wavelet estimation, and the ability to preserve the complex structure of the seismic data without using any lateral constraints. Our tests on synthetic seismic data from the Marmousi2 model and a section of field seismic data demonstrate that the proposed method can effectively derive the wavelet and reflectivity simultaneously from band-limited data with appropriate lateral coherence, even when the seismic data are contaminated by noise and the initial wavelet estimation is inaccurate.
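
    A hedged sketch of the alternating structure described above: S ≈ W R, with W a Toeplitz wavelet matrix and R a sparse reflectivity matrix. For brevity it uses plain ISTA for the sparse subproblem and unconstrained least squares (rather than the paper's fused Lasso) for the wavelet subproblem; all sizes and the synthetic data are placeholders.

    import numpy as np
    from scipy.linalg import toeplitz

    def wavelet_matrix(w, n):
        """n-by-n Toeplitz convolution matrix of wavelet w."""
        col = np.zeros(n)
        col[:len(w)] = w
        return toeplitz(col, np.zeros(n))

    def ista(W, S, lam, iters=200):
        """min_R 0.5*||S - W R||_F^2 + lam*||R||_1 via iterative soft thresholding."""
        step = 1.0 / np.linalg.norm(W, 2) ** 2
        R = np.zeros_like(S)
        for _ in range(iters):
            G = R - step * (W.T @ (W @ R - S))
            R = np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)
        return R

    rng = np.random.default_rng(1)
    n, traces, wlen = 128, 20, 16
    w_true = np.hanning(wlen) * np.sin(np.linspace(0, 3 * np.pi, wlen))
    R_true = (rng.random((n, traces)) < 0.05) * rng.standard_normal((n, traces))
    S = wavelet_matrix(w_true, n) @ R_true           # synthetic seismic profile

    w = w_true + 0.2 * rng.standard_normal(wlen)     # inaccurate initial wavelet
    for _ in range(10):                              # alternate the two subproblems
        R = ista(wavelet_matrix(w, n), S, lam=0.05)  # sparse reflectivity step
        # Wavelet step: stack per-trace convolution operators acting on w
        # and solve the resulting least-squares system.
        C = np.vstack([toeplitz(R[:, j], np.zeros(wlen)) for j in range(traces)])
        w = np.linalg.lstsq(C, S.T.ravel(), rcond=None)[0]
    print("wavelet error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))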

    NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

    We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while DeepWalk would take months and the dense matrix factorization solution is computationally infeasible. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF). Comment: 11 pages, in Proceedings of the Web Conference 2019 (WWW '19)
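
    For context, the dense matrix referred to above is the one the NetMF construction derives from random-walk statistics; the toy sketch below builds and factorizes it directly on a small graph. NetSMF's actual contribution, spectral sparsification via path sampling, is omitted here, and the graph, window size T, and parameter b are illustrative assumptions.

    import numpy as np

    def netmf_embedding(A, T=5, b=1.0, dim=8):
        """Dense NetMF: factorize the truncated log of averaged walk matrices."""
        deg = A.sum(axis=1)
        Dinv = np.diag(1.0 / deg)
        P = Dinv @ A                        # random-walk transition matrix
        S, Pk = np.zeros_like(P), np.eye(len(A))
        for _ in range(T):                  # average the first T powers of P
            Pk = Pk @ P
            S += Pk / T
        M = np.maximum((deg.sum() / b) * S @ Dinv, 1.0)   # truncate before log
        U, sig, _ = np.linalg.svd(np.log(M))
        return U[:, :dim] * np.sqrt(sig[:dim])            # truncated SVD embedding

    rng = np.random.default_rng(2)
    n = 50
    A = (rng.random((n, n)) < 0.1).astype(float)
    A = np.maximum(A, A.T)
    np.fill_diagonal(A, 1.0)                # self-loops keep every degree positive
    print(netmf_embedding(A).shape)         # -> (50, 8)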

    Low-rank and sparse matrix factorization for scientific paper recommendation in heterogeneous network

    With the rapid growth of scientific publications, it is hard for researchers to find papers that meet their expectations, and recommendation systems for scientific articles are an essential technology for overcoming this problem. In this paper, we propose a novel low-rank and sparse matrix factorization-based paper recommendation (LSMFPRec) method for authors. The proposed method seamlessly combines low-rank and sparse matrix factorization with fine-grained paper and author affinity matrices extracted from a heterogeneous scientific network. Thus, it can effectively alleviate the sparsity and cold-start problems that exist in traditional matrix factorization based collaborative filtering methods. Moreover, LSMFPRec can significantly reduce the error propagated from intermediate outputs. In addition, the proposed method essentially captures the low-rank and sparse characteristics of scientific rating activities; therefore, it can generate more reasonable predicted ratings for influential and uninfluential papers. The effectiveness of LSMFPRec is demonstrated by recommendation evaluations conducted on the AAN and CiteULike data sets.
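
    A sketch of the generic low-rank-plus-sparse decomposition underlying such methods: R ≈ L + S, with L low rank and S sparse, fit by alternating singular-value shrinkage and entrywise soft thresholding. This is the standard RPCA-style iteration, not LSMFPRec's full model with affinity matrices; the thresholds and data below are placeholders.

    import numpy as np

    def svd_shrink(M, tau):
        """Shrink singular values of M by tau (low-rank proximal step)."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def soft(M, tau):
        """Entrywise soft thresholding (sparse proximal step)."""
        return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

    def low_rank_plus_sparse(R, tau_l=1.0, tau_s=0.1, iters=100):
        L = np.zeros_like(R)
        S = np.zeros_like(R)
        for _ in range(iters):
            L = svd_shrink(R - S, tau_l)    # explain the residual with a low-rank part
            S = soft(R - L, tau_s)          # and a sparse part
        return L, S

    rng = np.random.default_rng(3)
    ratings = (rng.random((40, 60)) < 0.3) * rng.integers(1, 6, (40, 60))
    L, S = low_rank_plus_sparse(ratings.astype(float))
    print("rank(L):", np.linalg.matrix_rank(L, tol=1e-6),
          "nnz(S):", int(np.count_nonzero(S)))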

    Effects of partitioning and scheduling sparse matrix factorization on communication and load balance

    A block-based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed-memory systems. Using experimental results, this technique is analyzed for communication and load-imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap-mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show a tradeoff between communication and load balance: the block-based method results in lower communication cost, whereas the wrap-mapped scheme gives better load balance.
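
    A toy illustration of the tradeoff measured above: wrap mapping balances per-processor work, while block mapping keeps adjacent columns (a crude proxy for columns that communicate) on the same processor. The work and communication proxies below are assumptions for illustration, not the paper's methodology.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 64, 4
    nnz = rng.integers(1, 50, n).astype(float)  # nonzeros per column (work proxy)

    wrap = np.arange(n) % p                     # column i -> processor i mod p
    block = np.arange(n) * p // n               # contiguous blocks of n/p columns

    for name, owner in [("wrap", wrap), ("block", block)]:
        load = np.bincount(owner, weights=nnz, minlength=p)
        split = int(np.sum(owner[:-1] != owner[1:]))   # adjacent pairs cut apart
        print(f"{name:5s} load imbalance = {load.max() / load.mean():.2f}, "
              f"cross-processor adjacent pairs = {split}")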