
    Sparse Matrix Factorization

    We investigate the problem of factorizing a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions. This problem can be viewed as a simplification of the deep learning problem, where finding a factorization corresponds to finding edges in different layers and values of hidden units. We prove that, under certain assumptions, for a sparse linear deep network with $n$ nodes in each layer, our algorithm is able to recover the structure of the network and the values of top-layer hidden units for depths up to $\tilde{O}(n^{1/6})$. We further discuss the relation among sparse matrix factorization, deep learning, sparse recovery, and dictionary learning. Comment: 20 pages
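
    A minimal sketch of the problem setup in Python (hedged: the alternating least-squares baseline with hard thresholding below is a naive stand-in, not the paper's recovery algorithm, and all sizes and densities are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_sparse(m, n, density=0.05):
        """Random matrix with roughly `density` fraction of nonzero entries."""
        return rng.standard_normal((m, n)) * (rng.random((m, n)) < density)

    def hard_threshold(M, k):
        """Keep the k largest-magnitude entries of M, zero out the rest."""
        cutoff = np.partition(np.abs(M).ravel(), -k)[-k]
        return M * (np.abs(M) >= cutoff)

    n, density = 100, 0.05
    Y = random_sparse(n, n, density) @ random_sparse(n, n, density)  # Y = A B

    # Naive baseline: alternate least squares on each factor, then re-sparsify.
    k = int(density * n * n)                 # target number of nonzeros per factor
    A, B = random_sparse(n, n, density), random_sparse(n, n, density)
    for _ in range(50):
        B = hard_threshold(np.linalg.lstsq(A, Y, rcond=None)[0], k)
        A = hard_threshold(np.linalg.lstsq(B.T, Y.T, rcond=None)[0].T, k)
    print("relative residual:", np.linalg.norm(A @ B - Y) / np.linalg.norm(Y))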

    On optimal tree traversals for sparse matrix factorization

    We study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. Such workflows typically arise in the multifrontal method of sparse matrix factorization. We target a classical two-level memory system, where the main memory is faster but smaller than the secondary memory. A task in the workflow can be processed if all its predecessors have been processed, and if its input and output files fit in the currently available main memory. The amount of available memory at a given time depends upon the ordering in which the tasks are executed. What is the minimum amount of main memory, over all postorder schemes, or over all possible traversals, that is needed for an in-core execution? We establish several complexity results that answer these questions and propose a new, polynomial-time, exact algorithm which runs faster than a reference algorithm. Next, we address the setting where the required memory renders a pure in-core solution infeasible. Here we ask: what is the minimum amount of I/O that must be performed between the main memory and the secondary memory? We show that this latter problem is NP-hard and propose efficient heuristics. All algorithms and heuristics are thoroughly evaluated on assembly trees arising in the context of sparse matrix factorization. Comment: 12 pages
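
    The postorder question has a classical answer under a simplified memory model. The sketch below assumes a node only needs its children's output files plus its own output resident in memory, and processes children in decreasing order of (subtree peak - output size), the standard ordering attributed to Liu; the paper's model, which also accounts for task execution files, is richer than this.

    def min_postorder_peak(children, out_size, root):
        """Minimum peak memory over postorder traversals of the subtree at `root`.

        children[i] -> list of child ids; out_size[i] -> size of i's output file.
        """
        kids = children.get(root, [])
        if not kids:
            return out_size[root]
        stats = [(min_postorder_peak(children, out_size, c), out_size[c])
                 for c in kids]
        # Process children in decreasing order of (subtree peak - output size).
        stats.sort(key=lambda pf: pf[0] - pf[1], reverse=True)
        held, peak = 0, 0
        for child_peak, child_out in stats:
            peak = max(peak, held + child_peak)  # child runs beside earlier outputs
            held += child_out                    # its output stays resident
        # The root itself runs with all children's outputs plus its own output.
        return max(peak, held + out_size[root])

    # Tiny assembly-tree example: 0 is the root, node 1 has two leaf children.
    children = {0: [1, 2], 1: [3, 4]}
    out_size = {0: 1, 1: 4, 2: 2, 3: 3, 4: 3}
    print(min_postorder_peak(children, out_size, 0))  # -> 10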

    Seismic sparse-spike deconvolution via Toeplitz-sparse matrix factorization

    We have developed a new sparse-spike deconvolution (SSD) method based on Toeplitz-sparse matrix factorization (TSMF), a bilinear decomposition of a matrix into the product of a Toeplitz matrix and a sparse matrix, to address the problems of lateral continuity, effects of noise, and wavelet estimation error in SSD. Assuming the convolution model, a constant source wavelet, and sparse reflectivity, a seismic profile can be considered as a matrix that is the product of a Toeplitz wavelet matrix and a sparse reflectivity matrix. Thus, we have developed a TSMF algorithm to simultaneously deconvolve the seismic matrix into a wavelet matrix and a reflectivity matrix by alternately solving two inversion subproblems related to the Toeplitz wavelet matrix and the sparse reflectivity matrix, respectively. Because the seismic wavelet is usually compact and smooth, the fused Lasso is used to constrain the elements in the Toeplitz wavelet matrix. Moreover, due to the limitations of computer memory, large seismic data sets are divided into blocks, and the average of the source wavelets deconvolved from these blocks via TSMF-based SSD is used as the final estimate of the source wavelet for all blocks to deconvolve the reflectivity; thus, the lateral continuity of the seismic data can be maintained. The advantages of the proposed deconvolution method include using multiple traces to reduce the effect of random noise, tolerance to errors in the initial wavelet estimation, and the ability to preserve the complex structure of the seismic data without using any lateral constraints. Our tests on synthetic seismic data from the Marmousi2 model and a section of field seismic data demonstrate that the proposed method can effectively derive the wavelet and reflectivity simultaneously from band-limited data with appropriate lateral coherence, even when the seismic data are contaminated by noise and the initial wavelet estimation is inaccurate.
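
    A hedged sketch of the alternating structure described above: S ≈ W R, with W a Toeplitz wavelet matrix and R a sparse reflectivity matrix. For brevity it uses plain ISTA for the sparse subproblem and unconstrained least squares (rather than the paper's fused Lasso) for the wavelet subproblem; all sizes and the synthetic data are placeholders.

    import numpy as np
    from scipy.linalg import toeplitz

    def wavelet_matrix(w, n):
        """n-by-n Toeplitz convolution matrix of wavelet w."""
        col = np.zeros(n)
        col[:len(w)] = w
        return toeplitz(col, np.zeros(n))

    def ista(W, S, lam, iters=200):
        """min_R 0.5*||S - W R||_F^2 + lam*||R||_1 via iterative soft thresholding."""
        step = 1.0 / np.linalg.norm(W, 2) ** 2
        R = np.zeros_like(S)
        for _ in range(iters):
            G = R - step * (W.T @ (W @ R - S))
            R = np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)
        return R

    rng = np.random.default_rng(1)
    n, traces, wlen = 128, 20, 16
    w_true = np.hanning(wlen) * np.sin(np.linspace(0, 3 * np.pi, wlen))
    R_true = (rng.random((n, traces)) < 0.05) * rng.standard_normal((n, traces))
    S = wavelet_matrix(w_true, n) @ R_true           # synthetic seismic profile

    w = w_true + 0.2 * rng.standard_normal(wlen)     # inaccurate initial wavelet
    for _ in range(10):                              # alternate the two subproblems
        R = ista(wavelet_matrix(w, n), S, lam=0.05)  # sparse reflectivity step
        # Wavelet step: stack per-trace convolution operators acting on w
        # and solve the resulting least-squares system.
        C = np.vstack([toeplitz(R[:, j], np.zeros(wlen)) for j in range(traces)])
        w = np.linalg.lstsq(C, S.T.ravel(), rcond=None)[0]
    print("wavelet error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))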

    NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

    We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while DeepWalk would take months and the dense matrix factorization solution is computationally infeasible. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF). Comment: 11 pages, in Proceedings of the Web Conference 2019 (WWW '19)
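
    For context, the dense matrix referred to above is the one the NetMF construction derives from random-walk statistics; the toy sketch below builds and factorizes it directly on a small graph. NetSMF's actual contribution, spectral sparsification via path sampling, is omitted here, and the graph, window size T, and parameter b are illustrative assumptions.

    import numpy as np

    def netmf_embedding(A, T=5, b=1.0, dim=8):
        """Dense NetMF: factorize the truncated log of averaged walk matrices."""
        deg = A.sum(axis=1)
        Dinv = np.diag(1.0 / deg)
        P = Dinv @ A                        # random-walk transition matrix
        S, Pk = np.zeros_like(P), np.eye(len(A))
        for _ in range(T):                  # average the first T powers of P
            Pk = Pk @ P
            S += Pk / T
        M = np.maximum((deg.sum() / b) * S @ Dinv, 1.0)   # truncate before log
        U, sig, _ = np.linalg.svd(np.log(M))
        return U[:, :dim] * np.sqrt(sig[:dim])            # truncated SVD embedding

    rng = np.random.default_rng(2)
    n = 50
    A = (rng.random((n, n)) < 0.1).astype(float)
    A = np.maximum(A, A.T)
    np.fill_diagonal(A, 1.0)                # self-loops keep every degree positive
    print(netmf_embedding(A).shape)         # -> (50, 8)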

    Low-rank and sparse matrix factorization for scientific paper recommendation in heterogeneous network

    With the rapid growth of scientific publications, it is hard for researchers to find papers that meet their expectations, and recommendation systems for scientific articles are an essential technology for overcoming this problem. In this paper, we propose a novel low-rank and sparse matrix factorization-based paper recommendation (LSMFPRec) method for authors. The proposed method seamlessly combines low-rank and sparse matrix factorization with fine-grained paper and author affinity matrices extracted from a heterogeneous scientific network. Thus, it can effectively alleviate the sparsity and cold-start problems that exist in traditional matrix factorization based collaborative filtering methods. Moreover, LSMFPRec can significantly reduce the error propagated from intermediate outputs. In addition, the proposed method essentially captures the low-rank and sparse characteristics of scientific rating activities; therefore, it can generate more reasonable predicted ratings for influential and uninfluential papers. The effectiveness of LSMFPRec is demonstrated by recommendation evaluations conducted on the AAN and CiteULike data sets.
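
    A sketch of the generic low-rank-plus-sparse decomposition underlying such methods: R ≈ L + S, with L low rank and S sparse, fit by alternating singular-value shrinkage and entrywise soft thresholding. This is the standard RPCA-style iteration, not LSMFPRec's full model with affinity matrices; the thresholds and data below are placeholders.

    import numpy as np

    def svd_shrink(M, tau):
        """Shrink singular values of M by tau (low-rank proximal step)."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def soft(M, tau):
        """Entrywise soft thresholding (sparse proximal step)."""
        return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

    def low_rank_plus_sparse(R, tau_l=1.0, tau_s=0.1, iters=100):
        L = np.zeros_like(R)
        S = np.zeros_like(R)
        for _ in range(iters):
            L = svd_shrink(R - S, tau_l)    # explain the residual with a low-rank part
            S = soft(R - L, tau_s)          # and a sparse part
        return L, S

    rng = np.random.default_rng(3)
    ratings = (rng.random((40, 60)) < 0.3) * rng.integers(1, 6, (40, 60))
    L, S = low_rank_plus_sparse(ratings.astype(float))
    print("rank(L):", np.linalg.matrix_rank(L, tol=1e-6),
          "nnz(S):", int(np.count_nonzero(S)))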

    Effects of partitioning and scheduling sparse matrix factorization on communication and load balance

    A block-based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed-memory systems. Using experimental results, this technique is analyzed for communication and load-imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap-mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show a tradeoff between communication and load balance: the block-based method results in lower communication cost, whereas the wrap-mapped scheme gives better load balance.
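
    A toy illustration of the tradeoff measured above: wrap mapping balances per-processor work, while block mapping keeps adjacent columns (a crude proxy for columns that communicate) on the same processor. The work and communication proxies below are assumptions for illustration, not the paper's methodology.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 64, 4
    nnz = rng.integers(1, 50, n).astype(float)  # nonzeros per column (work proxy)

    wrap = np.arange(n) % p                     # column i -> processor i mod p
    block = np.arange(n) * p // n               # contiguous blocks of n/p columns

    for name, owner in [("wrap", wrap), ("block", block)]:
        load = np.bincount(owner, weights=nnz, minlength=p)
        split = int(np.sum(owner[:-1] != owner[1:]))   # adjacent pairs cut apart
        print(f"{name:5s} load imbalance = {load.max() / load.mean():.2f}, "
              f"cross-processor adjacent pairs = {split}")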