
    Distributed Large-Scale Tensor Decomposition

    Canonical Polyadic Decomposition (CPD), also known as PARAFAC, is a useful tool for tensor factorization. It has found application in several domains, including signal processing and data mining. With the deluge of data our societies face, large-scale matrix and tensor factorizations have become a crucial issue, yet few works have been devoted to large-scale tensor factorization. In this paper, we introduce a fully distributed method to compute the CPD of a large-scale data tensor across a network of machines with limited computation resources. The proposed approach is based on collaboration between the machines in the network across the three modes of the data tensor. Such multi-modal collaboration allows an essentially unique reconstruction of the factor matrices in an efficient way. We provide an analysis of the computation and communication costs of the proposed scheme and address the problem of minimizing communication costs while maximizing the use of available computation resources.
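
    The abstract includes no pseudocode; as context for the per-machine computation such a scheme distributes across modes, below is a minimal single-machine sketch of CPD by alternating least squares (ALS) in NumPy. The helper names (`unfold`, `khatri_rao`, `cpd_als`) and the rank-R least-squares updates are standard constructions, not code from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, remaining axes flattened in order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: (I x R) and (J x R) -> (I*J x R)."""
    I, R = A.shape
    J = B.shape[0]
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def cpd_als(T, R, n_iter=100):
    """Rank-R CPD of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(0)
    factors = [rng.standard_normal((d, R)) for d in T.shape]
    for _ in range(n_iter):
        for n in range(3):
            others = [factors[m] for m in range(3) if m != n]
            kr = khatri_rao(others[0], others[1])  # matches unfold() column order
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[n] = unfold(T, n) @ kr @ np.linalg.pinv(gram)
    return factors

# Quick check: factor an exactly rank-3 tensor and report the relative fit error.
rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((d, 3)) for d in (8, 9, 10))
T = np.einsum('ir,jr,kr->ijk', A, B, C)
Ah, Bh, Ch = cpd_als(T, 3)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', Ah, Bh, Ch)) / np.linalg.norm(T))
```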

    Tensor Networks for Big Data Analytics and Large-Scale Optimization Problems

    In this paper we review basic and emerging models and associated algorithms for large-scale tensor networks, especially Tensor Train (TT) decompositions, using novel mathematical and graphical representations. We discuss the concept of tensorization (i.e., creating very high-order tensors from lower-order original data) and the super-compression of data achieved via quantized tensor train (QTT) networks. The purpose of tensorization and quantization is to achieve, via low-rank tensor approximations, "super" compression and a meaningful, compact representation of structured data. The main objective of this paper is to show how tensor networks can be used to solve a wide class of big data optimization problems (that are far from tractable by classical numerical methods) by applying tensorization, performing all operations using relatively small matrices and tensors, and iteratively applying optimized and approximate tensor contractions. Keywords: tensor networks, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, tensorization, distributed representation of data, optimization problems for very large-scale problems: generalized eigenvalue decomposition (GEVD), PCA/SVD, canonical correlation analysis (CCA).
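
    To make tensorization and the TT format concrete, the sketch below reshapes a length-2^10 signal into a 10-way tensor of mode size 2 (the QTT-style tensorization described above) and computes a TT decomposition by sequential truncated SVDs (the standard TT-SVD construction, not the paper's algorithms). The tolerance `eps` and all names are illustrative.

```python
import numpy as np

def tt_svd(T, eps=1e-10):
    """TT decomposition by sequential truncated SVDs; returns cores of shape
    (r_{k-1}, n_k, r_k). `eps` is a relative singular-value cutoff."""
    cores, r_prev = [], 1
    M = np.asarray(T)
    for n in T.shape[:-1]:
        M = M.reshape(r_prev * n, -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))  # truncation rank
        cores.append(U[:, :r].reshape(r_prev, n, r))
        M = s[:r, None] * Vt[:r]                 # carry the remainder forward
        r_prev = r
    cores.append(M.reshape(r_prev, T.shape[-1], 1))
    return cores

# Tensorization: a length-2**10 signal becomes a 10-way tensor of mode size 2.
x = np.sin(np.linspace(0, 8 * np.pi, 2 ** 10))
cores = tt_svd(x.reshape((2,) * 10), eps=1e-8)
print([G.shape for G in cores])  # small TT ranks despite 1024 samples

# Contract the train back to a vector and check the reconstruction error.
y = cores[0].reshape(-1, cores[0].shape[-1])
for G in cores[1:]:
    y = (y @ G.reshape(G.shape[0], -1)).reshape(-1, G.shape[-1])
print(np.linalg.norm(y.reshape(-1) - x) / np.linalg.norm(x))
```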

    Distributed Computation of Tensor Decompositions in Collaborative Networks

    In this paper, we consider the issue of distributed computation of tensor decompositions. A central unit observing a global data tensor assigns different data sub-tensors to several computing nodes grouped into clusters. The goal is to distribute the computation of a tensor decomposition across the different computing nodes of the network, which is particularly useful when dealing with large-scale data tensors. However, this is only possible when the data sub-tensors assigned to each computing node in a cluster satisfy minimum conditions for uniqueness. By allowing collaboration between computing nodes in a cluster, we show that average-consensus-based estimation can yield unique estimates of the factor matrices of each data sub-tensor. Moreover, an essentially unique reconstruction of the global factor matrices at the central unit is possible by allowing the sub-tensors assigned to different clusters to overlap in one mode. The proposed approach may be useful in a number of distributed tensor-based estimation problems in signal processing.
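
    Below is a minimal sketch of the average-consensus step the abstract relies on, assuming each node already holds a local estimate of the same factor matrix: repeated mixing with a doubly stochastic weight matrix drives every node to the network-wide average. In the actual setting the local CPD estimates must first be aligned (they are unique only up to column permutation and scaling) before averaging; the sketch omits that alignment, and the ring topology and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, I, R = 5, 6, 2
local = [rng.standard_normal((I, R)) for _ in range(n_nodes)]  # per-node estimates

# Doubly stochastic weights for a ring of 5 nodes: self and both
# neighbors each get weight 1/3, so rows and columns sum to 1.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n_nodes] = 1 / 3
    W[i, (i + 1) % n_nodes] = 1 / 3

X = np.stack(local)                       # shape (n_nodes, I, R)
target = X.mean(axis=0)                   # the consensus value
for _ in range(200):                      # consensus iterations
    X = np.einsum('ij,jkl->ikl', W, X)    # each node averages its neighborhood
print(np.max(np.abs(X - target)))         # ~0: all nodes agree on the mean
```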

    A Unified Optimization Approach for Sparse Tensor Operations on GPUs

    Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures, as well as the large memory footprints of sparse tensor operations, make such implementations challenging. We leverage the fact that sparse tensor operations share similar computation patterns to propose a unified tensor representation called F-COO. Combined with GPU-specific optimizations, F-COO provides highly optimized implementations of sparse tensor computations on GPUs. The performance of the proposed unified approach is demonstrated for tensor-based kernels such as the Sparse Matricized Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix Multiply (SpTTM), and is used in tensor decomposition algorithms. Compared to state-of-the-art work, we improve the performance of SpTTM and SpMTTKRP by up to 3.7 and 30.6 times, respectively, on NVIDIA Titan-X GPUs. We also implement a CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.
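
    For readers unfamiliar with the kernel being optimized, here is a plain COO-based MTTKRP (mode 0) in NumPy: for each nonzero T[i,j,k] = v it accumulates v * (B[j,:] * C[k,:]) into row i of the output. This CPU sketch only illustrates the computation pattern; it is not the paper's F-COO format or its GPU implementation, and all names are illustrative.

```python
import numpy as np

def mttkrp_coo(idx, vals, B, C, I):
    """Mode-0 MTTKRP on a COO sparse 3-way tensor.
    idx: (nnz, 3) int array of (i, j, k) coordinates; vals: (nnz,) values."""
    R = B.shape[1]
    M = np.zeros((I, R))
    contrib = vals[:, None] * B[idx[:, 1]] * C[idx[:, 2]]  # (nnz, R) per-nonzero terms
    np.add.at(M, idx[:, 0], contrib)                        # scatter-add into row i
    return M

# Tiny usage example on a random sparse 3-way tensor.
rng = np.random.default_rng(0)
I, J, K, R, nnz = 20, 30, 40, 4, 200
idx = np.column_stack([rng.integers(0, d, nnz) for d in (I, J, K)])
vals = rng.standard_normal(nnz)
B, C = rng.standard_normal((J, R)), rng.standard_normal((K, R))
print(mttkrp_coo(idx, vals, B, C, I).shape)  # (20, 4)
```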