168 research outputs found

    High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors

    Get PDF
    International audience—We investigate an efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general N-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of an N-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied with (N − 1) matrices (TTMc step); (ii) the product is then converted to a matrix; and (iii) a few leading left singular vectors of the resulting matrix are computed (TRSVD step) to update one of the matrices for the next TTMc step. We propose an efficient parallelization of these algorithms for the current parallel platforms with multicore nodes. We discuss a set of preprocessing steps which takes all computational decisions out of the main iteration of the algorithm and provides an intuitive shared-memory parallelism for the TTM and TRSVD steps. We propose a coarse and a fine-grain parallel algorithm in a distributed memory environment, investigate data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the TRSVD step can be carried out efficiently following the TTMc step. Finally, we develop a hybrid MPI-OpenMP implementation of the overall algorithm and report scalability results on up to 4096 cores on 256 nodes of an IBM BlueGene/Q supercomputer

    A medium-grain method for fast 2D bipartitioning of sparse matrices

    Get PDF
    We present a new hypergraph-based method, the medium-grain method, for solving the sparse matrix partitioning problem. This problem arises when distributing data for parallel sparse matrix-vector multiplication. In the medium-grain method, each matrix nonzero is assigned to either a row group or a column group, and these groups are represented by vertices of the hypergraph. For an m x n sparse matrix, the resulting hypergraph has m + n vertices and m + n hyperedges. Furthermore, we present an iterative refinement procedure for improvement of a given partitioning, based on the medium-grain method, which can be applied as a cheap but effective postprocessing step after any partitioning method. The medium-grain method is able to produce fully two-dimensional bipartitionings, but its computational complexity equals that of one-dimensional methods. Experimental results for a large set of sparse test matrices show that the medium-grain method with iterative refinement produces bipartitionings with lower communication volume compared to current state-of-the-art methods, and is faster at producing them

    Parallel image restoration

    Get PDF
    Cataloged from PDF version of article.In this thesis, we are concerned with the image restoration problem which has been formulated in the literature as a system of linear inequalities. With this formulation, the resulting constraint matrix is an unstructured sparse-matrix and even with small size images we end up with huge matrices. So, to solve the restoration problem, we have used the surrogate constraint methods, that can work efficiently for large size problems and are amenable for parallel implementations. Among the surrogate constraint methods, the basic method considers all of the violated constraints in the system and performs a single block projection in each step. On the other hand, parallel method considers a subset of the constraints, and makes simultaneous block projections. Using several partitioning strategies and adopting different communication models we have realized several parallel implementations of the two methods. We have used the hypergraph partitioning based decomposition methods in order to minimize the communication costs while ensuring load balance among the processors. The implementations are evaluated based on the per iteration performance and on the overall performance. Besides, the effects of different partitioning strategies on the speed of convergence are investigated. The experimental results reveal that the proposed parallelization schemes have practical usage in the restoration problem and in many other real-world applications which can be modeled as a system of linear inequalities.Malas, TahirM.S

    Web-site-based partitioning techniques for efficient parallelization of the PageRank computation

    Get PDF
    Cataloged from PDF version of article.Web search engines use ranking techniques to order Web pages in query results. PageRank is an important technique, which orders Web pages according to the linkage structure of the Web. The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation includes repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. However, efficiently parallelizing PageRank is not an easy task, because of the irregular sparsity pattern of the Web matrix. Graph and hypergraphpartitioning-based techniques are widely used for efficiently parallelizing matrixvector multiplications. Recently, a hypergraph-partitioning-based decomposition technique for fast parallel computation of PageRank is proposed. This technique aims to minimize the communication overhead of the parallel matrix-vector multiplication. However, the proposed technique has a high prepropocessing time, which makes the technique impractical. In this work, we propose 1D (rowwise and columnwise) and 2D (fine-grain and checkerboard) decomposition models using web-site-based graph and hypergraph-partitioning techniques. Proposed models minimize the communication overhead of the parallel PageRank computations with a reasonable preprocessing time. The models encapsulate not only the matrix-vector multiplication, but the overall iterative algorithm. Conducted experiments show that the proposed models achieve fast PageRank computation with low preprocessing time, compared with those in the literature.Cevahir, AliM.S

    Open Problems in (Hyper)Graph Decomposition

    Full text link
    Large networks are useful in a wide range of applications. Sometimes problem instances are composed of billions of entities. Decomposing and analyzing these structures helps us gain new insights about our surroundings. Even if the final application concerns a different problem (such as traversal, finding paths, trees, and flows), decomposing large graphs is often an important subproblem for complexity reduction or parallelization. This report is a summary of discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph Decomposition" and presents currently open problems and future directions in the area of (hyper)graph decomposition

    An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model

    Full text link

    Parpatoh : A 2D-parallel hypergraph partitioning tool

    Get PDF
    Cataloged from PDF version of article.Hypergraph partitioning is a process that is being used to find solutions for optimization problems in various areas, including parallel volume rendering, parallel information retrieval and VLSI circuit design. While the current partitioning methods are adequate for hypergraphs up to certain size, these methods start to fail once the problem size exceeds this threshold. In this thesis we introduce ParPaToH, a parallel p-way hypergraph partitioning tool that makes use of a 2-D decomposition to reduce the communication overhead and implements a parallel-computing friendly version of the accepted multi-level partitioning paradigm to generate its partitioning. We present new concepts in hypergraph partitioning that lead to a coarse-grain parallel solution. Finally, we discuss the implementation of the tool in detail and present experimental results to demonstrate its effectiveness.Karaca, EvrenM.S
    corecore