1,694 research outputs found

    Performance Improvements of Common Sparse Numerical Linear Algebra Computations

    Get PDF
    Manufacturers of computer hardware are able to continuously sustain an unprecedented pace of progress in computing speed of their products, partially due to increased clock rates but also because of ever more complicated chip designs. With new processor families appearing every few years, it is increasingly harder to achieve high performance rates in sparse matrix computations. This research proposes new methods for sparse matrix factorizations and applies in an iterative code generalizations of known concepts from related disciplines. The proposed solutions and extensions are implemented in ways that tend to deliver efficiency while retaining ease of use of existing solutions. The implementations are thoroughly timed and analyzed using a commonly accepted set of test matrices. The tests were conducted on modern processors that seem to have gained an appreciable level of popularity and are fairly representative for a wider range of processor types that are available on the market now or in the near future. The new factorization technique formally introduced in the early chapters is later on proven to be quite competitive with state of the art software currently available. Although not totally superior in all cases (as probably no single approach could possibly be), the new factorization algorithm exhibits a few promising features. In addition, an all-embracing optimization effort is applied to an iterative algorithm that stands out for its robustness. This also gives satisfactory results on the tested computing platforms in terms of performance improvement. The same set of test matrices is used to enable an easy comparison between both investigated techniques, even though they are customarily treated separately in the literature. Possible extensions of the presented work are discussed. They range from easily conceivable merging with existing solutions to rather more evolved schemes dependent on hard to predict progress in theoretical and algorithmic research

    Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

    Full text link
    Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate a possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSRComment: 22 pages, 8 figures, Published at Parallel Computing (PARCO

    Topology-aware optimization of big sparse matrices and matrix multiplications on main-memory systems

    Get PDF
    Since data sizes of analytical applications are continuously growing, many data scientists are switching from customized micro-solutions to scalable alternatives, such as statistical and scientific databases. However, many algorithms in data mining and science are expressed in terms of linear algebra, which is barely supported by major database vendors and big data solutions. On the other side, conventional linear algebra algorithms and legacy matrix representations are often not suitable for very large matrices. We propose a strategy for large matrix processing on modern multicore systems that is based on a novel, adaptive tile matrix representation (AT MATRIX). Our solution utilizes multiple techniques inspired from database technology, such as multidimensional data partitioning, cardinality estimation, indexing, dynamic rewrites, and many more in order to optimize the execution time. Based thereon we present a matrix multiplication operator ATMULT, which outperforms alternative approaches. The aim of our solution is to overcome the burden for data scientists of selecting appropriate algorithms and matrix storage representations. We evaluated AT MATRIX together with ATMULT on several real-world and synthetic random matrices

    An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model

    Full text link
    • …
    corecore