
    Clock Math — a System for Solving SLEs Exactly

    In this paper, we present a GPU-accelerated hybrid system that solves ill-conditioned systems of linear equations (SLEs) exactly. Exactly means without rounding errors, which is achieved by using integer arithmetic. First, we scale the floating-point numbers up to integers; then we solve dozens of instances of the SLE in different modular arithmetics; finally, we assemble the sub-solutions back using the Chinese remainder theorem. This approach effectively bypasses the limitations of CPU floating-point arithmetic. The system is capable of solving systems with a Hilbert matrix without losing a single bit of precision, and with a significant speedup compared to existing CPU solvers.
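
    A minimal sketch of the residue-arithmetic idea (not the paper's GPU implementation): a 2x2 integer system is solved via Cramer's rule modulo several primes, and the Chinese remainder theorem reassembles the exact result. The matrix, the primes, and all helper names are illustrative only.

        from fractions import Fraction

        A = [[3, 1], [1, 2]]            # small integer system A x = b
        b = [9, 8]
        primes = [10007, 10009, 10037]  # pairwise coprime moduli

        def det2(m, p):                 # 2x2 determinant mod p
            return (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p

        # Cramer's rule residues: determinant and numerators mod each prime
        dets  = [det2(A, p) for p in primes]
        nums0 = [det2([[b[0], A[0][1]], [b[1], A[1][1]]], p) for p in primes]
        nums1 = [det2([[A[0][0], b[0]], [A[1][0], b[1]]], p) for p in primes]

        def crt(vals, mods):            # Garner-style Chinese remainder combination
            x, m = 0, 1
            for v, p in zip(vals, mods):
                x += m * (((v - x) * pow(m, -1, p)) % p)
                m *= p
            return x, m                 # x in [0, m)

        def signed(x, m):               # map to (-m/2, m/2] so negatives survive
            return x - m if x > m // 2 else x

        d, M = crt(dets, primes)
        x0 = Fraction(signed(crt(nums0, primes)[0], M), signed(d, M))
        x1 = Fraction(signed(crt(nums1, primes)[0], M), signed(d, M))
        print(x0, x1)                   # exact rational solution: 2 and 3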

    Parallel Solver of Large Systems of Linear Inequalities Using Fourier-Motzkin Elimination

    Fourier-Motzkin elimination is a computationally expensive but powerful method for solving a system of linear inequalities. Such systems arise, e.g., in execution-order analysis for loop nests or in integer linear programming. This paper focuses on the analysis, design, and implementation of a distributed-memory parallel solver for large systems of linear inequalities based on the Fourier-Motzkin elimination algorithm. We also measure the speedup of the parallel solver and show that the implementation scales well.
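
    A minimal sketch of a single, sequential Fourier-Motzkin elimination step, assuming inequalities of the form coeffs . x <= rhs; the distributed solver parallelizes exactly this pairwise combination, whose output can grow quadratically per eliminated variable. All names and the example system are illustrative.

        from fractions import Fraction

        def eliminate(ineqs, k):
            """One FM step: remove variable k from inequalities (coeffs, rhs)."""
            pos, neg, zero = [], [], []
            for a, c in ineqs:
                (pos if a[k] > 0 else neg if a[k] < 0 else zero).append((a, c))
            out = list(zero)
            # every (upper bound, lower bound) pair yields one new inequality
            for ap, cp in pos:
                for an, cn in neg:
                    s, t = Fraction(1, ap[k]), Fraction(-1, an[k])
                    a = [s * x + t * y for x, y in zip(ap, an)]
                    out.append((a, s * cp + t * cn))
            return out

        # x + y <= 4, -x + y <= 2, -y <= 0; eliminating x (k = 0) leaves
        # 2y <= 6 and -y <= 0, i.e. 0 <= y <= 3.
        system = [([1, 1], 4), ([-1, 1], 2), ([0, -1], 0)]
        print(eliminate(system, 0))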

    Block Iterators for Sparse Matrices


    A New Format for the Sparse Matrix-vector Multiplication

    Algorithms for sparse matrix-vector multiplication (SpMV for short) are important building blocks in solvers of sparse systems of linear equations. Due to matrix sparsity, the memory access patterns are irregular and cache utilization suffers from low spatial and temporal locality. To reduce this effect, register blocking formats were designed. This paper introduces a new combined format for storing sparse matrices that extends the possibilities of the diagonal register blocking format.
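
    For context, a minimal CSR SpMV kernel, the baseline that register blocking formats aim to improve; the irregular accesses to x are the locality problem described above. This is a generic illustration, not code from the paper.

        def spmv_csr(row_ptr, col_idx, values, x):
            y = [0.0] * (len(row_ptr) - 1)
            for i in range(len(y)):
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    y[i] += values[k] * x[col_idx[k]]   # irregular access to x
            return y

        # the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR
        row_ptr = [0, 2, 3, 5]
        col_idx = [0, 2, 1, 0, 2]
        values = [1.0, 2.0, 3.0, 4.0, 5.0]
        print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]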

    A New Approach for Accelerating the Sparse Matrix-vector Multiplication

    Sparse matrix-vector multiplication (SpMV for short) is one of the most common subroutines in numerical linear algebra. The problem is that the memory access patterns during SpMV are irregular, and cache utilization can suffer from low spatial or temporal locality. Approaches to improving the performance of SpMV are based on matrix reordering and register blocking. These matrix transformations are designed to handle randomly occurring dense blocks in a sparse matrix, and their efficiency depends strongly on the presence of suitable blocks. The overhead of reorganizing a matrix from one format to another is often on the order of tens of SpMV executions; for that reason, such a reorganization pays off only if the same matrix A is multiplied by multiple different vectors, e.g., in iterative linear solvers. This paper introduces a new approach to accelerating SpMV. It consists of three tightly coupled steps: 1) dividing the matrix A into non-empty regions, 2) choosing an efficient way to traverse these regions (in other words, an efficient ordering of the partial multiplications), and 3) choosing the optimal storage format for each region. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache; the second step improves cache reuse by choosing the order in which these regions are processed.
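
    A toy sketch of the three steps under simplifying assumptions: regions are fixed-size B x B tiles stored in coordinate form (the paper's step 3 would instead pick an optimal per-region format), and the traversal order is plain row-major. All names are illustrative.

        from collections import defaultdict

        def build_regions(coo, B):
            regions = defaultdict(list)     # (block_row, block_col) -> entries
            for i, j, v in coo:             # step 1: non-empty B x B regions
                regions[(i // B, j // B)].append((i, j, v))
            return regions

        def spmv_regions(regions, x, n):
            y = [0.0] * n
            for key in sorted(regions):     # step 2: traversal order (row-major)
                for i, j, v in regions[key]:
                    y[i] += v * x[j]        # step 3 would pick a per-region format
            return y

        coo = [(0, 0, 2.0), (0, 3, 1.0), (2, 1, 4.0), (3, 3, 3.0)]
        regions = build_regions(coo, B=2)
        print(spmv_regions(regions, [1.0, 1.0, 1.0, 1.0], n=4))  # [3.0, 0.0, 4.0, 3.0]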

    Acceleration of Le Bail fitting method on parallel platforms

    The Le Bail fitting method is a procedure used in applied crystallography, mainly during crystal structure determination. As in many other applications, there is a need for high performance and short execution times. In this paper, we describe the use of parallel computing for the mathematical operations in Le Bail fitting. We present an algorithm implementing the method and highlight possible approaches to its parallelization. We then propose a sample parallel version using the OpenMP API and report its performance results on a real multithreaded system. Further potential for massive parallelization is also discussed.
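
    A minimal sketch of the data parallelism such a fitting loop exposes: the points of a simulated powder pattern are independent, so the profile summation can be split across workers. Python's multiprocessing stands in here for the paper's OpenMP loops, and Gaussian peaks stand in for the real profile functions; everything is illustrative.

        import math
        from multiprocessing import Pool

        PEAKS = [(20.0, 100.0, 0.2), (25.5, 60.0, 0.25)]  # (position, intensity, width)

        def profile_at(two_theta):
            # sum of (simplified) Gaussian peak profiles at one 2-theta point
            return sum(I * math.exp(-((two_theta - p) / w) ** 2)
                       for p, I, w in PEAKS)

        if __name__ == "__main__":
            grid = [15.0 + 0.01 * k for k in range(2000)]  # 2-theta grid
            with Pool() as pool:                           # points are independent
                pattern = pool.map(profile_at, grid, chunksize=200)
            print(max(pattern))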

    Efficient parallel evaluation of block properties of sparse matrices


    A new diagonal blocking format and model of cache behavior for sparse matrices

    Algorithms for sparse matrix-vector multiplication (SpMxV for short) are important building blocks in solvers of sparse systems of linear equations. Due to matrix sparsity, the memory access patterns are irregular and cache utilization suffers from low spatial and temporal locality. To reduce this effect, the diagonal register blocking format was designed. This paper introduces a new combined format, called CARB, for storing sparse matrices that extends the possibilities of the diagonal register blocking format.

    We have also developed a probabilistic model for estimating the number of cache misses during SpMxV in the CARB format. Using hardware cache monitoring tools, we compare the predicted numbers of cache misses with the measured ones on an Intel x86 architecture with L1 and L2 caches. The average accuracy of our analytical model is around 95% for the L2 cache and 88% for the L1 cache.
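
    A toy analogue of such an analytical model, under strong simplifying assumptions: it counts only the compulsory misses of a plain CSR SpMxV (every byte loaded exactly once) and ignores the irregular reuse of x that the paper's probabilistic CARB model actually estimates. Sizes and names are illustrative.

        def csr_compulsory_misses(n, nnz, line=64, val=8, idx=4, ptr=4):
            bytes_touched = (nnz * (val + idx)   # values[] and col_idx[]
                             + (n + 1) * ptr     # row_ptr[]
                             + n * (val + val))  # one pass over x[] and y[]
            return bytes_touched // line         # misses if every byte loads once

        # e.g. a 1M x 1M matrix with 10M nonzeros
        print(csr_compulsory_misses(n=1_000_000, nnz=10_000_000))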

    Parallelization of artificial immune systems using a massive parallel approach via modern GPUs

    Parallelization is one possible approach to obtaining better algorithm performance and overcoming the limits of sequential computation. In this paper, we present a study of the parallelization of the opt-aiNet algorithm, which comes from Artificial Immune Systems, part of a large family of population-based algorithms inspired by nature. The opt-aiNet algorithm is based on an immune network theory that incorporates knowledge about mammalian immune systems to create a state-of-the-art algorithm suitable for multimodal function optimization. The algorithm is known for combining local and global search with an emphasis on maintaining a stable set of distinct local extrema as solutions. Moreover, its modifications can be used for many other purposes, such as data clustering or combinatorial optimization. The parallel version of the algorithm is designed especially for modern graphics processing units (GPUs). The preliminary performance results show a very significant speedup over computation with traditional central processing units.
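
    A heavily simplified, sequential sketch of the opt-aiNet loop described above (cloning, mutation inversely proportional to normalized fitness, suppression of similar cells, random newcomers); parameter values and names are illustrative, and the clone/evaluate phase is the part a GPU version would offload.

        import math
        import random

        def f(x):                            # multimodal test function to maximize
            return math.sin(5 * x) * math.exp(-x * x)

        def opt_ainet(gens=200, n_cells=20, n_clones=10, beta=100, sigma=0.1):
            cells = [random.uniform(-2, 2) for _ in range(n_cells)]
            for _ in range(gens):
                fits = [f(c) for c in cells]
                lo, hi = min(fits), max(fits)
                survivors = []
                for c, fit in zip(cells, fits):
                    fstar = (fit - lo) / (hi - lo + 1e-12)      # normalized fitness
                    alpha = math.exp(-fstar) / beta             # mutation strength
                    clones = [c] + [c + alpha * random.gauss(0, 1)
                                    for _ in range(n_clones)]
                    survivors.append(max(clones, key=f))        # best clone wins
                # network suppression: drop cells too close to a better one
                survivors.sort(key=f, reverse=True)
                kept = []
                for c in survivors:
                    if all(abs(c - k) > sigma for k in kept):
                        kept.append(c)
                # refill with random newcomers to keep exploring
                cells = kept + [random.uniform(-2, 2)
                                for _ in range(n_cells - len(kept))]
            return sorted(set(round(c, 3) for c in cells))

        print(opt_ainet())   # several distinct local maxima of f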