    The quest for petascale computing

    Algorithmic redistribution methods for block-cyclic decompositions

    An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.

    With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose a high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning here is important as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms and we clarify that such an approach does not free users completely from grain size awareness.

    Parallel computation of echelon forms

    We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay as much as possible these reductions while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between size of blocks well suited to those fast variants or to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, block size has then to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.
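
    The idea of delaying modular reductions mentioned in this abstract can be illustrated on a single dot product over Z/pZ. The following is only a minimal sketch, not the authors' implementation: the function name `dot_mod_delayed` and the `word_bits` parameter are hypothetical, inputs are assumed to be already reduced into [0, p), and because Python integers never overflow, the fixed-width accumulator is simulated by an explicit bound.

```python
def dot_mod_delayed(a, b, p, word_bits=63):
    """Dot product of a and b modulo p, reducing as rarely as possible.

    Assumes every entry of a and b already lies in [0, p).
    word_bits models the width of a signed machine accumulator
    (an assumption; Python itself has unbounded integers).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p < 2:
        raise ValueError("modulus must be at least 2")

    limit = (1 << word_bits) - 1      # largest value the accumulator may hold
    max_prod = (p - 1) ** 2           # largest possible single product

    # After a reduction the accumulator is at most p - 1, so it can absorb
    # `budget` further products before it could exceed `limit`.
    budget = (limit - (p - 1)) // max_prod
    if budget < 1:
        raise ValueError("modulus too large for the chosen word size")

    acc = 0
    pending = 0                       # products added since the last reduction
    for x, y in zip(a, b):
        acc += x * y
        pending += 1
        if pending == budget:
            acc %= p                  # the delayed reduction
            pending = 0
    return acc % p
```

    For a small modulus such as p = 7 the budget is enormous and the loop performs a single reduction at the very end; for a modulus near 2^31 only two products fit between reductions, which is why the choice of field size and word width matters for performance.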

    Fast DEM collision checks on multicore nodes.

    Many particle simulations today rely on spherical or analytical particle shape descriptions. They find non-spherical, triangulated particle models computationally infeasible due to expensive collision detections. We propose a hybrid collision detection algorithm based upon an iterative solve of a minimisation problem that automatically falls back to a brute-force comparison-based algorithm variant if the problem is ill-posed. Such a hybrid can exploit the vector facilities of modern chips and it is well-prepared for the arising manycore era. Our approach pushes the boundary where non-analytical particle shapes and the aligning of more accurate first principle physics become manageable

    ParIC : A Family of Parallel Incomplete Cholesky Preconditioners

    A class of parallel incomplete factorization preconditionings for the solution of large linear systems is investigated. The approach may be regarded as a generalized domain decomposition method. Adjacent subdomains have to communicate during the setting up of the preconditioner, and during the application of the preconditioner. Overlap is not necessary to achieve high performance. Fill-in levels are considered in a global way. If necessary, the technique may be implemented as a global re-ordering of the unknowns. Experimental results are reported for two-dimensional problems.

    Two-sided Grassmann-Rayleigh quotient iteration

    The two-sided Rayleigh quotient iteration proposed by Ostrowski computes a pair of corresponding left-right eigenvectors of a matrix C. We propose a Grassmannian version of this iteration, i.e., its iterates are pairs of p-dimensional subspaces instead of one-dimensional subspaces in the classical case. The new iteration generically converges locally cubically to the pairs of left-right p-dimensional invariant subspaces of C. Moreover, Grassmannian versions of the Rayleigh quotient iteration are given for the generalized Hermitian eigenproblem, the Hamiltonian eigenproblem and the skew-Hamiltonian eigenproblem. Comment: The text is identical to a manuscript that was submitted for publication on 19 April 200
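
    For orientation, the classical one-dimensional two-sided Rayleigh quotient iteration that this abstract generalizes can be sketched as follows. This is the textbook p = 1 case for a small real matrix, not the Grassmannian iteration proposed in the paper; the helper names (`solve`, `normalize`, `two_sided_rqi`) are hypothetical, and breakdown when y·x = 0 is not handled.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[piv][k] == 0.0:
            raise ZeroDivisionError("singular matrix")
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def normalize(v):
    """Scale v so that its largest entry has absolute value 1."""
    m = max(abs(t) for t in v)
    return [t / m for t in v]


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def two_sided_rqi(C, x, y, iters=20, tol=1e-12):
    """Return (rho, x, y): eigenvalue estimate, right and left eigenvectors."""
    n = len(C)
    Ct = [[C[j][i] for j in range(n)] for i in range(n)]
    x, y = normalize(x), normalize(y)
    for _ in range(iters):
        Cx = matvec(C, x)
        rho = dot(y, Cx) / dot(y, x)          # two-sided Rayleigh quotient
        if max(abs(Cx[i] - rho * x[i]) for i in range(n)) < tol:
            break
        # Shift both C and its transpose by the same rho.
        S = [[C[i][j] - (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
        St = [[Ct[i][j] - (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
        try:
            x = normalize(solve(S, x))        # right update: (C - rho I) x+ = x
            y = normalize(solve(St, y))       # left update: (C - rho I)^T y+ = y
        except ZeroDivisionError:
            break                             # shift hit an eigenvalue exactly
    rho = dot(y, matvec(C, x)) / dot(y, x)
    return rho, x, y
```

    As a usage example, `two_sided_rqi([[4.0, 1.0], [2.0, 3.0]], [1.0, 0.8], [1.0, 0.6])` starts near the eigenvalue 5 of that matrix, whose right and left eigenvectors are proportional to (1, 1) and (2, 1) respectively.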

    On the stability and uniqueness of the flow of a fluid through a porous medium

    © 2016, The Author(s). In this short note, we study the stability of flows of a fluid through porous media that satisfies a generalization of Brinkman’s equation to include inertial effects. Such flows could have relevance to enhanced oil recovery and also to the flow of dense liquids through porous media. In any event, one cannot ignore the fact that flows through porous media are inherently unsteady, and thus, at least a part of the inertial term needs to be retained in many situations. We study the stability of the rest state and find it to be asymptotically stable. Next, we study the stability of a base flow and find that the flow is asymptotically stable, provided the base flow is sufficiently slow. Finally, we establish results concerning the uniqueness of the flow under appropriate conditions, and present some corresponding numerical results