177 research outputs found
An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.
With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose a high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning here is important as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms and we clarify that such an approach does not free users completely from grain size awareness.
Parallel computation of echelon forms
We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay these reductions as much as possible while mixing fine-grain parallelizations of tiled iterative and recursive algorithms. Second, fast linear algebra variants, e.g., using the Strassen-Winograd algorithm, never suffer from instability and can thus be widely used in cascade with the classical algorithms. There, trade-offs are to be made between block sizes well suited to those fast variants and block sizes well suited to load and communication balancing. Third, many applications over finite fields require the rank profile of the matrix (quite often rank deficient) rather than the solution to a linear system. It is thus important to design parallel algorithms that preserve and compute this rank profile. Moreover, as the rank profile is only discovered during the algorithm, the block size has to be dynamic. We propose and compare several block decompositions: tile iterative with left-looking, right-looking and Crout variants, slab recursive and tile recursive. Experiments demonstrate that the tile recursive variant performs better and matches the performance of reference numerical software when no rank deficiency occurs. Furthermore, even in the most heterogeneous case, namely when all pivot blocks are rank deficient, we show that it is possible to maintain a high efficiency.
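The delayed-reduction idea mentioned in this abstract can be illustrated on a single dot product. The following is a minimal sketch, not the authors' implementation: the function names `max_delay` and `dot_mod_delayed`, the parameter `mantissa_bits`, and the assumption that products are accumulated in a type that is exact below 2^53 (as a double-precision float would be) are all illustrative choices.

```python
def max_delay(p, mantissa_bits=53):
    """How many products of residues in [0, p-1] can be summed
    while the sum stays below 2**mantissa_bits (assumed exact range)."""
    return (2**mantissa_bits - 1) // ((p - 1) ** 2)


def dot_mod_delayed(a, b, p, mantissa_bits=53):
    """Dot product of a and b modulo p with one reduction per block.

    Assumes every entry of a and b is already reduced to [0, p-1].
    """
    block = max_delay(p, mantissa_bits)
    if block < 1:
        raise ValueError("p is too large for the assumed exact range")
    result = 0
    for start in range(0, len(a), block):
        # Accumulate a whole block of products without reducing.
        partial = sum(x * y for x, y in
                      zip(a[start:start + block], b[start:start + block]))
        # A single modular reduction for the block.
        result = (result + partial % p) % p
    return result
```

A naive loop reduces after every multiplication; here the number of reductions drops to roughly `len(a) / block`. Python integers never overflow, so the bound only models what a fixed-width accumulator would require.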
Fast DEM collision checks on multicore nodes.
Many particle simulations today rely on spherical or analytical particle shape descriptions. They find non-spherical, triangulated particle models computationally infeasible due to expensive collision detections. We propose a hybrid collision detection algorithm based upon an iterative solve of a minimisation problem that automatically falls back to a brute-force comparison-based algorithm variant if the problem is ill-posed. Such a hybrid can exploit the vector facilities of modern chips and it is well-prepared for the arising manycore era. Our approach pushes the boundary where non-analytical particle shapes and the aligning of more accurate first-principle physics become manageable.
ParIC: A Family of Parallel Incomplete Cholesky Preconditioners
A class of parallel incomplete factorization preconditionings
for the solution of large linear systems is investigated. The approach may
be regarded as a generalized domain decomposition method. Adjacent
subdomains have to communicate during the setting up of the
preconditioner, and during the application of the preconditioner. Overlap is
not necessary to achieve high performance. Fill-in levels are considered
in a global way. If necessary, the technique may be implemented as a
global reordering of the unknowns. Experimental results are reported
for two-dimensional problems.
Two-sided Grassmann-Rayleigh quotient iteration
The two-sided Rayleigh quotient iteration proposed by Ostrowski computes a
pair of corresponding left-right eigenvectors of a matrix $C$. We propose a
Grassmannian version of this iteration, i.e., its iterates are pairs of
$p$-dimensional subspaces instead of one-dimensional subspaces in the classical
case. The new iteration generically converges locally cubically to the pairs of
left-right $p$-dimensional invariant subspaces of $C$. Moreover, Grassmannian
versions of the Rayleigh quotient iteration are given for the generalized
Hermitian eigenproblem, the Hamiltonian eigenproblem and the skew-Hamiltonian
eigenproblem. Comment: The text is identical to a manuscript that was submitted for
publication on 19 April 200
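For orientation, the classical (one-dimensional) two-sided Rayleigh quotient iteration that this abstract generalises can be written as follows. This is the standard textbook form, not a transcription from the paper; the symbols $C$ for the matrix, $x_k$ and $y_k$ for the right and left iterates, and $\rho_k$ for the shift are notation assumed here.

```latex
\rho_k = \frac{y_k^{H} C\, x_k}{y_k^{H} x_k}, \qquad
(C - \rho_k I)\, x_{k+1} = x_k, \qquad
y_{k+1}^{H}\, (C - \rho_k I) = y_k^{H},
```

with $x_{k+1}$ and $y_{k+1}$ normalised after each step. The Grassmannian version described above replaces the vectors $x_k$, $y_k$ by bases of subspaces.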
On the stability and uniqueness of the flow of a fluid through a porous medium
© 2016, The Author(s). In this short note, we study the stability of flows of a fluid through porous media that satisfy a generalization of Brinkman’s equation to include inertial effects. Such flows could have relevance to enhanced oil recovery and also to the flow of dense liquids through porous media. In any event, one cannot ignore the fact that flows through porous media are inherently unsteady, and thus, at least a part of the inertial term needs to be retained in many situations. We study the stability of the rest state and find it to be asymptotically stable. Next, we study the stability of a base flow and find that the flow is asymptotically stable, provided the base flow is sufficiently slow. Finally, we establish results concerning the uniqueness of the flow under appropriate conditions, and present some corresponding numerical results.