14 research outputs found

    A parallel generalized relaxation method for high-performance image segmentation on GPUs

    Fast and scalable software modules for image segmentation are needed for modern high-throughput screening platforms in Computational Biology. Indeed, accurate segmentation is one of the main steps in a basic software pipeline aimed at extracting accurate measurements from a large number of images. Image segmentation is often formulated through a variational principle, where the solution is the minimum of a suitable functional, as in the case of the Ambrosio–Tortorelli model. The Euler–Lagrange equations associated with this model are a system of two coupled elliptic partial differential equations whose finite-difference discretization can be efficiently solved by a generalized relaxation method, such as Jacobi or Gauss–Seidel, corresponding to a first-order alternating minimization scheme. In this work we present a parallel software module for image segmentation based on the Parallel Sparse Basic Linear Algebra Subprograms (PSBLAS), a general-purpose library for parallel sparse matrix computations, using its Graphics Processing Unit (GPU) extensions, which allow us to exploit in a simple and transparent way the performance capabilities of both multi-core CPUs and many-core GPUs. We discuss performance results in terms of execution times and speed-up of the segmentation module running on GPUs as well as on multi-core CPUs, in the analysis of 2D gray-scale images of mouse embryonic stem cell colonies coming from biological experiments.
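    The relaxation step named in this abstract can be illustrated with a minimal pure-Python sketch of the (weighted) Jacobi iteration on a small linear system. This is not the PSBLAS-based module: the function names `jacobi` and `tridiag_shifted`, the dense list-of-lists storage, and the 1D model matrix are all assumptions chosen only to keep the example self-contained.

```python
def tridiag_shifted(n, shift=1.0):
    """Hypothetical model matrix: 1D finite-difference operator for -u'' + shift*u.

    Rows are (-1, 2 + shift, -1), so the matrix is strictly diagonally
    dominant and Jacobi is guaranteed to converge.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 + shift
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A


def jacobi(A, b, omega=1.0, tol=1e-10, max_iter=10_000):
    """Weighted Jacobi: x <- x + omega * D^{-1} (b - A x).

    Every component is updated from the previous iterate only, which is
    what makes the sweep trivially parallel (one thread per unknown).
    Returns the approximate solution and the number of sweeps performed.
    """
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # Residual computed entirely from the old iterate.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * r[i] / A[i][i] for i in range(n)]
        if max(abs(ri) for ri in r) < tol:
            return x, it
    return x, max_iter


if __name__ == "__main__":
    A = tridiag_shifted(5)
    exact = [1.0] * 5
    b = [sum(A[i][j] * exact[j] for j in range(5)) for i in range(5)]
    x, sweeps = jacobi(A, b)
    print(sweeps, x)
```

    Gauss–Seidel differs only in that each updated component is used immediately within the same sweep, which usually converges faster but introduces data dependencies that are harder to parallelize.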

    BootCMatch: A software package for bootstrap AMG based on graph weighted matching

    This article has two main objectives: the first is to describe some extensions of an adaptive Algebraic Multigrid (AMG) method of the form previously proposed by the first and third authors, and the second is to present a new software framework, named BootCMatch, which implements all the components needed to build and apply the described adaptive AMG both as a stand-alone solver and as a preconditioner in a Krylov method. The adaptive AMG presented is meant to handle general symmetric and positive definite (SPD) sparse linear systems, without assuming any a priori information about the problem and its origin; the goal of adaptivity is to achieve a method with a prescribed convergence rate. The presented method exploits a general coarsening process based on aggregation of unknowns, obtained by a maximum weight matching in the adjacency graph of the system matrix. More specifically, a maximum product matching is employed to define an effective smoother subspace (complementary to the coarse space), a process referred to as compatible relaxation, at every level of the recursive two-level hierarchical AMG process. Results on a large variety of test cases and comparisons with related work demonstrate the reliability and efficiency of the method and of the software.
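    The idea of building aggregates from a matching in the adjacency graph can be sketched as follows. This is only an illustration, not BootCMatch's algorithm: it swaps in a simple greedy heuristic (a 1/2-approximation of maximum weight matching) for the exact maximum product matching mentioned in the abstract, and the edge weights are taken as given rather than derived from the matrix and a smooth vector. The names `greedy_matching` and `aggregates_from_matching` are hypothetical.

```python
def greedy_matching(n, edges):
    """Greedy approximate maximum weight matching.

    edges: iterable of (i, j, weight) for an undirected graph on n vertices.
    Edges are scanned by decreasing weight and accepted when both endpoints
    are still free. Returns mate, where mate[i] is the partner of i or None.
    """
    mate = [None] * n
    for i, j, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        if i != j and mate[i] is None and mate[j] is None:
            mate[i], mate[j] = j, i
    return mate


def aggregates_from_matching(mate):
    """Turn a matching into aggregates of size 2 (matched pairs) or 1 (singletons).

    Returns (agg, count): agg[i] is the index of the coarse unknown that
    fine unknown i is assigned to, and count is the number of aggregates.
    """
    agg = [None] * len(mate)
    count = 0
    for i, partner in enumerate(mate):
        if agg[i] is None:
            agg[i] = count
            if partner is not None:
                agg[partner] = count
            count += 1
    return agg, count


if __name__ == "__main__":
    # Path graph 0-1-2-3 with a strong coupling between unknowns 1 and 2.
    edges = [(0, 1, 1.0), (1, 2, 5.0), (2, 3, 1.0)]
    mate = greedy_matching(4, edges)
    print(mate, aggregates_from_matching(mate))
```

    Each pass of pairwise matching reduces the number of unknowns by at most a factor of two; larger aggregates can be obtained by applying the same step again to the coarse graph.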

    A distributed combustion solver for engine simulations on grids

    Multi-dimensional models for predictive simulations of modern engines are an example of multi-physics and multi-scale mathematical models, since many thermo-fluid-dynamic processes in complex geometrical configurations have to be considered. Typical models involve different submodels, including turbulence, spray and combustion models, with different characteristic time scales. The predictive capability of the complete models depends on the accuracy of the submodels as well as on the reliability of the numerical solution algorithms. In this work we propose a multi-solver approach for reliable and efficient solution of the stiff Ordinary Differential Equation (ODE) systems arising from detailed chemical reaction mechanisms for combustion modeling. The main aim was to obtain a high-performance parallel solution of combustion submodels in the overall procedure for simulation of engines on distributed heterogeneous computing platforms. To this end we interfaced our solver with the CHEMKIN-II package and the KIVA3V-II code and carried out multi-computer simulations of realistic engines. Numerical experiments devoted to testing the reliability of the simulation results and the efficiency of the distributed combustion solver are presented and discussed.
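    Why stiff ODE systems call for dedicated (implicit) solvers can be seen on the scalar test problem y' = -k y with a large rate constant k. The sketch below is not the multi-solver scheme of the paper; the test problem, the step size, and the function names `forward_euler` and `backward_euler` are assumptions made purely for illustration.

```python
def forward_euler(k, y0, h, steps):
    """Explicit Euler for y' = -k*y: y_{n+1} = (1 - h*k) * y_n.

    Stable only when |1 - h*k| <= 1, i.e. h <= 2/k.
    """
    y = y0
    for _ in range(steps):
        y = y + h * (-k * y)
    return y


def backward_euler(k, y0, h, steps):
    """Implicit Euler for y' = -k*y: y_{n+1} = y_n / (1 + h*k).

    For this linear problem the implicit equation is solved in closed form;
    the scheme is stable for every step size h > 0.
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * k)
    return y


if __name__ == "__main__":
    k, y0, h, steps = 1000.0, 1.0, 0.1, 10  # h is far above the explicit limit 2/k
    print("explicit:", forward_euler(k, y0, h, steps))   # magnitude grows each step
    print("implicit:", backward_euler(k, y0, h, steps))  # decays toward 0, like the exact solution
```

    In a detailed reaction mechanism the fast and slow species play the role of large and small k at the same time, so an explicit method would be forced to take steps dictated by the fastest reaction even after it has equilibrated.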

    AMG based on compatible weighted matching for GPUs

    We describe the main issues and design principles of an efficient implementation, tailored to recent generations of Nvidia Graphics Processing Units (GPUs), of an Algebraic MultiGrid (AMG) preconditioner previously proposed by one of the authors and already available in the open-source package BootCMatch: Bootstrap algebraic multigrid based on Compatible weighted Matching for standard CPUs. The AMG method relies on a new approach for coarsening sparse symmetric positive definite (s.p.d.) matrices, named coarsening based on compatible weighted matching. It exploits maximum weight matching in the adjacency graph of the sparse matrix, driven by the principle of compatible relaxation, providing a suitable aggregation of unknowns which goes beyond the limits of the usual heuristics applied in the current methods. We adopt an approximate solution of the maximum weight matching problem, based on a recently proposed parallel algorithm, referred to as the Suitor algorithm, and show that it allows us to obtain good quality coarse matrices for our AMG on GPUs. We exploit the inherent parallelism of modern GPUs in all the kernels involving sparse matrix computations, both for the setup of the preconditioner and for its application in a Krylov solver, outperforming preconditioners available in the original sequential CPU code as well as the single-node Nvidia AmgX library. Results for a large set of linear systems arising from the discretization of scalar and vector partial differential equations (PDEs) are discussed.

    2LEV-D2P4: a package of high-performance preconditioners for scientific and engineering applications

    We present a package of parallel preconditioners which implements one-level and two-level Domain Decomposition algorithms on top of the PSBLAS library for sparse matrix computations. The package, named 2LEV-D2P4 (Two-LEVel Domain Decomposition Parallel Preconditioners Package based on PSBLAS), currently includes various versions of additive Schwarz preconditioners that are combined with a coarse-level correction to obtain two-level preconditioners. A pure algebraic formulation of the preconditioners is considered. 2LEV-D2P4 has been written in Fortran 95, exploiting features such as abstract data type creation, functional overloading and dynamic memory management, while providing a smooth path towards the integration in legacy application codes. The package, used with Krylov solvers implemented in PSBLAS, has been tested on large-scale linear systems arising from model problems and real applications, showing its effectiveness.
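    The algebraic one-level additive Schwarz preconditioner mentioned in this abstract applies, to a residual r, the sum over subdomains of R_i^T A_i^{-1} R_i r, where R_i restricts to the unknowns of subdomain i and A_i is the corresponding diagonal block of A. The pure-Python sketch below shows only this application step; it is not the 2LEV-D2P4 or PSBLAS API, it uses dense storage and exact local solves, and the names `solve_dense` and `additive_schwarz_apply` are hypothetical. No coarse-level correction is included.

```python
def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        # Bring the largest remaining entry of column k to the pivot position.
        p = max(range(k, n), key=lambda row: abs(aug[row][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y


def additive_schwarz_apply(A, r, subdomains):
    """Return z = sum_i R_i^T A_i^{-1} R_i r.

    subdomains: list of index lists, possibly overlapping. The local solves
    are independent of each other, so they could run in parallel.
    """
    z = [0.0] * len(r)
    for idx in subdomains:
        A_loc = [[A[i][j] for j in idx] for i in idx]  # A_i = R_i A R_i^T
        r_loc = [r[i] for i in idx]                    # R_i r
        y = solve_dense(A_loc, r_loc)
        for k, i in enumerate(idx):                    # prolong and accumulate
            z[i] += y[k]
    return z


if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    r = [1.0, 0.0, 1.0]
    print(additive_schwarz_apply(A, r, [[0, 1], [1, 2]]))  # two overlapping subdomains
```

    Two limiting cases help check the sketch: with one subdomain covering all unknowns the preconditioner is an exact solve, and with one unknown per subdomain it reduces to Jacobi (diagonal) preconditioning.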

    SParC-LES: Enabling large eddy simulations with parallel sparse matrix computation tools

    We discuss the design and development of a parallel code for Large Eddy Simulation (LES) by exploiting libraries for sparse matrix computations. We formulate a numerical procedure for the LES of turbulent channel flows, based on an approximate projection method, in terms of linear algebra operators involving sparse matrices and vectors. Then we implement the procedure using general-purpose linear algebra libraries as building blocks. This approach allows us to pursue goals such as modularity, accuracy and robustness, as well as easy and fast exploitation of parallelism, with a relatively low coding effort. The parallel LES code developed in this work, named SParC-LES (Sparse Parallel Computation-based LES), exploits two parallel libraries: PSBLAS, providing basic sparse matrix operators and Krylov solvers, and MLD2P4, providing a suite of algebraic multilevel Schwarz preconditioners. Numerical experiments, concerning the simulation by SParC-LES of a turbulent flow in a plane channel, confirm that the LES code can achieve satisfactory parallel performance. This supports our opinion that the software design methodology used to build SParC-LES yields a very good tradeoff between the exploitation of the computational power of parallel computers and the amount of coding effort.