126 research outputs found

Sparse approximate inverse preconditioners on high performance GPU platforms

Author: Bertaccini Daniele
Filippone Salvatore
Publication venue: Elsevier BV
Publication date: 01/01/2016

Simulation with models based on partial differential equations often requires the solution of (sequences of) large and sparse algebraic linear systems. In multidimensional domains, preconditioned Krylov iterative solvers are often appropriate for this task. Therefore, the search for efficient preconditioners for Krylov subspace methods is a crucial theme. Recent developments, especially in computing hardware, have renewed the interest in approximate inverse preconditioners in factorized form, because their application during the solution process can be more efficient. We present here some experiences focused on the approximate inverse preconditioners proposed by Benzi and Tůma in 1996 and the sparsification and inversion proposed by van Duin in 1999. Computational costs, reorderings and implementation issues are considered both on conventional and innovative computing architectures like Graphics Processing Units (GPUs).

Crossref

Cranfield CERES

ART

Sparse matrix-vector multiplication on GPGPUs

Author: Filippone Salvatore
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/03/2017

The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high performance computing architectures. The introduction of General Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this paper we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and trade-offs that have been encountered by the various researchers, and present a list of solutions, organized in categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains.
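
As a point of reference for the SpMV kernel described above, here is a minimal serial sketch in Python, assuming the matrix is stored in the common compressed sparse row (CSR) layout. The function name `csr_spmv` and the small test matrix are illustrative assumptions, not taken from the paper, and the sketch says nothing about the GPGPU-specific formats the review surveys.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr[r]:indptr[r + 1] delimits the nonzeros of row r,
    indices holds their column positions, data holds their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
```

Each row is an independent dot product, which is what makes the kernel attractive for parallel hardware; the irregular access to `x` through `indices` is the usual source of the performance trade-offs.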

Cranfield CERES

A parallel generalized relaxation method for high-performance image segmentation on GPUs

Author: D’Ambra Pasqua
Filippone Salvatore
Publication venue: Elsevier BV
Publication date: 01/05/2015

Fast and scalable software modules for image segmentation are needed for modern high-throughput screening platforms in Computational Biology. Indeed, accurate segmentation is one of the main steps to be applied in a basic software pipeline aimed at extracting accurate measurements from a large number of images. Image segmentation is often formulated through a variational principle, where the solution is the minimum of a suitable functional, as in the case of the Ambrosio–Tortorelli model. Euler–Lagrange equations associated with the above model are a system of two coupled elliptic partial differential equations whose finite-difference discretization can be efficiently solved by a generalized relaxation method, such as Jacobi or Gauss–Seidel, corresponding to a first-order alternating minimization scheme. In this work we present a parallel software module for image segmentation based on the Parallel Sparse Basic Linear Algebra Subprograms (PSBLAS), a general-purpose library for parallel sparse matrix computations, using its Graphics Processing Unit (GPU) extensions that allow us to exploit in a simple and transparent way the performance capabilities of both multi-core CPUs and many-core GPUs. We discuss performance results in terms of execution times and speed-up of the segmentation module running on GPU as well as on multi-core CPUs, in the analysis of 2D gray-scale images of mouse embryonic stem cell colonies coming from biological experiments.
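
To illustrate what a Jacobi relaxation sweep on a finite-difference discretization looks like, the following Python sketch solves a much simpler stand-in problem: the 1D Poisson equation -u'' = f with zero boundary values. This is an assumption made for brevity; it is not the coupled Ambrosio–Tortorelli system of the paper, and the function name `jacobi_poisson_1d` is hypothetical.

```python
def jacobi_poisson_1d(f, h, sweeps):
    """Jacobi relaxation for -u'' = f on a uniform grid with spacing h.

    f holds the right-hand side at the interior points; the boundary
    values are fixed at zero. Every sweep updates each point from the
    previous iterate only, so all updates are independent.
    """
    n = len(f)
    u = [0.0] * n
    for _ in range(sweeps):
        new = [0.0] * n
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            new[i] = 0.5 * (left + right + h * h * f[i])
        u = new
    return u


# Seven interior points on (0, 1) with f = 1; the exact solution is x(1 - x)/2.
u = jacobi_poisson_1d([1.0] * 7, 1.0 / 8.0, 500)
```

Because a Jacobi sweep reads only the previous iterate, the point updates can run concurrently, which is the property that makes this family of methods a natural fit for GPUs. A Gauss–Seidel sweep would instead reuse values already updated within the same sweep.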

Crossref

Cranfield CERES

ART

Coarray-based Load Balancing on Heterogeneous and Many-Core Architectures

Author: Cardellini Valeria
Fanfarillo Alessandro
Filippone Salvatore
Publication venue: Elsevier BV
Publication date: 01/10/2017

In order to reach challenging performance goals, computer architecture is expected to change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will force application developers to deal with irregular communication patterns, high levels of parallelism, and unexpected behavior. Load balancing among the heterogeneous compute units will be a critical task in order to achieve an effective usage of the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran, appear to be a promising alternative to standard MPI programming that uses two-sided communications, in particular because of the one-sided semantics of PGAS and its ease of programmability. In this paper, we show how Coarray Fortran can be used for implementing dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian option pricing problem, running in symmetric mode on Intel Xeon Phi Knights Corner and Knights Landing architectures.

Crossref

Cranfield CERES

ART

Spin Signatures of Photogenerated Radical Anions in Polymer-[70]Fullerene Bulk Heterojunctions: High Frequency Pulsed EPR Spectroscopy

Author: Deibel Carsten
Dyakonov Vladimir
Filippone Salvatore
Martin Nazario
Poluektov Oleg G.
Sperlich Andreas
Publication venue: American Chemical Society (ACS)
Publication date: 07/10/2011

Charged polarons in thin films of polymer-fullerene composites are investigated by light-induced electron paramagnetic resonance (EPR) at 9.5 GHz (X-band) and 130 GHz (D-band). The materials studied were poly(3-hexylthiophene) (PHT), [6,6]-phenyl-C61-butyric acid methyl ester (C60-PCBM), and two different soluble C70-derivatives: C70-PCBM and diphenylmethano[70]fullerene oligoether (C70-DPM-OE). The first experimental identification of the negative polaron localized on the C70-cage in polymer-fullerene bulk heterojunctions has been obtained. When recorded at conventional X-band EPR, this signal overlaps with the signal of the positive polaron, which does not allow for its direct experimental identification. Owing to the superior spectral resolution of the high frequency D-band EPR, we were able to separate light-induced signals from P+ and P- in PHT-C70 bulk heterojunctions. Comparing signals from C70-derivatives with different side-chains, we have obtained experimental proof that the polaron is localized on the cage of the C70 molecule.

arXiv.org e-Print Archive

Crossref

BootCMatch: A software package for bootstrap AMG based on graph weighted matching

Author: D’Ambra Pasqua
Filippone Salvatore
Vassilevski Panayot S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

This article has two main objectives: one is to describe some extensions of an adaptive Algebraic Multigrid (AMG) method of the form previously proposed by the first and third authors, and a second one is to present a new software framework, named BootCMatch, which implements all the components needed to build and apply the described adaptive AMG both as a stand-alone solver and as a preconditioner in a Krylov method. The adaptive AMG presented is meant to handle general symmetric and positive definite (SPD) sparse linear systems, without assuming any a priori information about the problem and its origin; the goal of adaptivity is to achieve a method with a prescribed convergence rate. The presented method exploits a general coarsening process based on aggregation of unknowns, obtained by a maximum weight matching in the adjacency graph of the system matrix. More specifically, a maximum product matching is employed to define an effective smoother subspace (complementary to the coarse space), a process referred to as compatible relaxation, at every level of the recursive two-level hierarchical AMG process. Results on a large variety of test cases and comparisons with related work demonstrate the reliability and efficiency of the method and of the software.

Crossref

ZENODO

Cranfield CERES

PDXScholar (Portland State University)

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A framework for unit testing with coarray Fortran

Author: Abdullahi Hassan Ambra
Cardellini Valeria
Filippone Salvatore
Publication venue: The Society for Modeling and Simulation International
Publication date: 26/04/2017

Parallelism is a ubiquitous feature of modern computing architectures; indeed, we might even say that serial code is now automatically legacy code. Writing parallel code poses significant challenges to programmers, and is often error-prone. Partitioned Global Address Space (PGAS) languages, such as Coarray Fortran (CAF), represent a promising development direction in the quest for a trade-off between simplicity and performance. CAF is a parallel programming model that allows a smooth migration from serial to parallel code. However, despite CAF's simplicity, refactoring serial code and migrating it to parallel versions is still error-prone, especially in complex software. The combination of unit testing, which drastically reduces defect injection, and CAF is therefore a very appealing prospect; however, it requires appropriate tools to realize its potential. In this paper, we present the first CAF-compatible framework for unit tests, developed as an extension to the Parallel Fortran Unit Test framework (pFUnit).

Cranfield CERES

ART

Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML

Author: Carver Jeffrey
Filippone Salvatore
Morris Karla
Nanthaamornphong Aziz
Publication venue
Publication date: 01/01/2015

Many scientists who implement computational science and engineering software have adopted the object-oriented (OO) Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation, it can also serve to assist developers with accomplishing different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML) class diagrams from Fortran code. The UML class diagram facilitates the developers' ability to examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and a few of its limitations.

Crossref

Directory of Open Access Journals

ART

Open Access Repository

Why diffusion-based preconditioning of Richards equation works: spectral analysis and computational experiments at very large scale

Author: Bertaccini Daniele
D'Ambra Pasqua
Durastante Fabio
Filippone Salvatore
Publication venue
Publication date: 15/07/2022

We consider here a cell-centered finite difference approximation of the Richards equation in three dimensions, averaging for interface values the hydraulic conductivity K = K(p), a highly nonlinear function, by arithmetic, upstream, and harmonic means. The nonlinearities in the equation can lead to changes in soil conductivity over several orders of magnitude and discretizations with respect to space variables often produce stiff systems of differential equations. A fully implicit time discretization is provided by the backward Euler one-step formula; the resulting nonlinear algebraic system is solved by an inexact Newton Armijo-Goldstein algorithm, requiring the solution of a sequence of linear systems involving Jacobian matrices. We prove some new results concerning the distribution of the Jacobians' eigenvalues and the explicit expression of their entries. Moreover, we explore some connections between the saturation of the soil and the ill-conditioning of the Jacobians. The information on eigenvalues justifies the effectiveness of some preconditioner approaches which are widely used in the solution of the Richards equation. We also propose a new software framework to experiment with scalable and robust preconditioners suitable for efficient parallel simulations at very large scales. Performance results on a literature test case show that our framework is very promising in the advance towards realistic simulations at extreme scale.
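
The three interface averages named in this abstract can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the Gardner-type exponential used for K(p), its parameters `k_sat` and `alpha`, the function names, and the rule that the upstream cell is the one with the larger hydraulic head. The abstract does not state which conductivity model or upstream criterion the authors use.

```python
import math


def gardner_conductivity(p, k_sat=1.0, alpha=2.0):
    """Illustrative K(p): exponential decay for p < 0, saturated value otherwise."""
    return k_sat * math.exp(alpha * p) if p < 0.0 else k_sat


def arithmetic_mean(k_left, k_right):
    """Interface conductivity as the plain average of the two cell values."""
    return 0.5 * (k_left + k_right)


def harmonic_mean(k_left, k_right):
    """Interface conductivity dominated by the less conductive cell."""
    return 2.0 * k_left * k_right / (k_left + k_right)


def upstream_value(k_left, k_right, head_left, head_right):
    """Interface conductivity taken from the cell the flow comes from,
    assumed here to be the cell with the larger hydraulic head."""
    return k_left if head_left >= head_right else k_right


# Two neighbouring cells, one close to saturation and one much drier.
k_wet = gardner_conductivity(-0.1)
k_dry = gardner_conductivity(-3.0)
averages = (arithmetic_mean(k_wet, k_dry), harmonic_mean(k_wet, k_dry))
```

For strongly different cell values the harmonic mean stays close to the smaller conductivity while the arithmetic mean stays close to half the larger one, which is one way the choice of average affects the resulting Jacobian entries.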

arXiv.org e-Print Archive

In search for internal complexation in cyclodextrin-fullerene conjugates

Author: André Rassat
Salvatore Filippone
Publication venue
Publication date: 01/01/2003

Comptes Rendus Chimie