9 research outputs found

    Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture

    This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The filter works by grouping and filtering similar data within the image. Owing to the high level of similarity and data redundancy, the filter can provide even better denoising quality than currently widely used approaches based on deep learning (DL). In BM4D, cubes of voxels, called patches, are the essential image elements for filtering. Using voxels instead of pixels means that the area searched for similar patches is large. Because of this, and because of the multi-dimensional transformations applied, the computation time of the filter is exceptionally long. The original implementation of BM4D is single-threaded only. We provide a parallel version of the filter that supports multi-core and many-core processors and scales on such versatile hardware resources, typical of high-performance computing clusters, even when they are used concurrently for the task. Our algorithm uses hybrid parallelisation that combines open multi-processing (OpenMP) and message passing interface (MPI) technologies and provides up to 283× speedup, which is a 99.65% reduction in processing time compared to the sequential version of the algorithm. In denoising quality, the method performs considerably better than recent DL methods on data types on which those methods have not yet been trained.
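
The two figures quoted in the abstract above (a 283× speedup and a 99.65% reduction in processing time) are related by simple arithmetic: a speedup of s leaves 1/s of the original run time. The short sketch below only checks that relationship; the function name `reduction_percent` is a hypothetical helper introduced here for illustration and is not part of the paper's software.

```python
def reduction_percent(speedup: float) -> float:
    """Percentage of the sequential run time saved at a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # A speedup of s means the parallel run takes 1/s of the sequential time,
    # so the saved fraction is 1 - 1/s.
    return (1.0 - 1.0 / speedup) * 100.0


# Speedup factor taken from the abstract; everything else is illustrative.
print(f"{reduction_percent(283):.2f}%")  # prints 99.65%
```

The same relationship shows why large speedups give diminishing changes in the percentage: going from 100× to 283× only moves the reduction from 99.00% to 99.65%.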

    Numerical libraries solving large-scale problems developed at IT4Innovations Research Programme Supercomputing for Industry

    The team of the Research Programme Supercomputing for Industry at IT4Innovations National Supercomputing Center focuses on developing highly scalable algorithms for solving linear and non-linear problems arising in various engineering applications. Domain decomposition methods (DDM) of the FETI type are used as the main parallelisation technique. These methods are combined with finite element (FEM) or boundary element (BEM) discretisation methods and quadratic programming (QP) algorithms. All these algorithms are implemented in our in-house software packages BEM4I, ESPRESO and PERMON, which demonstrate high scalability up to tens of thousands of cores.

    Scalable Flow Simulations with the Lattice Boltzmann Method

    The primary goal of the EuroHPC JU project SCALABLE is to develop an industrial Lattice Boltzmann Method (LBM)-based computational fluid dynamics (CFD) solver capable of exploiting current and future extreme-scale architectures, expanding the capabilities of existing industrial LBM solvers by at least two orders of magnitude in terms of processor cores and lattice cells, while preserving its accessibility from both the end user's and the software developer's points of view. This is accomplished by transferring technology and knowledge between an academic code (waLBerla) and an industrial code (LaBS). This paper briefly introduces the characteristics and main features of both software packages involved in the process. We also highlight some of the performance achievements at scales of up to tens of thousands of cores, demonstrated on one academic and one industrial benchmark case.