7,966 research outputs found
Computing the Component-Labeling and the Adjacency Tree of a Binary Digital Image in Near Logarithmic-Time
Connected component labeling (CCL) of binary images is
one of the fundamental operations in real-time applications. The adjacency
tree (AdjT) of the connected components offers a region-based
representation where each node represents a region which is surrounded
by another region of the opposite color. In this paper, a fully parallel
algorithm for computing the CCL and AdjT of a binary digital image
is described and implemented, without the need to use any geometric
information. The time complexity order for an image of m × n pixels
under the assumption that a processing element exists for each pixel is
near O(log(m + n)). Results for a multicore processor show very good
scalability until the so-called memory bandwidth bottleneck is reached.
The inherent parallelism of our approach suggests that
even better results could be obtained on other, less classical computing
architectures.
Ministerio de Economía y Competitividad MTM2016-81030-P; Ministerio de Economía y Competitividad TEC2012-37868-C04-0
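To make concrete what a component labeling produces, here is a minimal sequential sketch based on union-find. It is not the parallel algorithm described in the abstract above; it is a plain baseline under stated assumptions: the image is a list of rows of 0/1 values, both colors are labeled, and 4-connectivity is used for both. The function name `label_components` is hypothetical.

```python
def label_components(image):
    """Label 4-connected regions of equal pixel value (sequential baseline)."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))  # one union-find node per pixel

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller raster index as the representative.
            parent[max(ra, rb)] = min(ra, rb)

    # Merge each pixel with its right and lower neighbor of the same color.
    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            if c + 1 < cols and image[r][c] == image[r][c + 1]:
                union(idx, idx + 1)
            if r + 1 < rows and image[r][c] == image[r + 1][c]:
                union(idx, idx + cols)

    # Assign consecutive labels to roots in raster order.
    labels = {}
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            root = find(r * cols + c)
            out[r][c] = labels.setdefault(root, len(labels))
    return out
```

In a 3 × 3 image with a single foreground pixel in the center, the background ring receives one label and the center another; the adjacency tree would then record that the center region is surrounded by the background region.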
Parallel computing for the finite element method
A finite element method is presented to compute time harmonic microwave
fields in three dimensional configurations. Nodal-based finite elements have
been coupled with an absorbing boundary condition to solve open boundary
problems. This paper describes how the modeling of large devices has been made
possible using parallel computation. New algorithms are then proposed to
implement this formulation on a cluster of workstations (10 DEC ALPHA 300X) and
on a CRAY C98. Analysis of the computation efficiency is performed using simple
problems. Finally, the electromagnetic scattering of a plane wave by a perfectly
electrically conducting airplane is given as an example.
Parallel eigensolvers in plane-wave Density Functional Theory
We consider the problem of parallelizing electronic structure computations in
plane-wave Density Functional Theory. Because of the limited scalability of
Fourier transforms, parallelism has to be found at the eigensolver level. We
show how a recently proposed algorithm based on Chebyshev polynomials can scale
into the tens of thousands of processors, outperforming block conjugate
gradient algorithms for large computations.
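One common way Chebyshev polynomials enter eigensolvers is as a spectral filter: a polynomial in the matrix that stays bounded on an unwanted interval [a, b] and grows rapidly outside it, so that repeated matrix-vector products amplify the wanted eigencomponents. The sketch below illustrates only that filtering step with the three-term recurrence; whether it matches the cited algorithm in detail is an assumption, and the names `matvec` and `chebyshev_filter` are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def chebyshev_filter(A, x, degree, a, b):
    """Apply T_degree((A - c I) / e) to x, with c and e the center and half-width of [a, b].

    Eigencomponents with eigenvalues inside [a, b] keep magnitude at most 1,
    while those outside the interval are amplified.
    """
    c = (a + b) / 2.0
    e = (b - a) / 2.0

    def shifted(v):
        # Computes ((A - c I) / e) v.
        Av = matvec(A, v)
        return [(Av_i - c * v_i) / e for Av_i, v_i in zip(Av, v)]

    y_prev = list(x)          # T_0 applied to x
    if degree == 0:
        return y_prev
    y = shifted(x)            # T_1 applied to x
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        s = shifted(y)
        y, y_prev = [2.0 * s_i - p_i for s_i, p_i in zip(s, y_prev)], y
    return y
```

For example, with a diagonal matrix whose eigenvalues are 0, 1, 2, 3 and the damped interval [1, 3], a degree-8 filter multiplies the component belonging to eigenvalue 0 by T_8(2) = 18817 while leaving the others at magnitude 1.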
The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform
The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance evaluation coverage of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center workloads and workloads with irregular memory access patterns, and its popularity and acceptance are therefore rising within the HPC community. As only a small fraction of the reference version of the HPCG benchmark is parallelized with shared memory techniques (OpenMP), we introduce in this report two OpenMP parallelization methods. Due to the increasing importance of the Arm architecture in HPC, we evaluate our HPCG code at scale on a state-of-the-art HPC system based on the Cavium ThunderX2 SoC. We consider our work a contribution to the Arm ecosystem: along with this technical report, we plan to release our code to help tune the HPCG benchmark within the Arm community.
Postprint (author's final draft)
Parallel Sparse Matrix Solver on the GPU Applied to Simulation of Electrical Machines
Nowadays, several industrial applications are being ported to parallel
architectures, because these platforms make it possible to obtain higher
performance for system modelling and simulation. In the electric machines area,
there are many problems whose solution needs to be accelerated. This paper
examines the parallelization of a sparse matrix solver on graphics processors.
More specifically, we implement the conjugate gradient technique with the input
matrix stored in the CSR, symmetric CSR, and CSC formats. This method is one of
the most efficient iterative methods available for solving the finite-element
basis functions of Maxwell's equations. The GPU (Graphics Processing Unit),
which is used for its implementation, provides mechanisms to parallelize the
algorithm. Thus, it significantly increases the computation speed relative to
serial code on CPU-based systems.
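As an illustration of the combination named in the abstract above, the following is a minimal serial sketch of the conjugate gradient method on a matrix stored in CSR format. It is not the GPU implementation from the paper; it assumes a symmetric positive definite matrix given as the usual three CSR arrays (`data`, `indices`, `indptr`), and the function names are hypothetical. The row loop in `csr_matvec` is the part that a GPU version would typically distribute across threads.

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A x for a matrix A stored in CSR format."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        # Nonzeros of this row occupy data[indptr[row]:indptr[row + 1]].
        total = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            total += data[k] * x[indices[k]]
        y[row] = total
    return y


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def conjugate_gradient(data, indices, indptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, equal to b for x = 0
    p = list(r)            # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(data, indices, indptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * Ap_i for r_i, Ap_i in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [r_i + beta * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x
```

A small usage example: the matrix [[4, 1], [1, 3]] is stored as `data = [4.0, 1.0, 1.0, 3.0]`, `indices = [0, 1, 0, 1]`, `indptr = [0, 2, 4]`, and solving with `b = [1.0, 2.0]` yields approximately (1/11, 7/11).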