10,507 research outputs found
Efficient Generation of Parallel Spin-images Using Dynamic Loop Scheduling
High performance computing (HPC) systems underwent a significant increase in
their processing capabilities. Modern HPC systems combine large numbers of
homogeneous and heterogeneous computing resources. Scalability is, therefore,
an essential aspect of scientific applications to efficiently exploit the
massive parallelism of modern HPC systems. This work introduces an efficient
version of the parallel spin-image algorithm (PSIA), called EPSIA. The PSIA is
a parallel version of the spin-image algorithm (SIA). The (P)SIA is used in
various domains, such as 3D object recognition, categorization, and 3D face
recognition. EPSIA refers to the extended version of the PSIA that integrates
various well-known dynamic loop scheduling (DLS) techniques. The present work:
(1) Proposes EPSIA, a novel flexible version of PSIA; (2) Showcases the
benefits of applying DLS techniques for optimizing the performance of the PSIA;
(3) Assesses the performance of the proposed EPSIA by conducting several
scalability experiments. The performance results are promising and show that
using well-known DLS techniques, the performance of the EPSIA outperforms the
performance of the PSIA by a factor of 1.2 and 2 for homogeneous and
heterogeneous computing resources, respectively
Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling
Though the GPGPU concept is well-known
in image processing, much more work remains to be done
to fully exploit GPUs as an alternative computation
engine. This paper investigates the computation-to-core
mapping strategies to probe the efficiency and scalability
of the robust facet image modeling algorithm on GPUs.
Our fine-grained computation-to-core mapping scheme
shows a significant performance gain over the standard
pixel-wise mapping scheme. With in-depth performance
comparisons across the two different mapping schemes,
we analyze the impact of the level of parallelism on
the GPU computation and suggest two principles for
optimizing future image processing applications on the
GPU platform
Scalable software architecture for on-line multi-camera video processing
In this paper we present a scalable software architecture for on-line multi-camera video processing, that guarantees a good trade off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application has been presented. In this paper, as case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of camera and can easily works with different image formats respecting the real time constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with a low level of overhea
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications
PPF - A Parallel Particle Filtering Library
We present the parallel particle filtering (PPF) software library, which
enables hybrid shared-memory/distributed-memory parallelization of particle
filtering (PF) algorithms combining the Message Passing Interface (MPI) with
multithreading for multi-level parallelism. The library is implemented in Java
and relies on OpenMPI's Java bindings for inter-process communication. It
includes dynamic load balancing, multi-thread balancing, and several
algorithmic improvements for PF, such as input-space domain decomposition. The
PPF library hides the difficulties of efficient parallel programming of PF
algorithms and provides application developers with the necessary tools for
parallel implementation of PF methods. We demonstrate the capabilities of the
PPF library using two distributed PF algorithms in two scenarios with different
numbers of particles. The PPF library runs a 38 million particle problem,
corresponding to more than 1.86 GB of particle data, on 192 cores with 67%
parallel efficiency. To the best of our knowledge, the PPF library is the first
open-source software that offers a parallel framework for PF applications.Comment: 8 pages, 8 figures; will appear in the proceedings of the IET Data
Fusion & Target Tracking Conference 201
- …