
4,875 research outputs found

PPF - A Parallel Particle Filtering Library

Author: Demirel Ömer
Meijering Erik
Niessen Wiro
Sbalzarini Ivo F.
Smal Ihor
Publication date: 01/01/2014

We present the parallel particle filtering (PPF) software library, which enables hybrid shared-memory/distributed-memory parallelization of particle filtering (PF) algorithms combining the Message Passing Interface (MPI) with multithreading for multi-level parallelism. The library is implemented in Java and relies on OpenMPI's Java bindings for inter-process communication. It includes dynamic load balancing, multi-thread balancing, and several algorithmic improvements for PF, such as input-space domain decomposition. The PPF library hides the difficulties of efficient parallel programming of PF algorithms and provides application developers with the necessary tools for parallel implementation of PF methods. We demonstrate the capabilities of the PPF library using two distributed PF algorithms in two scenarios with different numbers of particles. The PPF library runs a 38 million particle problem, corresponding to more than 1.86 GB of particle data, on 192 cores with 67% parallel efficiency. To the best of our knowledge, the PPF library is the first open-source software that offers a parallel framework for PF applications. Comment: 8 pages, 8 figures; will appear in the proceedings of the IET Data Fusion & Target Tracking Conference 201
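As a rough reading of the figures in the abstract above: parallel efficiency is conventionally defined as speedup divided by the number of cores, so 67% efficiency on 192 cores would correspond to a speedup of roughly 129 over the baseline. The sketch below is only this arithmetic. The function name is hypothetical, and the abstract does not say which baseline the authors measured against.

```python
def speedup_from_efficiency(efficiency: float, cores: int) -> float:
    """Return the speedup implied by a parallel efficiency E = S / p.

    Assumes the conventional definition of efficiency (speedup divided
    by core count); the baseline used in the paper is not stated in
    the abstract.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return efficiency * cores


if __name__ == "__main__":
    # Figures quoted in the abstract: 67% efficiency on 192 cores.
    implied = speedup_from_efficiency(0.67, 192)
    print(f"implied speedup: {implied:.1f}x")  # prints "implied speedup: 128.6x"
```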

arXiv.org e-Print Archive

EUR Research Repository

MPG.PuRe

Enhancing Energy Production with Exascale HPC Methods

Author: Camata José J.
Cela José M.
Costa Danilo
Coutinho Alvaro LGA
Fernández-Galisteo Daniel
Jiménez Carmen
Kourdioumov Vadim
Mattoso Marta
Mayo-García Rafael
Miras Thomas
Moríñigo José A.
Navarro Jorge
Navaux Philippe O.A.
Oliveira Daniel de
Rodríguez-Pascual Manuel
Silva Vítor
Souza Renan
Valduriez Patrick
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

High Performance Computing (HPC) resources have become key to tackling more ambitious challenges in many disciplines. For this step forward, an explosion in the available parallelism and the use of special-purpose processors are crucial. With this goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state of the art in the required HPC exascale simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results. The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on the Endeavour cluster is provided by the Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imaging. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

INRIA a CCSD electronic archive server

HAL-Rennes 1

Grid enabling legacy applications for scalability – Experiences of a production application on the UK NGS

Author: Fowler R
Pakhira A
Perring T
Sastry L
Publication date: 01/01/2005

ePubs: the open archive for STFC research publications

ParaSCAN: A Static Profiler to Help Parallelization

Author: Rajan Hridesh
Sondag Tyler
Upadhyaya Ganesha
Publication venue: Iowa State University Digital Repository
Publication date: 13/05/2014

Parallelizing software often starts by profiling to identify program paths that are worth parallelizing. Static profiling techniques, e.g. hot paths, can be used to identify parallelism opportunities for programs that lack representative inputs and in situations where dynamic techniques aren't applicable, e.g. parallelizing compilers and refactoring tools. Existing static techniques for identification of hot paths rely on path frequencies. Relying on path frequencies alone isn't sufficient for identifying parallelism opportunities. We propose a novel automated approach for static profiling that combines both path frequencies and computational weight of the paths. We apply our technique, called ParaSCAN, to parallelism recommendation, where it is highly effective. Our results demonstrate that ParaSCAN's recommendations cover all the parallelism manually identified by experts with 85% accuracy and in some cases also identify parallelism missed by the experts.
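To illustrate why path frequency alone can mislead, here is a minimal sketch that ranks paths by the product of an estimated frequency and an estimated per-execution cost. The product score, the `PathEstimate` type, and the sample numbers are all assumptions made for illustration; the abstract does not state how ParaSCAN actually combines the two quantities.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PathEstimate:
    """Hypothetical static estimate for one program path."""
    name: str
    frequency: float  # estimated relative execution frequency
    weight: float     # estimated computational cost per execution


def rank_by_frequency(paths: List[PathEstimate]) -> List[str]:
    """Rank paths by frequency only (the frequency-based baseline)."""
    return [p.name for p in sorted(paths, key=lambda p: p.frequency, reverse=True)]


def rank_by_frequency_and_weight(paths: List[PathEstimate]) -> List[str]:
    """Rank paths by frequency * weight (an assumed combination rule)."""
    return [
        p.name
        for p in sorted(paths, key=lambda p: p.frequency * p.weight, reverse=True)
    ]


# Made-up sample: a cheap but frequent logging path versus a rarer,
# expensive solver path.
SAMPLE = [
    PathEstimate("log_message", frequency=0.90, weight=1.0),
    PathEstimate("solve_system", frequency=0.20, weight=50.0),
    PathEstimate("init_config", frequency=0.05, weight=5.0),
]

if __name__ == "__main__":
    print(rank_by_frequency(SAMPLE))
    print(rank_by_frequency_and_weight(SAMPLE))
```

With these sample numbers, the frequency-only ranking puts the cheap logging path first, while the combined score promotes the expensive solver path, which is the kind of candidate a parallelization recommendation would care about.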

Digital Repository @ Iowa State University (ISU)

TensorFlow Doing HPC

Author: Bulatov Yaroslav
Chien Steven W. D.
Laure Erwin
Markidis Stefano
Olshevsky Vyacheslav
Vetter Jeffrey S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/03/2019

TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it in fact aims to support the development of a much broader range of applications that are outside the ML domain and can possibly include HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential to emerge also as an HPC programming framework for heterogeneous supercomputers. Comment: Accepted for publication at The Ninth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES'19)
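For readers unfamiliar with one of the benchmarks named above, the following is a minimal sketch of the textbook Conjugate Gradient iteration for a symmetric positive-definite system, written in plain Python. It is not the paper's TensorFlow implementation; the function names, tolerance, and the 2x2 test system are assumptions chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def conjugate_gradient(A: Matrix, b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x starting at zero
    p = list(r)            # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # makes the next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Small SPD system whose exact solution is x = [1/11, 7/11].
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

In a distributed setting the matrix-vector product and the dot products are the steps that require communication, which is why CG is a common choice for benchmarking a framework on multiple accelerators.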

arXiv.org e-Print Archive

Crossref