PPF - A Parallel Particle Filtering Library
We present the parallel particle filtering (PPF) software library, which
enables hybrid shared-memory/distributed-memory parallelization of particle
filtering (PF) algorithms combining the Message Passing Interface (MPI) with
multithreading for multi-level parallelism. The library is implemented in Java
and relies on OpenMPI's Java bindings for inter-process communication. It
includes dynamic load balancing, multi-thread balancing, and several
algorithmic improvements for PF, such as input-space domain decomposition. The
PPF library hides the difficulties of efficient parallel programming of PF
algorithms and provides application developers with the necessary tools for
parallel implementation of PF methods. We demonstrate the capabilities of the
PPF library using two distributed PF algorithms in two scenarios with different
numbers of particles. The PPF library runs a 38 million particle problem,
corresponding to more than 1.86 GB of particle data, on 192 cores with 67%
parallel efficiency. To the best of our knowledge, the PPF library is the first
open-source software that offers a parallel framework for PF applications.
Comment: 8 pages, 8 figures; will appear in the proceedings of the IET Data
Fusion & Target Tracking Conference 201
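The headline figures in this abstract can be unpacked with a little arithmetic. The sketch below assumes the usual definition of parallel efficiency, E = S / p (speedup divided by core count), and assumes 1 GB = 10^9 bytes; the abstract states neither, and the function names are illustrative only.

```python
def implied_speedup(efficiency: float, cores: int) -> float:
    """Speedup S implied by parallel efficiency E = S / p, i.e. S = E * p."""
    return efficiency * cores


def bytes_per_particle(total_bytes: float, particles: int) -> float:
    """Average storage per particle for a given total data volume."""
    return total_bytes / particles


# Figures quoted in the abstract: 67% efficiency on 192 cores,
# 38 million particles, more than 1.86 GB of particle data.
speedup = implied_speedup(0.67, 192)
per_particle = bytes_per_particle(1.86e9, 38_000_000)  # assumes 1 GB = 1e9 bytes

print(f"implied speedup: about {speedup:.0f}x over one core")
print(f"at least about {per_particle:.0f} bytes per particle")
```

Under these assumptions the run corresponds to a speedup of roughly 129 over a single core, and each particle carries on the order of 49 bytes of state.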
Parallelized Particle and Gaussian Sum Particle Filters for Large Scale Freeway Traffic Systems
Large scale traffic systems require techniques able to: 1) deal with large amounts of data, including heterogeneous data coming from different types of sensors, 2) provide robustness in the presence of sparse sensor data, 3) incorporate different models that can deal with various traffic regimes, 4) cope with multimodal conditional probability density functions for the states. Centralized architectures often face challenges due to high communication demands. This paper develops new estimation techniques able to cope with these problems of large traffic network systems. These are Parallelized Particle Filters (PPFs) and a Parallelized Gaussian Sum Particle Filter (PGSPF) that are suitable for on-line traffic management. We show how complex probability density functions of the high dimensional traffic state can be decomposed into functions with simpler forms and the whole estimation problem solved in an efficient way. The proposed approach is general and requires only limited interactions, which reduces the computational time while providing high estimation accuracy. The efficiency of the PPFs and PGSPFs is evaluated in terms of accuracy, complexity and communication demands and compared with the case where all processing is centralized.
An Ultra Fast Image Generator (UFig) for wide-field astronomy
Simulated wide-field images are becoming an important part of observational
astronomy, either to prepare for new surveys or to test measurement methods. In
order to efficiently explore vast parameter spaces, the computational speed of
simulation codes is a central requirement for their implementation. We introduce
the Ultra Fast Image Generator (UFig) which aims to bring wide-field imaging
simulations to the current limits of computational capabilities. We achieve
this goal through: (1) models of galaxies, stars and observational conditions,
which, while simple, capture the key features necessary for realistic
simulations, and (2) state-of-the-art computational and implementation
optimizations. We present the performance of UFig and show that it is faster
than existing public simulation codes by several orders of magnitude. It allows
us to produce images more quickly than SExtractor needs to analyze them. For
instance, it can simulate a typical 0.25 deg^2 Subaru SuprimeCam image (10k x
8k pixels) with a 5-sigma limiting magnitude of R=26 in 30 seconds on a laptop,
yielding an average simulation time for a galaxy of 30 microseconds. This code
is complementary to end-to-end simulation codes and can be used as a fast,
central component of observational methods relying on simulations.
Comment: Submitted to Astronomy and Computing. 13 pages, 9 figures
Speeding Up Particle Filter Algorithm for Tracking Multiple Targets Using CUDA Programming
This thesis proposes a parallelization method to speed up the computational runtime of the particle filter algorithm for multiple target tracking. CUDA programming is utilized to execute the original implementation of the particle filter algorithm on GPU. The thesis provides a detailed discussion of the background information on the relevant topics, followed by a presentation of the code architecture changes. The detailed CUDA-based implementation is illustrated and discussed, which is followed by a discussion and comparison of the results obtained from a series of tests.
In this thesis, the introduction and description of the basic particle filter are presented first. Detailed illustrations of each step in the original implementation of the particle filter algorithm, which is executed sequentially on CPU, are provided. Then, background information on parallel programming technologies, such as GPGPU and CUDA programming, is provided. The new design of the CUDA-based implementation of the particle filter algorithm is proposed to speed up the execution of the original implementation, which is executed on CPU. Moreover, a detailed explanation of the CUDA-based implementation is given.
Finally, the thesis demonstrates the test results for both the CPU and CUDA implementations as a comparison. The experiments indicate that the CUDA implementation can obtain a maximum of 7.5x speedup over the original implementation. After further tests and comparisons, it was concluded that the CUDA implementation was significantly faster than the CPU version. Furthermore, the CUDA version still has much room for future optimizations to increase its performance.
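For orientation, the steps of a basic (bootstrap) particle filter can be sketched as below. This is a minimal one-dimensional, single-target CPU sketch in NumPy, not the thesis's implementation: the random-walk motion model, the Gaussian likelihood, the choice of systematic resampling, and all names and parameters are assumptions made for illustration.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n  # one stratified draw per slot
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)


def pf_step(particles, observation, rng, process_std=1.0, obs_std=1.0):
    """One predict / weight / resample cycle (hypothetical 1-D models)."""
    # Predict: propagate every particle through a random-walk motion model.
    particles = particles + rng.normal(0.0, process_std, size=particles.shape)
    # Weight: Gaussian log-likelihood of the observation for every particle.
    log_w = -0.5 * ((observation - particles) / obs_std) ** 2
    w = np.exp(log_w - log_w.max())  # shift by the maximum for stability
    w /= w.sum()
    estimate = float(np.sum(w * particles))  # weighted-mean state estimate
    # Resample: duplicate high-weight particles, drop low-weight ones.
    return particles[systematic_resample(w, rng)], estimate


rng = np.random.default_rng(0)
particles = rng.normal(0.0, 5.0, size=1000)
particles, estimate = pf_step(particles, observation=2.0, rng=rng)
print(estimate)
```

In this sketch the predict and weight steps treat every particle independently, which is the part that maps naturally onto one GPU thread per particle, while normalization and resampling need sums across all particles.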
A Parallel Histogram-based Particle Filter for Object Tracking on SIMD-based Smart Cameras
We present a parallel implementation of a histogram-based particle filter for object tracking on smart cameras based on SIMD processors. We specifically focus on parallel computation of the particle weights and parallel construction of the feature histograms since these are the major bottlenecks in standard implementations of histogram-based particle filters. The proposed algorithm can be applied with any histogram-based feature set: we show in detail how the parallel particle filter can employ simple color histograms as well as more complex histograms of oriented gradients (HOG). The algorithm was successfully implemented on an SIMD processor and performs robust object tracking at up to 30 frames per second, a performance difficult to achieve even on a modern desktop computer.
Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications
Particle filtering has been a very popular method to solve nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found many applications in areas that include nonlinear filtering of noisy signals and data, especially in target tracking. However, implementation of high dimensional PFs in real-time for large-scale problems is a very challenging computational task.
Parallel & distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical applications. To reach this goal, a number of parallel PF algorithms are designed and implemented using different parallel hardware architectures such as Computer Cluster, Graphics Processing Unit (GPU), and Field-Programmable Gate Array (FPGA). Proposed is an improved PF implementation for computer cluster - the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and significantly outperforms existing algorithms. Also, a novel GPU PF algorithm implementation is designed which is highly efficient for GPU architectures. The proposed algorithm implementations on different parallel computing environments are applied and tested for target tracking problems, such as space object tracking, ground multitarget tracking using image sensor, UAV-multisensor tracking. Comprehensive performance evaluation and comparison of the algorithms for both tracking and computational capabilities is performed. The obtained simulation results demonstrate that the proposed implementations greatly help to overcome the computational issues of particle filtering for realistic practical problems.
Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU
High-performance GPU-accelerated particle filter methods are critical for
object detection applications, ranging from autonomous driving and robot
localization to time-series prediction. In this work, we investigate the
design, development and optimization of particle filters using half-precision on
CUDA cores and compare their performance and accuracy with single- and
double-precision baselines on Nvidia V100, A100, A40 and T4 GPUs. To mitigate
numerical instability and precision losses, we introduce algorithmic changes in
the particle filters. Using half-precision leads to a performance improvement
of 1.5-2x and 2.5-4.6x with respect to single- and double-precision baselines
respectively, at the cost of a relatively small loss of accuracy.
Comment: 12 pages, 8 figures, conference. To be published in The 21st
International Workshop on Algorithms, Models and Tools for Parallel Computing
on Heterogeneous Platforms (HeteroPar2023)
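To illustrate the kind of numerical issue half precision raises in weight computation, the sketch below shows one generic safeguard: shifting log-weights by their maximum before exponentiating, and accumulating the normalizing sum in single precision. The abstract does not say which algorithmic changes the authors made, so this should be read as a common technique under assumed inputs, emulated with NumPy's float16 on the CPU, and not as the paper's method.

```python
import numpy as np


def normalize_weights_fp16(log_w):
    """Normalize particle weights stored in float16 (illustrative sketch)."""
    log_w = np.asarray(log_w, dtype=np.float16)
    # Shift so the largest log-weight becomes 0; exp() then stays in (0, 1]
    # and the dominant weights cannot underflow to zero.
    w = np.exp(log_w - log_w.max())
    # Accumulate the sum in float32 to limit round-off over many particles.
    total = w.sum(dtype=np.float32)
    return (w / total).astype(np.float16)


log_w = np.array([-20.0, -21.0, -22.0], dtype=np.float16)

naive = np.exp(log_w)  # exp(-20) is below the float16 range
print(naive.sum())     # all weights underflow, so they cannot be normalized

stable = normalize_weights_fp16(log_w)
print(stable, stable.astype(np.float32).sum())
```

The naive path loses every weight because exp(-20) is smaller than the smallest positive float16 value, whereas the shifted path keeps the relative weights intact.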