1,004 research outputs found

    PPF - A Parallel Particle Filtering Library

    Full text link
    We present the parallel particle filtering (PPF) software library, which enables hybrid shared-memory/distributed-memory parallelization of particle filtering (PF) algorithms combining the Message Passing Interface (MPI) with multithreading for multi-level parallelism. The library is implemented in Java and relies on OpenMPI's Java bindings for inter-process communication. It includes dynamic load balancing, multi-thread balancing, and several algorithmic improvements for PF, such as input-space domain decomposition. The PPF library hides the difficulties of efficient parallel programming of PF algorithms and provides application developers with the necessary tools for parallel implementation of PF methods. We demonstrate the capabilities of the PPF library using two distributed PF algorithms in two scenarios with different numbers of particles. The PPF library runs a 38 million particle problem, corresponding to more than 1.86 GB of particle data, on 192 cores with 67% parallel efficiency. To the best of our knowledge, the PPF library is the first open-source software that offers a parallel framework for PF applications.Comment: 8 pages, 8 figures; will appear in the proceedings of the IET Data Fusion & Target Tracking Conference 201

    Parallelized Particle and Gaussian Sum Particle Filters for Large Scale Freeway Traffic Systems

    Get PDF
    Large scale traffic systems require techniques able to: 1) deal with high amounts of data and heterogenous data coming from different types of sensors, 2) provide robustness in the presence of sparse sensor data, 3) incorporate different models that can deal with various traffic regimes, 4) cope with multimodal conditional probability density functions for the states. Often centralized architectures face challenges due to high communication demands. This paper develops new estimation techniques able to cope with these problems of large traffic network systems. These are Parallelized Particle Filters (PPFs) and a Parallelized Gaussian Sum Particle Filter (PGSPF) that are suitable for on-line traffic management. We show how complex probability density functions of the high dimensional trafc state can be decomposed into functions with simpler forms and the whole estimation problem solved in an efcient way. The proposed approach is general, with limited interactions which reduces the computational time and provides high estimation accuracy. The efciency of the PPFs and PGSPFs is evaluated in terms of accuracy, complexity and communication demands and compared with the case where all processing is centralized

    An Ultra Fast Image Generator (UFig) for wide-field astronomy

    Get PDF
    Simulated wide-field images are becoming an important part of observational astronomy, either to prepare for new surveys or to test measurement methods. In order to efficiently explore vast parameter spaces, the computational speed of simulation codes is a central requirement to their implementation. We introduce the Ultra Fast Image Generator (UFig) which aims to bring wide-field imaging simulations to the current limits of computational capabilities. We achieve this goal through: (1) models of galaxies, stars and observational conditions, which, while simple, capture the key features necessary for realistic simulations, and (2) state-of-the-art computational and implementation optimizations. We present the performances of UFig and show that it is faster than existing public simulation codes by several orders of magnitude. It allows us to produce images more quickly than SExtractor needs to analyze them. For instance, it can simulate a typical 0.25 deg^2 Subaru SuprimeCam image (10k x 8k pixels) with a 5-sigma limiting magnitude of R=26 in 30 seconds on a laptop, yielding an average simulation time for a galaxy of 30 microseconds. This code is complementary to end-to-end simulation codes and can be used as a fast, central component of observational methods relying on simulations.Comment: Submitted to Astronomy and Computing. 13 pages, 9 figure

    Speeding Up Particle Filter Algorithm for Tracking Multiple Targets Using CUDA Programming

    Get PDF
    This thesis proposes to work on a parallelization method to speed up the computational runtime of the particle filter algorithm for multiple targets tracking. CUDA programming is utilized to execute the original implementation of the particle filter algorithm on GPU. The thesis provides a detailed discussion of the background information on the relevant topics. And then a presentation of the code architecture changes is followed. The detailed CUDA-based implementation is illustrated and discussed, which is followed by a discussion andcomparison of the results obtained from a series of tests.In this thesis, the introduction and description of the basic particle filter are presented first. Detailed illustrations of each step in the original implementation of the particle filter algorithm, which is executed sequentially on CPU, are provided. Then, background information of parallel programming technologies is provided, such as GPGPU and CUDA programming. The new design of the CUDA based implementation of the particle filter algorithm is proposed to speed up the execution of the original implementation, which is executed on CPU. Moreover, a detailed explanation of the CUDA-based implementation is given.Finally, the thesis will demonstrate the test results for both CPU and CUDA implementation as a comparison. The experiments indicate that the CUDA implementation can obtain a maximum of 7.5x speedup over the original implementation. After implementing more results and comparison, it was concluded that the CUDA implementation was significantly faster than the CPU version. Furthermore, the CUDA version still has much space for future optimizations to increase its performance

    A Parallel Histogram-based Particle Filter for Object Tracking on SIMD-based Smart Cameras

    Get PDF
    We present a parallel implementation of a histogram-based particle filter for object tracking on smart cameras based on SIMD processors. We specifically focus on parallel computation of the particle weights and parallel construction of the feature histograms since these are the major bottlenecks in standard implementations of histogram-based particle filters. The proposed algorithm can be applied with any histogram-based feature sets—we show in detail how the parallel particle filter can employ simple color histograms as well as more complex histograms of oriented gradients (HOG). The algorithm was successfully implemented on an SIMD processor and performs robust object tracking at up to 30 frames per second—a performance difficult to achieve even on a modern desktop computer

    Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications

    Get PDF
    Particle filtering has been a very popular method to solve nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found lots of applications in areas that include nonlinear filtering of noisy signals and data, especially in target tracking. However, implementation of high dimensional PFs in real-time for large-scale problems is a very challenging computational task. Parallel & distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical applications. To reach this goal, a number of parallel PF algorithms is designed and implemented using different parallel hardware architectures such as Computer Cluster, Graphics Processing Unit (GPU), and Field-Programmable Gate Array (FPGA). Proposed is an improved PF implementation for computer cluster - the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and outperforms significantly existing algorithms. Also, a novel GPU PF algorithm implementation is designed which is highly efficient for GPU architectures. The proposed algorithm implementations on different parallel computing environments are applied and tested for target tracking problems, such as space object tracking, ground multitarget tracking using image sensor, UAV-multisensor tracking. Comprehensive performance evaluation and comparison of the algorithms for both tracking and computational capabilities is performed. It is demonstrated by the obtained simulation results that the proposed implementations help greatly overcome the computational issues of particle filtering for realistic practical problems

    Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU

    Full text link
    High-performance GPU-accelerated particle filter methods are critical for object detection applications, ranging from autonomous driving, robot localization, to time-series prediction. In this work, we investigate the design, development and optimization of particle-filter using half-precision on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on Nvidia V100, A100, A40 and T4 GPUs. To mitigate numerical instability and precision losses, we introduce algorithmic changes in the particle filters. Using half-precision leads to a performance improvement of 1.5-2x and 2.5-4.6x with respect to single- and double-precision baselines respectively, at the cost of a relatively small loss of accuracy.Comment: 12 pages, 8 figures, conference. To be published in The 21st International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar2023
    corecore