50 research outputs found

    Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications

    Particle filtering has been a popular method for solving nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found many applications in nonlinear filtering of noisy signals and data, especially in target tracking. However, implementing high-dimensional PFs in real time for large-scale problems is a very challenging computational task. Parallel and distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical application. To reach this goal, a number of parallel PF algorithms are designed and implemented on different parallel hardware architectures: computer clusters, Graphics Processing Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). An improved PF implementation for computer clusters is proposed, the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and significantly outperforms existing algorithms. A novel GPU PF implementation that is highly efficient on GPU architectures is also designed. The proposed implementations are applied to and tested on target tracking problems such as space object tracking, ground multitarget tracking using an image sensor, and UAV multisensor tracking. A comprehensive performance evaluation and comparison of the algorithms, covering both tracking and computational capabilities, is performed. The simulation results demonstrate that the proposed implementations greatly help to overcome the computational issues of particle filtering for realistic practical problems.
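
For orientation, one cycle of a generic bootstrap particle filter can be sketched as below. This is a textbook illustration, not the dissertation's PTA or GPU implementation. The 1-D random-walk model, the Gaussian likelihood, and the names `bootstrap_pf_step`, `process_std` and `obs_std` are all assumptions made for this example. The predict and weight loops work on each particle independently, which is what makes them natural candidates for parallel hardware. The resampling step needs the weights of all particles.

```python
import math
import random


def bootstrap_pf_step(particles, observation, rng, process_std=1.0, obs_std=1.0):
    """One predict/weight/resample cycle for an assumed 1-D random-walk model."""
    # Predict: move every particle through the (assumed) random-walk dynamics.
    predicted = [x + rng.gauss(0.0, process_std) for x in particles]

    # Weight: unnormalised Gaussian likelihood of the observation per particle.
    weights = [math.exp(-0.5 * ((observation - x) / obs_std) ** 2) for x in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * x for w, x in zip(weights, predicted))

    # Resample: multinomial resampling, which needs the full weight vector.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.gauss(0.0, 5.0) for _ in range(2000)]
    particles, estimate = bootstrap_pf_step(particles, observation=3.0, rng=rng)
    print(f"state estimate after one step: {estimate:.3f}")
```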

    Efficient Kalman Filtering and Smoothing

    The Kalman filter and Kalman smoother are important components of modern multitarget tracking systems. Their applications are vast and include guidance, navigation and control of vehicles. In addition, when the motion model is uncertain, a Multiple-Model approach can be combined with the filtering and smoothing method. However, with a large retrodiction window, many motion models and a large number of targets, this process can become very computationally intensive and thus time-consuming. Real-time processing is very often needed in tracking, so this computational bottleneck becomes a problem. This is the motivation behind this thesis: to reduce the computational complexity of multi-target, multi-window or multi-model applications. This thesis presents several approaches to tackling these multi-dimensional problems in terms of complexity while maintaining satisfactory precision. A natural step forward is to leverage modern multi-core architectures. However, in order to parallelise such a process, the algorithms have to be reformulated to fit parallel processors. To parallelise the multi-target and multi-window scenario, this thesis introduces nested parallelism and a prefix-sum algorithm, and realises them on the Intel Knights Landing (KNL) processor with the OpenMP memory model. For the case of limited parallel resources, this thesis also develops an alternative, the Fast Kalman smoother (FRTS), to lower the computational complexity caused by the multi-window problem. Specifically, the smoother algorithm is reformulated so that its computational cost is independent of the window size in the fixed-lag configuration. Although the underlying mathematics is the same as in the conventional approach, FRTS introduces a numerical stability issue that makes the smoother unstable. Therefore, this thesis introduces the idea of using the condition number to monitor the deterioration rate, so that the numerical error can be corrected once a pre-set threshold is breached. In addition to the large number of targets and the retrodiction window size mentioned earlier, the number of models running simultaneously makes the problem even more challenging from the perspective of real-time performance. Since such algorithms are the backbone of many multi-frame tracking algorithms, it would be beneficial to have a multi-model algorithm whose computational cost is independent of the number of models used. Consequently, this thesis extends the FRTS concept to a fixed-lag Multiple-Model smoothing method to achieve this goal. The proposed algorithms are compared with and tested against the literature through an extensive set of evaluations, and their relative merits are discussed. These evaluations show that the contributions pave the way to substantial performance gains for multi-dimensional tracking algorithms over conventional approaches.
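
As background for the abstract above, the conventional Kalman recursion can be sketched for the simplest scalar case. This is the textbook filter only. It does not reproduce the thesis's FRTS smoother, its prefix-sum reformulation, or its condition-number monitoring. The identity motion model and the names `kalman_step`, `run_filter`, `q` and `r` are assumptions made for this illustration.

```python
def kalman_step(x, p, z, q=0.0, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances (assumed known)
    """
    # Predict: identity dynamics assumed, so only the variance grows.
    x_pred = x
    p_pred = p + q

    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def run_filter(measurements, x0=0.0, p0=1.0, q=0.0, r=1.0):
    """Filter a sequence of measurements and return every state estimate."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Nine identical measurements pull the estimate from 0 towards 1.
    print(run_filter([1.0] * 9))
```

Each step depends on the previous one, so a single track is sequential in time. Filters for independent targets share no state, so in this sketch they could be run side by side.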

    Algorithm design for parallel implementation of the SMC-PHD filter

    The sequential Monte Carlo (SMC) implementation of the probability hypothesis density (PHD) filter suffers from low computational efficiency, since a large number of particles is often required, especially when there are many targets and dense clutter. To speed up the computation, an algorithmic framework for parallel SMC-PHD filtering on multiple processors is proposed. The algorithm fully parallelizes all four steps of the SMC-PHD filter, and the computational load is approximately equal among the parallel processors, yielding a high parallelization benefit when there are multiple targets and dense clutter. The parallelization is theoretically unbiased, as it provides the same result as the serial implementation without introducing any approximation. Experiments on multi-core computers demonstrate that the parallel implementation achieves a considerable speedup over the serial implementation of the same algorithm.

    Hybrid Sampling Bayesian Occupancy Filter

    Modeling and monitoring dynamic environments is a complex task, but it is crucial in the field of intelligent vehicles. A traditional way of addressing these issues is to model moving objects through Detection And Tracking of Moving Objects (DATMO) methods. An alternative to the classic object-model framework is occupancy grid filtering. Instead of segmenting the scene into objects and tracking them, the environment is represented as a regular occupancy grid in which each cell is tracked at a sub-object level. The Bayesian Occupancy Filter (BOF) is a generic occupancy grid framework that predicts the spread of spatial occupancy by estimating cell velocity distributions. However, its velocity model, a transition histogram per cell, requires a large amount of data to be managed, which in practice makes it hard to reconcile with the severe computational and hardware constraints found in many embedded systems. In this paper, we present a new representation for the BOF that describes the environment through a mix of static and dynamic occupancy. This differentiation enables the use of a model adapted to each kind of occupancy: static occupancy is described in a classic occupancy grid, while dynamic occupancy is modeled by a set of moving particles. The static and dynamic parts are jointly generated and evaluated, and their distribution over the cells is adjusted. This approach leads to a more compact model and drastically improves the accuracy of the results, in particular in terms of velocities. Experimental results show that the number of values required to model the velocities is reduced from a typical 900 per cell (for a 30x30 neighborhood) to fewer than 2 per cell on average. This massive data compression makes it possible to plan dedicated embedded devices.

    An $O(\log_2 N)$ Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures

    Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, PF applications take increasingly long to run. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes $O((\log_2 N)^2)$ computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in $O(\log_2 N)$ on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves an $O(\log_2 N)$ time complexity. We also present empirical results that indicate that our novel approach outperforms the $O((\log_2 N)^2)$ approach.
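
To make the redistribution step concrete, here is a sequential reference sketch. It is not the paper's parallel algorithm and runs in $O(N)$ on a single processor. Systematic resampling is used here as an assumed choice of scheme, and the names `systematic_ncopies` and `redistribute` are illustrative. The first function uses a prefix sum over the weights to decide how many copies each particle receives. The second performs the copying, which is the part the abstract describes as hard to parallelize on distributed memory.

```python
from itertools import accumulate


def systematic_ncopies(weights, u):
    """Copies per particle under systematic resampling.

    weights: non-negative particle weights (need not be normalised)
    u:       a single uniform draw in [0, 1) shared by all pointers
    """
    n = len(weights)
    total = sum(weights)
    # Inclusive prefix sum of the normalised weights (the empirical CDF).
    cdf = list(accumulate(w / total for w in weights))

    ncopies = [0] * n
    j = 0
    for i in range(n):
        pointer = (i + u) / n  # i-th of n evenly spaced pointers
        # Advance to the first particle whose CDF value reaches the pointer.
        while j < n - 1 and cdf[j] < pointer:
            j += 1
        ncopies[j] += 1
    return ncopies


def redistribute(particles, ncopies):
    """Build the new population: particle i appears ncopies[i] times."""
    new_population = []
    for particle, count in zip(particles, ncopies):
        new_population.extend([particle] * count)
    return new_population


if __name__ == "__main__":
    counts = systematic_ncopies([0.1, 0.2, 0.3, 0.4], u=0.5)
    print(counts, redistribute(["a", "b", "c", "d"], counts))
```

The copy counts always sum to the number of particles, so the population size is preserved. What differs from particle to particle is how many copies each one produces.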

    Algorithms and architectures for MCMC acceleration in FPGAs

    Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.
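
For readers unfamiliar with MCMC, the simplest member of the family, a random-walk Metropolis sampler, can be sketched as follows. It is a generic illustration and is neither the Population-based nor the Particle MCMC algorithm from the thesis, and it says nothing about FPGA mapping. The standard normal target, the step size, and the name `random_walk_metropolis` are assumptions chosen for the example.

```python
import math
import random


def random_walk_metropolis(log_density, x0, n_samples, step, rng):
    """Draw n_samples from a 1-D target given its unnormalised log-density."""
    samples = []
    x = x0
    log_p = log_density(x)
    for _ in range(n_samples):
        # Propose a Gaussian perturbation of the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)

        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new

        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed target: standard normal, log-density known up to a constant.
    chain = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)
    mean = sum(chain) / len(chain)
    print(f"sample mean: {mean:.3f}")
```

Each iteration depends on the previous state, so a single chain is sequential.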