
    Algorithms for Efficient Computation of Convolution


    Parallel resampling in the particle filter

    Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation.
    Comment: 21 pages, 6 figures
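
    As an illustrative aside (a sketch under stated assumptions, not the paper's code), the key property of a Metropolis resampler is that each output ancestor is chosen by an independent Metropolis chain over particle indices, using only pairwise weight ratios, so no collective sum over the weights is needed. In NumPy form:

        import numpy as np

        def metropolis_resample(weights, rng, B=32):
            """Metropolis resampler sketch. `weights` is a NumPy array of
            unnormalised particle weights; B is the chain length, a tuning
            assumption that trades bias against speed."""
            N = len(weights)
            ancestors = np.arange(N)                    # each chain starts at its own index
            for _ in range(B):
                proposals = rng.integers(0, N, size=N)  # uniform proposal per chain
                u = rng.random(N)
                # Accept with probability min(1, w[proposal] / w[current]).
                accept = u * weights[ancestors] < weights[proposals]
                ancestors = np.where(accept, proposals, ancestors)
            return ancestors

    Each chain is independent, so on a GPU every particle can run in its own thread with no synchronisation across the weight vector.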

    Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications

    Particle filtering has been a very popular method for solving nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found many applications in areas that include nonlinear filtering of noisy signals and data, especially in target tracking. However, implementing high-dimensional PFs in real time for large-scale problems is a very challenging computational task. Parallel and distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical applications. To reach this goal, a number of parallel PF algorithms are designed and implemented using different parallel hardware architectures, such as computer clusters, graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). An improved PF implementation for computer clusters is proposed: the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and significantly outperforms existing algorithms. In addition, a novel PF implementation that is highly efficient on GPU architectures is designed. The proposed implementations on the different parallel computing environments are applied and tested on target tracking problems such as space object tracking, ground multi-target tracking using image sensors, and UAV multi-sensor tracking. A comprehensive evaluation and comparison of the algorithms, covering both tracking and computational performance, is carried out. The obtained simulation results demonstrate that the proposed implementations go a long way toward overcoming the computational issues of particle filtering in realistic practical problems.
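
    For context, the generic PF recursion that all such parallel implementations accelerate is a propagate-weight-resample loop. A minimal one-dimensional bootstrap sketch (the models and parameters are placeholder assumptions, not from the dissertation):

        import numpy as np

        def bootstrap_pf_step(particles, weights, z, rng,
                              process_std=0.1, meas_std=0.5):
            """One bootstrap PF step for a scalar state: random-walk motion
            model and Gaussian measurement model (both illustrative)."""
            # Propagate: sample each particle from the motion model.
            particles = particles + rng.normal(0.0, process_std, size=particles.shape)
            # Weight: likelihood of measurement z under each particle.
            weights = weights * np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
            weights = weights / weights.sum()
            # Resample (systematic) when the effective sample size collapses.
            n = len(particles)
            if 1.0 / np.sum(weights ** 2) < n / 2:
                positions = (rng.random() + np.arange(n)) / n
                idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
                particles, weights = particles[idx], np.full(n, 1.0 / n)
            return particles, weights

    The propagate and weight lines are embarrassingly parallel across particles; the cumulative sum in the resampling branch is the collective step that cluster, GPU, and FPGA designs must distribute.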

    Compiling High Performance Recursive Filters

    Infinite impulse response (IIR), or recursive, filters are essential for image processing because they turn expensive large-footprint convolutions into operations that have a constant cost per pixel regardless of kernel size. However, their recursive nature constrains the order in which pixels can be computed, severely limiting both parallelism within a filter and memory locality across multiple filters. Prior research has developed algorithms that can compute IIR filters with image tiles. Using a divide-and-recombine strategy inspired by parallel prefix sum, they expose greater parallelism and exploit producer-consumer locality in pipelines of IIR filters over multi-dimensional images. While the principles are simple, it is hard, given a recursive filter, to derive a corresponding tile-parallel algorithm, and even harder to implement and debug it. We show that parallel and locality-aware implementations of IIR filter pipelines can be obtained through program transformations, which we mechanize through a domain-specific compiler. We show that the composition of a small set of transformations suffices to cover the space of possible strategies. We also demonstrate that the tiled implementations can be automatically scheduled in hardware-specific manners using a small set of generic heuristics. The programmer specifies the basic recursive filters, and the choice of transformation requires only a few lines of code. Our compiler then generates high-performance implementations that are an order of magnitude faster than standard GPU implementations, and that outperform hand-tuned tiled implementations of specialized algorithms requiring orders of magnitude more programming effort: a few lines of code instead of a few thousand lines per pipeline.
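
    To make the divide-and-recombine strategy concrete, consider the first-order causal filter y[n] = x[n] + a*y[n-1]. A minimal tiled sketch (an illustration of the principle, not the compiler's generated code): each tile is filtered independently with zero carry-in, then a cheap second pass propagates inter-tile carries, mirroring a parallel prefix sum.

        import numpy as np

        def iir_first_order_tiled(x, a, tile=256):
            """Tiled evaluation of y[n] = x[n] + a*y[n-1] with y[-1] = 0.
            Phase 1 filters each tile independently with zero carry-in
            (parallelisable); phase 2 propagates each tile's closing value
            into the next tile."""
            n = len(x)
            y = np.empty(n, dtype=float)
            # Phase 1: independent local filters per tile.
            for start in range(0, n, tile):
                acc = 0.0
                for k in range(start, min(start + tile, n)):
                    acc = x[k] + a * acc
                    y[k] = acc
            # Phase 2: element k of a tile owes a^(k+1) * carry to the
            # corrected final value of the previous tile.
            carry = 0.0
            powers = a ** np.arange(1, tile + 1)
            for start in range(0, n, tile):
                stop = min(start + tile, n)
                y[start:stop] += powers[:stop - start] * carry
                carry = y[stop - 1]
            return y

    Deriving the analogous recombination terms for higher-order filters and multi-dimensional pipelines is exactly the error-prone work the paper's compiler mechanizes.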

    Calibrating Depth Sensors with a Genetic Algorithm

    In this report, we deal with the optimization of the transformation estimate between the coordinate systems of depth sensors, i.e., sensors that produce 3D measurements. To that end, we present a novel method using a genetic algorithm to refine the six-degree-of-freedom (6 DoF) transformation via three rotational and three translational offsets. First, we demonstrate the necessity of an accurate depth sensor calibration using a depth error model of stereo cameras. The fusion of stereo disparity assumes a Gaussian disparity error distribution, which we examine with different stereo matching algorithms on the widely used KITTI visual odometry dataset. Our analysis shows that the existing calibration is not adequate for accurate disparity fusion. As a consequence, we apply our genetic algorithm to this dataset, which results in a greatly improved calibration between the mounted stereo camera and the Lidar. Stereo disparity estimates consequently show improved results in quantitative evaluations.
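
    A minimal sketch of the genetic-algorithm idea (names and hyperparameters are assumptions; `alignment_error` stands in for a real cost such as disparity residuals against Lidar points):

        import numpy as np

        def genetic_refine(alignment_error, pop_size=50, generations=100,
                           sigma=0.01, seed=0):
            """Evolve a population of 6-vectors (3 rotational + 3
            translational offsets) that minimise `alignment_error`."""
            rng = np.random.default_rng(seed)
            pop = rng.normal(0.0, sigma, size=(pop_size, 6))
            for _ in range(generations):
                fitness = np.array([alignment_error(ind) for ind in pop])
                elite = pop[np.argsort(fitness)[: pop_size // 2]]   # selection
                # Crossover: average random pairs of elite parents.
                parents = elite[rng.integers(0, len(elite), size=(pop_size, 2))]
                children = parents.mean(axis=1)
                # Mutation: small Gaussian perturbations keep exploring.
                pop = children + rng.normal(0.0, sigma * 0.1, size=children.shape)
                pop[0] = elite[0]   # elitism: never lose the best individual
            return pop[0]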

    Robust Localization in 3D Prior Maps for Autonomous Driving.

    In order to navigate autonomously, many self-driving vehicles require precise localization within an a priori known map that is annotated with exact lane locations, traffic signs, and additional metadata that govern the rules of the road. This approach transforms the extremely difficult and unpredictable task of online perception into a more structured localization problem, where exact localization in these maps provides the autonomous agent a wealth of knowledge for safe navigation. This thesis presents several novel localization algorithms that leverage a high-fidelity three-dimensional (3D) prior map and that together provide a robust and reliable framework for vehicle localization. First, we present a generic probabilistic method for localizing an autonomous vehicle equipped with a 3D light detection and ranging (LIDAR) scanner. This proposed algorithm models the world as a mixture of several Gaussians, characterizing the z-height and reflectivity distribution of the environment, which we rasterize to facilitate fast and exact multiresolution inference. Second, we propose a visual localization strategy that replaces the expensive 3D LIDAR scanners with significantly cheaper, commodity cameras. In doing so, we exploit a graphics processing unit to generate synthetic views of our belief environment, resulting in a localization solution that achieves a similar order of magnitude error rate with a sensor that is several orders of magnitude cheaper. Finally, we propose a visual obstacle detection algorithm that leverages knowledge of our high-fidelity prior maps in its obstacle prediction model. This not only provides obstacle awareness at high rates for vehicle navigation, but also improves our visual localization quality, as we are cognizant of static and non-static regions of the environment. All of these proposed algorithms are demonstrated to be real-time solutions for our self-driving car.
    PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/133410/1/rwolcott_1.pd
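
    As a hypothetical sketch of the first idea (assumed array layouts and cell size, not the thesis code, and simplifying the Gaussian mixture to a single Gaussian per cell), scoring a candidate pose against a rasterized map of per-cell z-height statistics reduces to summing per-point Gaussian log-densities:

        import numpy as np

        def pose_log_likelihood(points_xyz, map_mean, map_var, origin, cell=0.2):
            """Gaussian log-likelihood of projected LIDAR points against a
            rasterized prior map of per-cell z-height mean and variance."""
            # Map world x/y coordinates to raster cell indices.
            ij = np.floor((points_xyz[:, :2] - origin) / cell).astype(int)
            valid = ((ij >= 0) & (ij < map_mean.shape)).all(axis=1)
            i, j = ij[valid, 0], ij[valid, 1]
            z = points_xyz[valid, 2]
            mu, var = map_mean[i, j], map_var[i, j]
            return np.sum(-0.5 * np.log(2 * np.pi * var)
                          - 0.5 * (z - mu) ** 2 / var)

    A localizer would evaluate this score over a grid of candidate poses (or within a filter), which is what the rasterized, multiresolution map layout is designed to make fast.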