282 research outputs found

    Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

    In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and Euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x with respect to single core CPU executions for morphological reconstruction and Euclidean distance transform, respectively. Comment: 37 pages, 16 figures
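    The wavefront pattern described in this abstract can be illustrated with a minimal sequential sketch of morphological reconstruction driven by a single FIFO queue. This is not the paper's implementation: the multi-level GPU queue and the tile-based parallelization are not shown, the function name `morph_reconstruct`, the 4-connected neighborhood, and the choice to seed the queue with every pixel are assumptions made for illustration.

```python
from collections import deque

def morph_reconstruct(marker, mask):
    """Grayscale reconstruction by dilation on a 2D grid (4-connected).

    Hypothetical helper for illustration only; a single CPU queue
    stands in for the wavefront.
    """
    rows, cols = len(mask), len(mask[0])
    # The result can never exceed the mask, so clamp the marker first.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]
    # Assumption: seed the wavefront with every pixel (simple, not optimal).
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagation condition: the neighbor can be raised
                # toward the current value, capped by the mask.
                candidate = min(out[r][c], mask[nr][nc])
                if candidate > out[nr][nc]:
                    out[nr][nc] = candidate
                    # The updated neighbor joins the wavefront.
                    queue.append((nr, nc))
    return out
```

    Each pixel re-enters the queue only when its value actually increases, which is what makes the access pattern data-dependent and irregular.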

    Real-time Batched Distance Computation for Time-Optimal Safe Path Tracking

    In human-robot collaboration, there is a trade-off between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring safety for human workers. This algorithm runs in real time and provides the fastest safe control input in every cycle with respect to ISO standards. However, true optimality has not been achieved due to inaccurate distance computation resulting from conservative model simplification. To attain true optimality, we require a method that can compute distances (1) at many robot configurations along a trajectory, (2) in real time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast and precise distance checking method based on precomputed link-local SDFs. Our method can check distances for 500 waypoints along a trajectory in less than 1 millisecond using a GPU at runtime, making it suited for time-critical robotic control. Additionally, a neural approximation has been proposed to accelerate preprocessing by a factor of 2. Finally, we experimentally demonstrate that our method can navigate a 6-DoF robot earlier than a geometric-primitives-based distance checker in a dynamic and collaborative environment.

    Accelerating incoherent dedispersion

    Incoherent dedispersion is a computationally intensive problem that appears frequently in pulsar and transient astronomy. For current and future transient pipelines, dedispersion can dominate the total execution time, meaning its computational speed acts as a constraint on the quality and quantity of science results. It is thus critical that the algorithm be able to take advantage of trends in commodity computing hardware. With this goal in mind, we present analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with respect to their potential for efficient execution on modern graphics processing units (GPUs). We find all three to be excellent candidates, and proceed to describe implementations in C for CUDA using insight gained from the analysis. Using recent CPU and GPU hardware, the transition to the GPU provides a speed-up of 9x for the direct algorithm when compared to an optimised quad-core CPU code. For realistic recent survey parameters, these speeds are high enough that further optimisation is unnecessary to achieve real-time processing. Where further speed-ups are desirable, we find that the tree and sub-band algorithms are able to provide 3-7x better performance at the cost of certain smearing, memory consumption and development time trade-offs. We finish with a discussion of the implications of these results for future transient surveys. Our GPU dedispersion code is publicly available as a C library at: http://dedisp.googlecode.com/ Comment: 15 pages, 4 figures, 2 tables, accepted for publication in MNRAS
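    For readers unfamiliar with the 'direct' algorithm named in this abstract, the following is a minimal CPU sketch in pure Python, not the authors' CUDA code and not the API of their library. It uses the standard cold-plasma dispersion delay; the function names, the `data[channel][sample]` layout, the choice of the highest frequency as reference, and rounding delays to whole samples are all assumptions made for illustration.

```python
# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard approximate value).
K_DM = 4.148808e3

def delay_samples(dm, freq_mhz, ref_mhz, tsamp_s):
    """Delay of freq_mhz relative to ref_mhz, rounded to whole samples."""
    delay_s = K_DM * dm * (freq_mhz ** -2 - ref_mhz ** -2)
    return round(delay_s / tsamp_s)

def dedisperse_direct(data, freqs_mhz, dm, tsamp_s):
    """Sum all channels after undoing each channel's dispersion delay.

    data[ch][t] holds the intensity of channel ch at time sample t
    (hypothetical layout chosen for this sketch).
    """
    ref = max(freqs_mhz)  # assumption: reference is the top frequency
    delays = [delay_samples(dm, f, ref, tsamp_s) for f in freqs_mhz]
    # Only output samples for which every shifted channel is in range.
    n_out = len(data[0]) - max(delays)
    return [
        sum(data[ch][t + delays[ch]] for ch in range(len(data)))
        for t in range(n_out)
    ]
```

    In a real search this sum is repeated for many trial DM values, which is where the cost that the abstract describes comes from.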

    Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments

    Recent work has demonstrated real-time mapping and reconstruction from dense perception, while motion planning based on distance fields has been shown to achieve fast, collision-free motion synthesis with good convergence properties. However, demonstration of a fully integrated system that can safely re-plan in unknown environments, in the presence of static and dynamic obstacles, has remained an open challenge. In this work, we first study the impact that signed and unsigned distance fields have on optimisation convergence, and the resultant error cost in trajectory optimisation problems in 2D path planning, arm manipulator motion planning, and whole-body loco-manipulation planning. We further analyse the performance of three state-of-the-art approaches to generating distance fields (Voxblox, Fiesta, and GPU-Voxels) for use in real-time environment reconstruction. Finally, we use our findings to construct a practical hybrid mapping and motion planning system which uses GPU-Voxels and GPMP2 to perform receding-horizon whole-body motion planning that can smoothly avoid moving obstacles in 3D space using live sensor data. Our results are validated in simulation and on a real-world Toyota Human Support Robot (HSR). Comment: 8 pages, 4 figures, 2 tables, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    A unified framework for isotropic meshing based on narrow-band Euclidean distance transformation

    In this paper, we propose a simple-yet-effective method for isotropic meshing relying on Euclidean distance transformation based centroidal Voronoi tessellation (CVT). Our approach improves the performance and robustness of computing CVT on curved domains while simultaneously providing high-quality output meshes. While conventional extrinsic methods compute CVTs in the entire volume bounded by the input model, we restrict the computation to a 3D shell of user-controlled thickness. Taking voxels which contain surface samples as sites, we compute the exact Euclidean distance transform on the GPU. Our algorithm is parallel and memory-efficient, and can construct the shell space for resolutions up to 2048^3 at interactive speed. The 3D centroidal Voronoi tessellation and restricted Voronoi diagrams are also computed efficiently on the GPU. Since the shell space can bridge holes and gaps smaller than a certain tolerance, and tolerate non-manifold edges and degenerate triangles, our algorithm can handle models with such defects, which typically cause conventional remeshing methods to fail. Our method can process implicit surfaces, polyhedral surfaces, and point clouds in a unified framework. Computational results show that our GPU-based isotropic meshing algorithm produces results comparable to state-of-the-art techniques, but is significantly faster than conventional CPU-based implementations. MOE (Min. of Education, S’pore). Published version

    Fundamental Computational Geometry on the GPU

    Ph.D, DOCTOR OF PHILOSOPHY

    A GPU-Based Algorithm for the Generation of Spherical Voronoi Diagram in QTM mode


    Graph Edge Bundling by Medial Axes

    We present a new method for bundling edges of general graphs, based on 2D medial axes of edge sets which are similar in terms of position. We combine edge clustering, distance fields, and 2D medial axes to progressively bundle general graphs by attracting edges towards the centerlines of level sets of their distance fields. Our method allows for an efficient GPU implementation. We illustrate our method on several large real-world graphs.