Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines
In this paper, we address the problem of efficient execution of a computation
pattern, referred to here as the irregular wavefront propagation pattern
(IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in
several image processing operations. In the IWPP, data elements in the
wavefront propagate waves to their neighboring elements on a grid if a
propagation condition is satisfied. Elements receiving the propagated waves
become part of the wavefront. This pattern results in irregular data accesses
and computations. We develop and evaluate strategies for efficient computation
and propagation of wavefronts using a multi-level queue structure. This queue
structure improves the utilization of fast memories in a GPU and reduces
synchronization overheads. We also develop a tile-based parallelization
strategy to support execution on multiple CPUs and GPUs. We evaluate our
approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs
and 2 multicore CPUs) using the IWPP implementations of two widely used image
processing operations: morphological reconstruction and Euclidean distance
transform. Our results show significant performance improvements on GPUs. The
use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x
with respect to single-core CPU executions for morphological reconstruction and
Euclidean distance transform, respectively.
Comment: 37 pages, 16 figures
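To make the propagation pattern concrete, here is a minimal sequential sketch, assuming grayscale morphological reconstruction by dilation on a 2D grid with 4-connectivity. A single FIFO queue stands in for the paper's multi-level GPU queue, and the wavefront is seeded with every pixel for simplicity; the function name `reconstruct_by_dilation` is hypothetical. Each popped element propagates to a neighbor only if the propagation condition holds, and the neighbor then joins the wavefront.

```python
from collections import deque


def reconstruct_by_dilation(marker, mask):
    """Grayscale morphological reconstruction by dilation (4-connectivity).

    Sequential IWPP-style sketch: a FIFO queue holds the wavefront, and an
    element raises a neighbor when the propagation condition is satisfied.
    """
    rows, cols = len(mask), len(mask[0])
    # Clamp the marker under the mask so the invariant out <= mask holds.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)] for r in range(rows)]
    # Seed the wavefront with every pixel (simple but correct; a tuned
    # implementation would seed only pixels that can actually propagate).
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagation condition: the neighbor can still be raised.
                candidate = min(out[r][c], mask[nr][nc])
                if out[nr][nc] < candidate:
                    out[nr][nc] = candidate
                    queue.append((nr, nc))  # neighbor joins the wavefront
    return out
```

For example, with a mask whose high-valued region is split by a zero-valued column, a marker peak only floods its own connected region; a neighbor re-enters the queue each time its value rises, which is the source of the irregular access pattern described above.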
Real-time Batched Distance Computation for Time-Optimal Safe Path Tracking
In human-robot collaboration, there has been a trade-off relationship between
the speed of collaborative robots and the safety of human workers. In our
previous paper, we introduced a time-optimal path tracking algorithm designed
to maximize speed while ensuring safety for human workers. This algorithm runs
in real-time and provides the safe and fastest control input for every cycle
with respect to ISO standards. However, true optimality has not been achieved
due to inaccurate distance computation resulting from conservative model
simplification. To attain true optimality, we require a method that can compute
distances (1) at many robot configurations along a trajectory, (2) in real time
for online robot control, and (3) as precisely as possible for optimal control.
In this paper, we propose a batched, fast and precise distance
checking method based on precomputed link-local SDFs. Our method can check
distances for 500 waypoints along a trajectory in less than 1 millisecond
using a GPU at runtime, making it suited for time-critical robotic control.
Additionally, we propose a neural approximation that accelerates
preprocessing by a factor of 2. Finally, we experimentally demonstrate that our
method can navigate a 6-DoF robot earlier than a geometric-primitives-based
distance checker in a dynamic and collaborative environment.
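The core idea of batched link-local distance checking can be sketched as follows, under heavily simplified assumptions: a hypothetical planar two-link arm, an analytic capsule SDF standing in for each link's precomputed SDF grid, and obstacles given as points. For every waypoint, forward kinematics places each link, obstacle points are transformed into that link's local frame, the link SDF is queried there, and the minimum is kept. All function names are hypothetical, and the Python loops only mark where a GPU would run one thread per (waypoint, link, point) triple.

```python
import math


def capsule_sdf(x, y, length, radius):
    """Signed distance from (x, y), in the link frame, to a capsule along +x.

    Analytic stand-in for a precomputed link-local SDF grid lookup.
    """
    t = min(max(x, 0.0), length)  # closest point on the segment [0, length]
    return math.hypot(x - t, y) - radius


def link_frames(q, lengths):
    """Planar forward kinematics: (origin_x, origin_y, angle) of each link."""
    frames, x, y, theta = [], 0.0, 0.0, 0.0
    for qi, li in zip(q, lengths):
        theta += qi
        frames.append((x, y, theta))
        x += li * math.cos(theta)
        y += li * math.sin(theta)
    return frames


def batched_min_distance(configs, obstacles, lengths, radius):
    """Minimum robot-obstacle signed distance for every configuration."""
    result = []
    for q in configs:  # one GPU thread block per waypoint, hypothetically
        best = math.inf
        for (ox, oy, theta), length in zip(link_frames(q, lengths), lengths):
            c, s = math.cos(theta), math.sin(theta)
            for px, py in obstacles:
                # Rotate/translate the world point into the link-local frame.
                dx, dy = px - ox, py - oy
                lx, ly = c * dx + s * dy, -s * dx + c * dy
                best = min(best, capsule_sdf(lx, ly, length, radius))
        result.append(best)
    return result
```

Because each link's SDF is expressed in its own frame, it can be computed once offline and reused for every configuration; only the cheap frame transform changes per waypoint.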
Accelerating incoherent dedispersion
Incoherent dedispersion is a computationally intensive problem that appears
frequently in pulsar and transient astronomy. For current and future transient
pipelines, dedispersion can dominate the total execution time, meaning its
computational speed acts as a constraint on the quality and quantity of science
results. It is thus critical that the algorithm be able to take advantage of
trends in commodity computing hardware. With this goal in mind, we present an
analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with
respect to their potential for efficient execution on modern graphics
processing units (GPUs). We find all three to be excellent candidates, and
proceed to describe implementations in C for CUDA using insight gained from the
analysis. Using recent CPU and GPU hardware, the transition to the GPU provides
a speed-up of 9x for the direct algorithm when compared to an optimised
quad-core CPU code. For realistic recent survey parameters, these speeds are
high enough that further optimisation is unnecessary to achieve real-time
processing. Where further speed-ups are desirable, we find that the tree and
sub-band algorithms are able to provide 3-7x better performance at the cost of
certain smearing, memory consumption and development time trade-offs. We finish
with a discussion of the implications of these results for future transient
surveys. Our GPU dedispersion code is publicly available as a C library at:
http://dedisp.googlecode.com/
Comment: 15 pages, 4 figures, 2 tables, accepted for publication in MNRAS
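For readers unfamiliar with the 'direct' algorithm, here is a minimal sketch, assuming filterbank data `data[c][t]` (channel c, sample t) and the standard cold-plasma delay Δt = k_DM · DM · (f⁻² − f_ref⁻²) with k_DM ≈ 4.148808 × 10³ s MHz² pc⁻¹ cm³ and frequencies in MHz. For every trial DM it shifts each channel by its delay and sums across channels, which is the O(N_DM · N_chan · N_samples) work that maps well onto GPU threads. The function names are hypothetical and do not reflect the dedisp library's API.

```python
K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3


def delay_samples(dm, freqs_mhz, f_ref_mhz, dt):
    """Per-channel dispersion delay (in samples) relative to f_ref_mhz."""
    return [round(K_DM * dm * (f ** -2 - f_ref_mhz ** -2) / dt) for f in freqs_mhz]


def direct_dedisperse(data, dms, freqs_mhz, dt):
    """Direct dedispersion: for each trial DM, sum channels along the delay curve.

    data[c][t] is the intensity of channel c at sample t (sample time dt, s).
    Returns out[d][t], the dedispersed time series for trial DM dms[d].
    """
    f_ref = max(freqs_mhz)  # the highest frequency arrives first
    n_samples = len(data[0])
    out = []
    for dm in dms:
        delays = delay_samples(dm, freqs_mhz, f_ref, dt)
        n_out = max(0, n_samples - max(delays))  # keep fully covered samples
        series = [0.0] * n_out
        for c, shift in enumerate(delays):
            row = data[c]
            for t in range(n_out):
                series[t] += row[t + shift]
        out.append(series)
    return out
```

A pulse injected along the delay curve of a given DM adds up coherently only for that trial DM; at other trial DMs its energy stays spread over several output samples. The tree and sub-band algorithms discussed above reduce this cost by sharing partial sums between nearby DMs.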
Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments
Recent work has demonstrated real-time mapping and reconstruction from dense
perception, while motion planning based on distance fields has been shown to
achieve fast, collision-free motion synthesis with good convergence properties.
However, demonstration of a fully integrated system that can safely re-plan in
unknown environments, in the presence of static and dynamic obstacles, has
remained an open challenge. In this work, we first study the impact that signed
and unsigned distance fields have on optimisation convergence, and the
resultant error cost in trajectory optimisation problems in 2D path planning,
arm manipulator motion planning, and whole-body loco-manipulation planning. We
further analyse the performance of three state-of-the-art approaches to
generating distance fields (Voxblox, Fiesta, and GPU-Voxels) for use in
real-time environment reconstruction. Finally, we use our findings to construct
a practical hybrid mapping and motion planning system which uses GPU-Voxels and
GPMP2 to perform receding-horizon whole-body motion planning that can smoothly
avoid moving obstacles in 3D space using live sensor data. Our results are
validated in simulation and on a real-world Toyota Human Support Robot (HSR).
Comment: 8 pages, 4 figures, 2 tables, submitted to IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS)
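A toy one-dimensional illustration of why the signed/unsigned choice can matter for gradient-based trajectory optimisation, under stated assumptions: a hypothetical occupancy grid, brute-force distances in cell units, and a generic hinge obstacle cost c = max(0, ε − d) whose gradient inside the margin is −∇d. An unsigned field computed from occupancy is identically zero inside an obstacle, so this cost gives a penetrating trajectory point no direction to move; a signed field is negative inside and still yields a gradient toward the nearer free space. This is not the paper's experimental setup or its GPMP2 cost.

```python
def unsigned_df(occ):
    """Distance (in cells) to the nearest occupied cell; 0 inside obstacles.

    Assumes at least one occupied cell.
    """
    occupied = [i for i, o in enumerate(occ) if o]
    return [min(abs(i - j) for j in occupied) for i in range(len(occ))]


def signed_df(occ):
    """Positive outside obstacles, negative inside (distance to nearest free cell).

    Assumes at least one occupied and one free cell.
    """
    occupied = [i for i, o in enumerate(occ) if o]
    free = [i for i, o in enumerate(occ) if not o]
    return [
        -min(abs(i - j) for j in free) if o else min(abs(i - j) for j in occupied)
        for i, o in enumerate(occ)
    ]


def hinge_cost_gradient(field, i, eps):
    """Gradient of c = max(0, eps - d) w.r.t. position, by central differences."""
    if field[i] >= eps:
        return 0.0
    return -(field[i + 1] - field[i - 1]) / 2.0
```

On a grid with an obstacle spanning cells 3 to 7, a point at cell 4 sees a zero cost gradient from the unsigned field but a positive one from the signed field, so gradient descent moves it toward the free cell at index 2, the nearer exit.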
A unified framework for isotropic meshing based on narrow-band Euclidean distance transformation
In this paper, we propose a simple yet effective method for isotropic meshing that relies on centroidal Voronoi tessellation (CVT) computed via the Euclidean distance transform. Our approach improves the performance and robustness of computing CVT on curved domains while simultaneously providing high-quality output meshes. While conventional extrinsic methods compute CVTs in the entire volume bounded by the input model, we restrict the computation to a 3D shell of user-controlled thickness. Taking voxels that contain surface samples as sites, we compute the exact Euclidean distance transform on the GPU. Our algorithm is parallel and memory-efficient, and can construct the shell space for resolutions up to 2048³ at interactive speed. The 3D centroidal Voronoi tessellation and restricted Voronoi diagrams are also computed efficiently on the GPU. Since the shell space can bridge holes and gaps smaller than a certain tolerance, and tolerate non-manifold edges and degenerate triangles, our algorithm can handle models with such defects, which typically cause conventional remeshing methods to fail. Our method can process implicit surfaces, polyhedral surfaces, and point clouds in a unified framework. Computational results show that our GPU-based isotropic meshing algorithm produces results comparable to state-of-the-art techniques, but is significantly faster than conventional CPU-based implementations.
MOE (Min. of Education, S’pore)
Published version
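The effect of restricting the distance transform to a shell can be sketched with a deliberately naive CPU version, assuming integer voxel sites and a band radius in voxel units. Each site scatters its exact Euclidean distance into the cube of half-width ⌊band⌋ around it, and every voxel keeps the minimum. This is exact for every voxel whose nearest site lies within `band`, because such a site differs from the voxel by at most ⌊band⌋ along each axis; voxels outside the shell are never stored. The function `narrow_band_edt` is hypothetical and is a brute-force stand-in, not the paper's GPU algorithm.

```python
import math
from itertools import product


def narrow_band_edt(sites, shape, band):
    """Exact Euclidean distance to the nearest site, only for voxels within `band`.

    sites: iterable of (x, y, z) integer voxel coordinates.
    shape: (nx, ny, nz) grid dimensions.
    Returns {voxel: distance}; voxels farther than `band` from all sites are omitted.
    """
    r = int(math.floor(band))
    dist = {}
    for sx, sy, sz in sites:
        # Scatter into the cube of half-width r around the site (the shell).
        for dx, dy, dz in product(range(-r, r + 1), repeat=3):
            v = (sx + dx, sy + dy, sz + dz)
            if not all(0 <= v[k] < shape[k] for k in range(3)):
                continue  # outside the voxel grid
            d = math.sqrt(dx * dx + dy * dy + dz * dz)
            if d <= band and d < dist.get(v, math.inf):
                dist[v] = d  # keep the nearest site seen so far
    return dist
```

Memory and work scale with the number of sites times the band volume rather than with the full grid, which is the motivation for a thin shell; a GPU version would parallelize over voxels or sites instead of looping.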
Perceptual models for high-refresh-rate rendering
Rendering realistic images requires substantial computational power. With new high-refresh-rate displays as well as the renaissance of virtual reality (VR) and augmented reality (AR), one cannot expect that GPU performance will scale fast enough to meet the requirements of immersive photo-realistic rendering with current rendering techniques.
In this dissertation, I follow the dual of the well-known computer vision maxim that "vision is inverse graphics": to improve graphical algorithms, I consider the operation of the human visual system. I propose to model and exploit the limitations of the visual system in the context of novel high-refresh-rate displays; specifically, I focus on spatio-temporal perception, a topic that has so far received considerably less attention than spatial-only perception.
I present three main contributions. First, I demonstrate the validity of the perceptual approach by presenting a conceptually simple rendering technique, motivated by our eyes' limited sensitivity to high spatio-temporal change, that reduces the rendering load and transmission requirements of current-generation VR headsets without introducing perceivable visual artefacts. Second, I present two visual models related to motion perception: (a) a metric for detecting flicker; and (b) a comprehensive visual model to predict perceived motion quality on monitors with arbitrary refresh rates and resolutions. Third, I propose an adaptive rendering algorithm that utilises the proposed models. All algorithms operate on physical colorimetric units (instead of display-referenced pixel values), for which I provide the appropriate display measurements and models. All proposed algorithms and visual models are calibrated and validated with psychophysical experiments.
Graph Edge Bundling by Medial Axes
We present a new method for bundling edges of general graphs, based on 2D medial axes of edge sets that are similar in terms of position. We combine edge clustering, distance fields, and 2D medial axes to progressively bundle general graphs by attracting edges towards the centerlines of level sets of their distance fields. Our method allows for an efficient GPU implementation. We illustrate our method on several large real-world graphs.