Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

Author: Cooper Lee
Kong Jun
Kurc Tahsin
Pan Tony
Saltz Joel
Teodoro George
Publication date: 14/09/2012

In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU-accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and Euclidean distance transform. Our results show significant performance improvements on GPUs. The cooperative use of multiple CPUs and GPUs attains speedups of 50x and 85x with respect to single-core CPU executions for morphological reconstruction and Euclidean distance transform, respectively. Comment: 37 pages, 16 figures
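The wavefront pattern described in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' multi-level GPU queue or their tile-based scheme; it is a plain single-queue CPU version of grayscale morphological reconstruction by dilation, assuming 4-connectivity and nested lists as images. The function name `reconstruct` and its parameters are hypothetical.

```python
from collections import deque


def reconstruct(marker, mask):
    """Grayscale morphological reconstruction by dilation (4-connectivity).

    Illustrative single-queue version of the wavefront pattern; assumes
    marker and mask are non-empty nested lists of equal shape.
    """
    rows, cols = len(mask), len(mask[0])
    # Clamp the marker under the mask so the result never exceeds the mask.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]
    # Seed the wavefront with every pixel (simplest possible initialization).
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagation condition: the neighbor can be raised
                # without exceeding its mask value.
                candidate = min(out[r][c], mask[nr][nc])
                if candidate > out[nr][nc]:
                    out[nr][nc] = candidate
                    queue.append((nr, nc))  # neighbor joins the wavefront
    return out
```

Because which pixels enter the queue depends on the image content, the memory accesses are data-dependent, which is the irregularity the abstract refers to.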

arXiv.org e-Print Archive

CiteSeerX

Real-time Batched Distance Computation for Time-Optimal Safe Path Tracking

Author: Fujii Shohei
Pham Quang-Cuong
Publication date: 21/09/2023

In human-robot collaboration, there is a trade-off between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring safety for human workers. This algorithm runs in real time and provides the fastest safe control input for every cycle with respect to ISO standards. However, true optimality has not been achieved due to inaccurate distance computation resulting from conservative model simplification. To attain true optimality, we require a method that can compute distances (1) at many robot configurations to examine along a trajectory, (2) in real time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast and precise distance checking method based on precomputed link-local SDFs. Our method can check distances for 500 waypoints along a trajectory in less than 1 millisecond using a GPU at runtime, making it suited for time-critical robotic control. Additionally, a neural approximation is proposed to accelerate preprocessing by a factor of 2. Finally, we experimentally demonstrate that our method can navigate a 6-DoF robot earlier than a geometric-primitives-based distance checker in a dynamic and collaborative environment.

arXiv.org e-Print Archive

Accelerating incoherent dedispersion

Author: B. R. Barsdell
C. J. Fluke
D. G. Barnes
M. Bailes
Publication venue: 'Wiley'
Publication date: 01/01/2012

Incoherent dedispersion is a computationally intensive problem that appears frequently in pulsar and transient astronomy. For current and future transient pipelines, dedispersion can dominate the total execution time, meaning its computational speed acts as a constraint on the quality and quantity of science results. It is thus critical that the algorithm be able to take advantage of trends in commodity computing hardware. With this goal in mind, we present analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with respect to their potential for efficient execution on modern graphics processing units (GPUs). We find all three to be excellent candidates, and proceed to describe implementations in C for CUDA using insight gained from the analysis. Using recent CPU and GPU hardware, the transition to the GPU provides a speed-up of 9x for the direct algorithm when compared to an optimised quad-core CPU code. For realistic recent survey parameters, these speeds are high enough that further optimisation is unnecessary to achieve real-time processing. Where further speed-ups are desirable, we find that the tree and sub-band algorithms are able to provide 3-7x better performance at the cost of certain smearing, memory consumption and development time trade-offs. We finish with a discussion of the implications of these results for future transient surveys. Our GPU dedispersion code is publicly available as a C library at: http://dedisp.googlecode.com/ Comment: 15 pages, 4 figures, 2 tables, accepted for publication in MNRAS
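As a rough illustration of the 'direct' algorithm named in the abstract, the sketch below shifts each frequency channel by its dispersion delay and sums across channels for a single trial dispersion measure. It is a pure-Python CPU sketch, not the authors' CUDA code or the `dedisp` library API; the function names, the nested-list data layout and the choice of the highest frequency as the reference are assumptions made for the example. The delay uses the standard cold-plasma dispersion relation with frequencies in MHz.

```python
# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM = 4.148808e3


def delay_samples(dm, freq_mhz, ref_freq_mhz, dt_s):
    """Delay of freq_mhz relative to ref_freq_mhz, in whole time samples."""
    delay_s = K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)
    return int(round(delay_s / dt_s))


def dedisperse_direct(data, freqs_mhz, dm, dt_s):
    """Direct incoherent dedispersion for one trial DM.

    data[i][t] is the power in channel i (centre frequency freqs_mhz[i])
    at time sample t. Returns the channel-summed, delay-corrected series.
    """
    ref = max(freqs_mhz)  # assumption: reference is the highest frequency
    shifts = [delay_samples(dm, f, ref, dt_s) for f in freqs_mhz]
    # Only output samples for which every shifted channel is in range.
    n_out = len(data[0]) - max(shifts)
    return [sum(chan[t + s] for chan, s in zip(data, shifts))
            for t in range(n_out)]
```

A full search repeats this for every trial DM, so the cost grows with the product of DM trials, channels and time samples, which is why the abstract describes dedispersion as dominating pipeline run time.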

arXiv.org e-Print Archive

Crossref

Swinburne Research Bank

Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments

Author: Finean Mark Nicholas
Havoutis Ioannis
Merkt Wolfgang
Publication date: 05/03/2021

Recent work has demonstrated real-time mapping and reconstruction from dense perception, while motion planning based on distance fields has been shown to achieve fast, collision-free motion synthesis with good convergence properties. However, demonstration of a fully integrated system that can safely re-plan in unknown environments, in the presence of static and dynamic obstacles, has remained an open challenge. In this work, we first study the impact that signed and unsigned distance fields have on optimisation convergence, and the resultant error cost in trajectory optimisation problems in 2D path planning, arm manipulator motion planning, and whole-body loco-manipulation planning. We further analyse the performance of three state-of-the-art approaches to generating distance fields (Voxblox, Fiesta, and GPU-Voxels) for use in real-time environment reconstruction. Finally, we use our findings to construct a practical hybrid mapping and motion planning system which uses GPU-Voxels and GPMP2 to perform receding-horizon whole-body motion planning that can smoothly avoid moving obstacles in 3D space using live sensor data. Our results are validated in simulation and on a real-world Toyota Human Support Robot (HSR). Comment: 8 pages, 4 figures, 2 tables, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv.org e-Print Archive

Oxford University Research Archive

A unified framework for isotropic meshing based on narrow-band Euclidean distance transformation

Author: B. Lévy
C. C. L. Wang
C. Xu
D. Yan
D.-M. Yan
G. Rong
H. Li
H. Sheung
J. Chen
J. S. B. Mitchell
K. E. Hoff
L. Lu
L. Shuai
M. Campen
M. M. Kazhdan
M. W. Jones
N. Amenta
P. Alliez
Q. Du
R. Kimmel
R. Satherley
S. Lloyd
S. Marchand-Maillet
T. K. Dey
T.-T. Cao
X. Wang
Y. J. Liu
Z. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

In this paper, we propose a simple-yet-effective method for isotropic meshing relying on Euclidean distance transformation based centroidal Voronoi tessellation (CVT). Our approach improves the performance and robustness of computing CVT on curved domains while simultaneously providing high-quality output meshes. While conventional extrinsic methods compute CVTs in the entire volume bounded by the input model, we restrict the computation to a 3D shell of user-controlled thickness. Taking voxels which contain surface samples as sites, we compute the exact Euclidean distance transform on the GPU. Our algorithm is parallel and memory-efficient, and can construct the shell space for resolutions up to 2048³ at interactive speed. The 3D centroidal Voronoi tessellation and restricted Voronoi diagrams are also computed efficiently on the GPU. Since the shell space can bridge holes and gaps smaller than a certain tolerance, and tolerate non-manifold edges and degenerate triangles, our algorithm can handle models with such defects, which typically cause conventional remeshing methods to fail. Our method can process implicit surfaces, polyhedral surfaces, and point clouds in a unified framework. Computational results show that our GPU-based isotropic meshing algorithm produces results comparable to state-of-the-art techniques, but is significantly faster than conventional CPU-based implementations. MOE (Min. of Education, S’pore). Published version
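The idea of restricting an exact Euclidean distance transform to a shell around the sites can be sketched in a few lines. The version below is a brute-force 2D CPU illustration only: it is not the paper's parallel GPU algorithm, and the function name `narrow_band_edt`, the dictionary output and the `band` parameter (standing in for the user-controlled shell thickness) are assumptions made for the example.

```python
import math


def narrow_band_edt(shape, sites, band):
    """Exact Euclidean distance to the nearest site, kept only within band.

    shape is (rows, cols); sites is a non-empty list of (row, col) cells.
    Returns a dict mapping each in-band cell to its distance. Brute force:
    every cell is compared against every site.
    """
    rows, cols = shape
    distances = {}
    for r in range(rows):
        for c in range(cols):
            d = min(math.hypot(r - sr, c - sc) for sr, sc in sites)
            if d <= band:  # cells outside the shell are never stored
                distances[(r, c)] = d
    return distances
```

Storing only the in-band cells is what keeps memory proportional to the shell rather than to the full grid; the brute-force search over sites is the part a practical implementation would replace.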

Crossref

Springer - Publisher Connector

The University of Manchester - Institutional Repository

DR-NTU (Digital Repository of NTU)

Fundamental Computational Geometry on the GPU

Author: CAO THANH TUNG
Publication date: 16/06/2014

Ph.D., Doctor of Philosophy

ScholarBank@NUS

A GPU-Based Algorithm for the Generation of Spherical Voronoi Diagram in QTM mode

Publication venue: 'Copernicus GmbH'

Crossref

Perceptual models for high-refresh-rate rendering

Author: Dénes György
Publication venue: University of Cambridge
Publication date: 23/01/2020

Rendering realistic images requires substantial computational power. With new high-refresh-rate displays as well as the renaissance of virtual reality (VR) and augmented reality (AR), one cannot expect that GPU performance will scale fast enough to meet the requirements of immersive photo-realistic rendering with current rendering techniques. In this dissertation, I follow the dual of the well-known computer vision maxim that vision is inverse graphics: to improve graphical algorithms, I consider the operation of the human visual system. I propose to model and exploit the limitations of the visual system in the context of novel high-refresh-rate displays; specifically, I focus on spatio-temporal perception, a topic that has so far received considerably less attention than spatial-only perception. I present three main contributions. First, I demonstrate the validity of the perceptual approach by presenting a conceptually simple rendering technique, motivated by our eyes' limited sensitivity to high spatio-temporal change, which reduces the rendering load and transmission requirement of current-generation VR headsets without introducing perceivable visual artefacts. Second, I present two visual models related to motion perception: (a) a metric for detecting flicker; and (b) a comprehensive visual model to predict perceived motion quality on monitors with arbitrary refresh rates and monitor resolutions. Third, I propose an adaptive rendering algorithm that utilises the proposed models. All algorithms operate on physical colorimetric units (instead of display-referenced pixel values), for which I provide the appropriate display measurements and models. All proposed algorithms and visual models are calibrated and validated with psychophysical experiments.

Apollo (Cambridge)

Graph Edge Bundling by Medial Axes

Author: Ersoy Ozan
Telea Alexandru
Publication date: 01/01/2011

We present a new method for bundling edges of general graphs, based on 2D medial axes of edge sets which are similar in terms of position. We combine edge clustering, distance fields, and 2D medial axes to progressively bundle general graphs by attracting edges towards the centerlines of level sets of their distance fields. Our method allows for an efficient GPU implementation. We illustrate our method on several large real-world graphs.

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen