7 research outputs found
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports the retrieval operations of lookup, count,
and range query operations with an average rate of 75 M, 32 M and 23 M
queries/s respectively. The trade-off for the dynamic updates is that the
sorted array is almost twice as fast on retrievals. We believe that our GPU LSM
is the first dynamic general-purpose dictionary data structure for the GPU.Comment: 11 pages, accepted to appear on the Proceedings of IEEE International
Parallel and Distributed Processing Symposium (IPDPS'18
Scalable and Probabilistically Complete Planning for Robotic Spatial Extrusion
There is increasing demand for automated systems that can fabricate 3D
structures. Robotic spatial extrusion has become an attractive alternative to
traditional layer-based 3D printing due to a manipulator's flexibility to print
large, directionally-dependent structures. However, existing extrusion planning
algorithms require a substantial amount of human input, do not scale to large
instances, and lack theoretical guarantees. In this work, we present a rigorous
formalization of robotic spatial extrusion planning and provide several
efficient and probabilistically complete planning algorithms. The key planning
challenge is, throughout the printing process, satisfying both stiffness
constraints that limit the deformation of the structure and geometric
constraints that ensure the robot does not collide with the structure. We show
that, although these constraints often conflict with each other, a greedy
backward state-space search guided by a stiffness-aware heuristic is able to
successfully balance both constraints. We empirically compare our methods on a
benchmark of over 40 simulated extrusion problems. Finally, we apply our
approach to 3 real-world extrusion problems
Doctor of Philosophy
dissertationReal-time global illumination is the next frontier in real-time rendering. In an attempt to generate realistic images, games have followed the film industry into physically based shading and will soon begin integrating global illumination techniques. Traditional methods require too much memory and too much time to compute for real-time use. With Modular and Delta Radiance Transfer we precompute a scene-independent, low-frequency basis that allows us to calculate complex indirect lighting calculations in a much lower dimensional subspace with a reduced memory footprint and real-time execution. The results are then applied as a light map on many different scenes. To improve the low frequency results, we also introduce a novel screen space ambient occlusion technique that allows us to generate a smoother result with fewer samples. These three techniques, low and high frequency used together, provide a viable indirect lighting solution that can be run in milliseconds on today's hardware, providing a useful new technique for indirect lighting in real-time graphics
Hardware Accelerators for Animated Ray Tracing
Future graphics processors are likely to incorporate hardware accelerators for real-time ray tracing, in order to render increasingly complex lighting effects in interactive applications. However, ray tracing poses difficulties when drawing scenes with dynamic content, such as animated characters and objects. In dynamic scenes, the spatial datastructures used to accelerate ray tracing are invalidated on each animation frame, and need to be rapidly updated. Tree update is a complex subtask in its own right, and becomes highly expensive in complex scenes. Both ray tracing and tree update are highly memory-intensive tasks, and rendering systems are increasingly bandwidth-limited, so research on accelerator hardware has focused on architectural techniques to optimize away off-chip memory traffic. Dynamic scene support is further complicated by the recent introduction of compressed trees, which use low-precision numbers for storage and computation. Such compression reduces both the arithmetic and memory bandwidth cost of ray tracing, but adds to the complexity of tree update.This thesis proposes methods to cope with dynamic scenes in hardware-accelerated ray tracing, with focus on reducing traffic to external memory. Firstly, a hardware architecture is designed for linear bounding volume hierarchy construction, an algorithm which is a basic building block in most state-of-the-art software tree builders. The algorithm is rearranged into a streaming form which reduces traffic to one-third of software implementations of the same algorithm. Secondly, an algorithm is proposed for compressing bounding volume hierarchies in a streaming manner as they are output from a hardware builder, instead of performing compression as a postprocessing pass. As a result, with the proposed method, compression reduces the overall cost of tree update rather than increasing it. The last main contribution of this thesis is an evaluation of shallow bounding volume hierarchies, common in software ray tracing, for use in hardware pipelines. These are found to be more energy-efficient than binary hierarchies. The results in this thesis both conïŹrm that dynamic scene support may become a bottleneck in real time ray tracing, and add to the state of the art on tree update in terms of energy-efficiency, as well as the complexity of scenes that can be handled in real time on resource-constrained platforms
Faster data structures and graphics hardware techniques for high performance rendering
Computer generated imagery is used in a wide range of disciplines, each with different requirements. As an example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements but are still required to render efficiently utilizing huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce as high rendering quality as possible given the constraints available. The first paper in this thesis presents an analytical hardware model together with a feed-back system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware are required to minimize energy consumption not to overheat. The second paper presents experiments and analysis which consider power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflections and refractions effects has long been considered available only in offline rendering where time isnât a constraint. The third paper presents a hybrid approach, utilizing the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high quality reflections and refractions in real-time. The fourth and fifth paper present improvements in construction time and quality of Bounding Volume Hierarchies (BVH). Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer towards a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of a very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher quality BVHs. The fifth paper introduces a triangle splitting BVH construction approach that builds BVHs with quality on a par with an earlier high quality splitting algorithm. However, the method presented in paper five is several times faster in construction time
Accelerating and simulating detected physical interations
The aim of this doctoral thesis is to present a body of work aimed at improving performance and developing new methods for animating physical interactions using simulation in virtual environments. To this end we develop a number of novel parallel collision detection and fracture simulation algorithms.
Methods for traversing and constructing bounding volume hierarchies (BVH) on graphics processing units (GPU) have had a wide success. In particular, they have been adopted widely in simulators, libraries and benchmarks as they allow applications to reach new heights in terms of performance. Even with such a development however, a thorough adoption of techniques has not occurred in commercial and practical applications. Due to this, parallel collision detection on GPUs remains a relatively niche problem and a wide number of applications could benefit from a significant boost in proclaimed performance gains.
In fracture simulations, explicit surface tracking methods have a good track record of success. In particular they have been adopted thoroughly in 3D modelling and animation software like Houdini [124] as they allow accurate simulation of intricate fracture patterns with complex interactions, which are generated using physical laws. Even so, existing methods can pose restrictions on the geometries of simulated objects. Further, they often have tight dependencies on implicit surfaces (e.g. level sets) for representing cracks and performing cutting to produce rigid-body fragments. Due to these restrictions, catering to various geometries can be a challenge and the memory cost of using implicit surfaces can be detrimental and without guarantee on the preservation of sharp features.
We present our work in four main chapters. We first tackle the problem in the accelerating collision detection on the GPU via BVH traversal - one of the most demanding components during collision detection. Secondly, we show the construction of a new representation of the BVH called the ostensibly implicit tree - a layout of nodes in memory which is encoded using the bitwise representation of the number of enclosed objects in the tree (e.g. polygons). Thirdly, we shift paradigm to the task of simulating breaking objects after collision: we show how traditional finite elements can be extended as a way to prevent frequent re-meshing during fracture evolution problems. Finally, we show how the fracture surfaceârepresented as an explicit (e.g. triangulated) surface meshâis used to generate rigid body fragments using a novel approach to mesh cutting