7 research outputs found

    GPU LSM: A Dynamic Dictionary Data Structure for the GPU

    We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations of lookup, count, and range query with average rates of 75 M, 32 M and 23 M queries/s, respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU. Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'18).
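The LSM idea behind this abstract can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation: the class name `TinyLSM`, the fixed batch size, the sequence counter, and the tombstone marker are all assumptions chosen for illustration. Level i is either empty or a sorted run of `b * 2**i` entries, and each update batch is merged upward like a carry in a binary counter.

```python
from bisect import bisect_left
from heapq import merge

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


class TinyLSM:
    """Toy LSM dictionary: level i is None or a sorted run of b * 2**i entries."""

    def __init__(self, batch_size):
        self.b = batch_size
        self.levels = []  # levels[i] is None or a sorted list of (key, -seq, value)
        self.seq = 0      # grows with every update; larger means more recent

    def update(self, batch):
        """Apply exactly b (key, value) pairs; a TOMBSTONE value deletes the key."""
        assert len(batch) == self.b
        run = []
        for key, value in batch:
            self.seq += 1
            # Storing -seq makes the newest entry for a key sort first.
            run.append((key, -self.seq, value))
        run.sort(key=lambda e: e[:2])
        # Carry step: merge with each full level until an empty slot is found.
        i = 0
        while i < len(self.levels) and self.levels[i] is not None:
            run = list(merge(self.levels[i], run, key=lambda e: e[:2]))
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = run

    def lookup(self, key):
        """Return the newest value for key, or None if absent or deleted."""
        for run in self.levels:  # lower levels hold more recent batches
            if run is None:
                continue
            i = bisect_left(run, (key, float("-inf")))
            if i < len(run) and run[i][0] == key:
                value = run[i][2]
                return None if value is TOMBSTONE else value
        return None
```

In this sketch a deletion is just another insertion, which is why a single update routine can cover both operations; stale entries stay in the runs until some cleanup pass (not shown) removes them.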

    Scalable and Probabilistically Complete Planning for Robotic Spatial Extrusion

    There is increasing demand for automated systems that can fabricate 3D structures. Robotic spatial extrusion has become an attractive alternative to traditional layer-based 3D printing due to a manipulator's flexibility to print large, directionally-dependent structures. However, existing extrusion planning algorithms require a substantial amount of human input, do not scale to large instances, and lack theoretical guarantees. In this work, we present a rigorous formalization of robotic spatial extrusion planning and provide several efficient and probabilistically complete planning algorithms. The key planning challenge is, throughout the printing process, satisfying both stiffness constraints that limit the deformation of the structure and geometric constraints that ensure the robot does not collide with the structure. We show that, although these constraints often conflict with each other, a greedy backward state-space search guided by a stiffness-aware heuristic is able to successfully balance both constraints. We empirically compare our methods on a benchmark of over 40 simulated extrusion problems. Finally, we apply our approach to 3 real-world extrusion problems.

    Doctor of Philosophy

    Dissertation. Real-time global illumination is the next frontier in real-time rendering. In an attempt to generate realistic images, games have followed the film industry into physically based shading and will soon begin integrating global illumination techniques. Traditional methods require too much memory and too much time to compute for real-time use. With Modular and Delta Radiance Transfer we precompute a scene-independent, low-frequency basis that allows us to perform complex indirect lighting calculations in a much lower-dimensional subspace with a reduced memory footprint and real-time execution. The results are then applied as a light map on many different scenes. To improve the low-frequency results, we also introduce a novel screen space ambient occlusion technique that allows us to generate a smoother result with fewer samples. These three techniques, low and high frequency used together, provide a viable indirect lighting solution that can be run in milliseconds on today's hardware, providing a useful new technique for indirect lighting in real-time graphics.

    Hardware Accelerators for Animated Ray Tracing

    Future graphics processors are likely to incorporate hardware accelerators for real-time ray tracing, in order to render increasingly complex lighting effects in interactive applications. However, ray tracing poses difficulties when drawing scenes with dynamic content, such as animated characters and objects. In dynamic scenes, the spatial data structures used to accelerate ray tracing are invalidated on each animation frame, and need to be rapidly updated. Tree update is a complex subtask in its own right, and becomes highly expensive in complex scenes. Both ray tracing and tree update are highly memory-intensive tasks, and rendering systems are increasingly bandwidth-limited, so research on accelerator hardware has focused on architectural techniques to optimize away off-chip memory traffic. Dynamic scene support is further complicated by the recent introduction of compressed trees, which use low-precision numbers for storage and computation. Such compression reduces both the arithmetic and memory bandwidth cost of ray tracing, but adds to the complexity of tree update. This thesis proposes methods to cope with dynamic scenes in hardware-accelerated ray tracing, with focus on reducing traffic to external memory. Firstly, a hardware architecture is designed for linear bounding volume hierarchy construction, an algorithm which is a basic building block in most state-of-the-art software tree builders. The algorithm is rearranged into a streaming form which reduces traffic to one-third of software implementations of the same algorithm. Secondly, an algorithm is proposed for compressing bounding volume hierarchies in a streaming manner as they are output from a hardware builder, instead of performing compression as a postprocessing pass. As a result, with the proposed method, compression reduces the overall cost of tree update rather than increasing it.
The last main contribution of this thesis is an evaluation of shallow bounding volume hierarchies, common in software ray tracing, for use in hardware pipelines. These are found to be more energy-efficient than binary hierarchies. The results in this thesis both confirm that dynamic scene support may become a bottleneck in real-time ray tracing, and add to the state of the art on tree update in terms of energy-efficiency, as well as the complexity of scenes that can be handled in real time on resource-constrained platforms.
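Linear bounding volume hierarchy (LBVH) builders commonly begin by sorting primitives along a Morton (Z-order) curve, so that primitives close together in space end up close together in memory. The sketch below shows only that first step in software; it says nothing about the streaming hardware architecture proposed in the thesis. The function names, the 10 bits per axis, and the assumption that centroids are already normalized to the unit cube are illustrative choices.

```python
def expand_bits(v):
    """Spread the low 10 bits of v so that two zero bits follow each original bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d(x, y, z):
    """30-bit Morton code for a point whose coordinates lie in [0, 1]."""

    def quantize(c):
        # Map [0, 1] onto the integer grid 0..1023, clamping out-of-range input.
        return min(max(int(c * 1024.0), 0), 1023)

    return (
        (expand_bits(quantize(x)) << 2)
        | (expand_bits(quantize(y)) << 1)
        | expand_bits(quantize(z))
    )


def lbvh_order(centroids):
    """Indices of primitives sorted by the Morton code of their centroid."""
    return sorted(range(len(centroids)), key=lambda i: morton3d(*centroids[i]))
```

Once the primitives are in this order, a builder can derive the tree from the sorted codes, for example by splitting ranges at the highest differing bit; that stage is omitted here.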

    Faster data structures and graphics hardware techniques for high performance rendering

    Computer generated imagery is used in a wide range of disciplines, each with different requirements. As an example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements but are still required to render efficiently utilizing huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce as high rendering quality as possible given the constraints available. The first paper in this thesis presents an analytical hardware model together with a feedback system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware are required to minimize energy consumption to avoid overheating. The second paper presents experiments and analysis which consider power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflection and refraction effects has long been considered available only in offline rendering where time isn’t a constraint. The third paper presents a hybrid approach, utilizing the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high quality reflections and refractions in real-time. The fourth and fifth papers present improvements in construction time and quality of Bounding Volume Hierarchies (BVH).
Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer towards a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of a very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher quality BVHs. The fifth paper introduces a triangle-splitting BVH construction approach that builds BVHs with quality on a par with an earlier high quality splitting algorithm. However, the method presented in paper five is several times faster in construction time.

    Accelerating and simulating detected physical interations

    The aim of this doctoral thesis is to present a body of work aimed at improving performance and developing new methods for animating physical interactions using simulation in virtual environments. To this end we develop a number of novel parallel collision detection and fracture simulation algorithms. Methods for traversing and constructing bounding volume hierarchies (BVHs) on graphics processing units (GPUs) have had wide success. In particular, they have been adopted widely in simulators, libraries and benchmarks as they allow applications to reach new heights in terms of performance. Even with such developments, however, a thorough adoption of techniques has not occurred in commercial and practical applications. Due to this, parallel collision detection on GPUs remains a relatively niche problem and a wide number of applications could benefit significantly from the proclaimed performance gains. In fracture simulations, explicit surface tracking methods have a good track record of success. In particular they have been adopted thoroughly in 3D modelling and animation software like Houdini [124] as they allow accurate simulation of intricate fracture patterns with complex interactions, which are generated using physical laws. Even so, existing methods can pose restrictions on the geometries of simulated objects. Further, they often have tight dependencies on implicit surfaces (e.g. level sets) for representing cracks and performing cutting to produce rigid-body fragments. Due to these restrictions, catering to various geometries can be a challenge, and the memory cost of using implicit surfaces can be detrimental and comes without guarantee on the preservation of sharp features. We present our work in four main chapters. We first tackle the problem of accelerating collision detection on the GPU via BVH traversal, one of the most demanding components during collision detection.
Secondly, we show the construction of a new representation of the BVH called the ostensibly implicit tree, a layout of nodes in memory which is encoded using the bitwise representation of the number of enclosed objects in the tree (e.g. polygons). Thirdly, we shift to the task of simulating breaking objects after collision: we show how traditional finite elements can be extended as a way to prevent frequent re-meshing during fracture evolution problems. Finally, we show how the fracture surface, represented as an explicit (e.g. triangulated) surface mesh, is used to generate rigid body fragments using a novel approach to mesh cutting.