
    Doctor of Philosophy in Computer Science

    Ray tracing is becoming more widely adopted in offline rendering systems due to its natural support for high-quality lighting. Since quality is also a concern in most real-time systems, we believe ray tracing would be a welcome change in the real-time world, but it is avoided due to insufficient performance. Since power consumption is one of the primary factors limiting the increase of processor performance, it must be addressed as a foremost concern in any future ray tracing system design. This will require cooperating advances in both algorithms and architecture. In this dissertation I study ray tracing system designs from a data movement perspective, targeting the various memory resources that are the primary consumers of power on a modern processor. The result is a set of high-performance, low-energy ray tracing architectures.

    Automatically Optimizing Tree Traversal Algorithms

    Many domains in computer science, from data mining to graphics to computational astrophysics, focus heavily on irregular applications. In contrast to regular applications, which operate over dense matrices and arrays, irregular programs manipulate and traverse complex data structures like trees and graphs. As irregular applications operate on ever larger datasets, their performance suffers from poor locality and parallelism. Programmers are burdened with the arduous task of manually tuning such applications for better performance. Generally applicable techniques to optimize irregular applications are highly desired, yet scarce. In this dissertation, we argue that for an important subset of irregular programs which arises in many domains, namely tree traversal algorithms like Barnes-Hut, nearest neighbor and ray tracing, there exist general techniques to enhance performance. We investigate two sources of performance improvement: locality enhancement and vectorization. Furthermore, we demonstrate that these techniques can be automatically applied by an optimizing compiler, relieving programmers of manual, error-prone, application-specific effort. Achieving high performance in many applications requires achieving good locality of reference. We propose two novel transformations called point blocking and traversal splicing, inspired by the classic tiling loop transformation, and show that they can substantially enhance temporal locality in tree traversals. We then present a transformation framework called TreeSplicer that automatically applies these transformations and uses autotuning techniques to determine appropriate parameters for them. For six benchmark algorithms, we show that a combination of point blocking and traversal splicing can deliver single-thread speedups of up to 8.71× (geometric mean: 2.48×), just from better locality. Modern commodity processors support SIMD instructions, and using these instructions to process multiple traversals at once has the potential to provide substantial performance improvements. Unfortunately, tree algorithms often feature highly diverging traversals which inhibit efficient SIMD utilization, to the point that other, less profitable sources of vectorization must be exploited instead. We propose a dynamic reordering of traversals, based on the insight that traversals which have behaved similarly so far are likely to behave similarly in the future, and show that this reordering can bring the SIMD utilization of diverging traversals close to ideal. We present a transformation framework, SIMTree, which facilitates vectorization of tree algorithms, and demonstrate speedups of up to 6.59× (geometric mean: 2.78×). Furthermore, our techniques can effectively SIMDize algorithms that prior, manual vectorization attempts could not.
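
    A minimal illustration of the point blocking idea, not the authors' TreeSplicer implementation: rather than running each point's traversal to completion one after another, a block of points is pushed through each tree node together, so a node brought into cache is reused by many points before it is evicted. All names below (Node, Point, visit, needsChild) are illustrative assumptions.

    #include <vector>

    struct Point { float x, y, z; float result = 0.0f; };

    struct Node {
        Node* left = nullptr;
        Node* right = nullptr;
        // ... per-node data (bounding box, mass, objects, ...)
    };

    // Placeholder per-node work and pruning test for a single point.
    void visit(const Node& n, Point& p) { /* accumulate into p.result */ }
    bool needsChild(const Node& n, const Node& child, const Point& p) { return true; }

    // Blocked traversal: every point in 'block' visits 'node' before any of them
    // descends further, which keeps the node's data hot in cache.
    void traverseBlocked(Node* node, const std::vector<Point*>& block) {
        if (!node || block.empty()) return;

        std::vector<Point*> leftBlock, rightBlock;
        for (Point* p : block) {
            visit(*node, *p);   // node data reused across the whole block
            if (node->left  && needsChild(*node, *node->left,  *p)) leftBlock.push_back(p);
            if (node->right && needsChild(*node, *node->right, *p)) rightBlock.push_back(p);
        }
        traverseBlocked(node->left,  leftBlock);
        traverseBlocked(node->right, rightBlock);
    }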

    Accelerating and simulating detected physical interactions

    This doctoral thesis presents a body of work aimed at improving performance and developing new methods for animating physical interactions using simulation in virtual environments. To this end we develop a number of novel parallel collision detection and fracture simulation algorithms. Methods for traversing and constructing bounding volume hierarchies (BVH) on graphics processing units (GPU) have had wide success. In particular, they have been adopted widely in simulators, libraries and benchmarks, as they allow applications to reach new heights in terms of performance. Even with such developments, however, these techniques have not been thoroughly adopted in commercial and practical applications. As a result, parallel collision detection on GPUs remains a relatively niche problem, and a wide range of applications could still benefit from the significant performance gains it offers. In fracture simulations, explicit surface tracking methods have a good track record of success. In particular, they have been adopted thoroughly in 3D modelling and animation software like Houdini [124], as they allow accurate simulation of intricate fracture patterns with complex interactions, which are generated using physical laws. Even so, existing methods can pose restrictions on the geometries of simulated objects. Further, they often have tight dependencies on implicit surfaces (e.g. level sets) for representing cracks and performing cutting to produce rigid-body fragments. Due to these restrictions, catering to various geometries can be a challenge, and the memory cost of using implicit surfaces can be detrimental, with no guarantee that sharp features are preserved. We present our work in four main chapters. We first tackle the problem of accelerating collision detection on the GPU via BVH traversal, one of the most demanding components of collision detection. Secondly, we show the construction of a new representation of the BVH called the ostensibly implicit tree, a layout of nodes in memory which is encoded using the bitwise representation of the number of objects enclosed in the tree (e.g. polygons). Thirdly, we shift focus to the task of simulating breaking objects after collision: we show how traditional finite elements can be extended as a way to prevent frequent re-meshing during fracture evolution problems. Finally, we show how the fracture surface, represented as an explicit (e.g. triangulated) surface mesh, is used to generate rigid-body fragments using a novel approach to mesh cutting.
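
    To make the traversal step concrete, the following is a generic, CPU-side sketch of stack-based BVH traversal for overlap queries, the kind of kernel that dominates collision detection cost. It is not the thesis' GPU traversal algorithm or the ostensibly implicit tree layout; all names (BVHNode, Aabb, query) are illustrative assumptions.

    #include <cstdint>
    #include <vector>

    struct Aabb { float min[3], max[3]; };

    inline bool overlaps(const Aabb& a, const Aabb& b) {
        for (int i = 0; i < 3; ++i)
            if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
        return true;
    }

    struct BVHNode {
        Aabb bounds;
        int32_t left   = -1;   // index of left child, -1 if this node is a leaf
        int32_t right  = -1;   // index of right child
        int32_t object = -1;   // object index stored at a leaf
    };

    // Collect all leaf objects whose bounds overlap the query box.
    void query(const std::vector<BVHNode>& nodes, const Aabb& box,
               std::vector<int32_t>& hits) {
        int32_t stack[64];     // explicit traversal stack instead of recursion
        int top = 0;
        stack[top++] = 0;      // start at the root node
        while (top > 0) {
            const BVHNode& n = nodes[stack[--top]];
            if (!overlaps(n.bounds, box)) continue;                   // prune subtree
            if (n.left < 0) { hits.push_back(n.object); continue; }   // leaf reached
            stack[top++] = n.left;
            stack[top++] = n.right;
        }
    }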

    Foundations and Methods for GPU based Image Synthesis

    Effects such as global illumination, caustics, defocus and motion blur are an integral part of generating images that are perceived as realistic and cannot be distinguished from photographs. In general, two different approaches exist to render images: ray tracing and rasterization. Ray tracing is a widely used technique for production-quality rendering of images, where image quality and physical correctness are more important than the time needed for rendering. Generating these effects is a very compute- and memory-intensive process and can take minutes to hours for a single camera shot. Rasterization, on the other hand, is used to render images when real-time constraints have to be met (e.g. computer games). Often, specialized algorithms are used to approximate these complex effects, achieving plausible results while sacrificing image quality for performance. This thesis is split into two parts. In the first part we look at algorithms and load-balancing schemes for general-purpose computing on graphics processing units (GPUs). Most ray tracing related algorithms (e.g. KD-tree construction or bidirectional path tracing) have unpredictable memory requirements. Dynamic memory allocation on GPUs suffers from the global synchronization required to keep track of the state of current allocations. We present a method to reduce this overhead on massively parallel hardware architectures. In particular, we merge small parallel allocation requests from different threads that can occur while exploiting SIMD-style parallelism. We speed up the dynamic allocation using a set of constraints that can be applied to a large class of parallel algorithms. To achieve the image quality needed for feature films, GPU clusters are often used to cope with the amount of computation needed. We present a framework that employs a dynamic load balancing approach and applies fair scheduling to minimize the average execution time of spawned computational tasks. The load balancing capabilities are shown by handling irregular workloads: a bidirectional path tracer allowing renderings of complex effects at near-interactive frame rates. In the second part of the thesis we try to reduce the image quality gap between production and real-time rendering. To this end, an adaptive acceleration structure for screen-space ray tracing is presented that represents the scene geometry by planar approximations. The benefit is a fast method to skip empty space and compute exact intersection points based on the planar approximation. This technique allows simulating complex phenomena including depth-of-field rendering and ray traced reflections at real-time frame rates. To handle motion blur in combination with transparent objects, we present a unified rendering approach that decouples space and time sampling. Thereby, we can achieve interactive frame rates by reusing fragments during the sampling step. The scene geometry that is potentially visible at any point in time for the duration of a frame is rendered in a rasterization step and stored in temporally varying fragments. We perform spatial sampling to determine all temporally varying fragments that intersect with a specific viewing ray at any point in time. Viewing rays can be sampled according to the lens uv-sampling to incorporate depth of field. In a final temporal sampling step, we evaluate the pre-determined viewing ray/fragment intersections for one or multiple points in time. This allows incorporating standard shading effects and results in physically plausible motion and defocus blur for transparent and opaque objects.
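
    A minimal, CPU-side sketch of the idea behind merging small parallel allocation requests, assuming a simple bump allocator over a global pool (not the thesis' actual GPU allocator): the requests of one SIMD group are combined into a single atomic add on the shared heap pointer, and each request receives an offset inside the combined slice, so the group pays for one global synchronization instead of one per thread.

    #include <atomic>
    #include <cstddef>
    #include <vector>

    std::atomic<std::size_t> g_heapTop{0};   // next free byte in a global pool

    // Combine one group's allocation requests (in bytes) into a single atomic
    // operation and return the base offset assigned to each request.
    std::vector<std::size_t> allocGroup(const std::vector<std::size_t>& sizes) {
        std::vector<std::size_t> offsets(sizes.size());
        std::size_t total = 0;
        for (std::size_t i = 0; i < sizes.size(); ++i) {   // exclusive prefix sum
            offsets[i] = total;
            total += sizes[i];
        }
        // One global synchronization for the whole group instead of one per thread.
        std::size_t base = g_heapTop.fetch_add(total, std::memory_order_relaxed);
        for (std::size_t& off : offsets) off += base;
        return offsets;
    }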

    Scalable exploration of 3D massive models

    Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións. 5032V01
    This thesis introduces scalable techniques that advance the state of the art in massive model creation and exploration. Concerning model creation, we present methods for improving reality-based scene acquisition and processing, introducing an efficient implementation of scalable out-of-core point clouds and a data-fusion approach for creating detailed colored models from cluttered scene acquisitions. The core of this thesis concerns enabling technology for the exploration of general large datasets. Two novel solutions are introduced. The first is an adaptive out-of-core technique exploiting the GPU rasterization pipeline and hardware occlusion queries in order to create coherent batches of work for localized shader-based ray tracing kernels, opening the door to out-of-core ray tracing with shadowing and global illumination. The second is an aggressive compression method that exploits redundancy in large models to compress data so that it fits, in a fully renderable format, in GPU memory. The method is targeted at voxelized representations of 3D scenes, which are widely used to accelerate visibility queries on the GPU. Compression is achieved by merging subtrees that are identical up to a similarity transform and by exploiting the skewed distribution of references to shared nodes to store child pointers using a variable-bitrate encoding. The capability and performance of all methods are evaluated on many very massive real-world scenes from several domains, including cultural heritage, engineering, and gaming.
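
    A simplified sketch of the core compression idea: identical subtrees of a sparse voxel octree are deduplicated bottom-up so that each unique subtree is stored once and shared by every parent that references it, turning the tree into a DAG. The thesis additionally merges subtrees that match only up to a similarity transform and stores child pointers with a variable bitrate; this sketch handles exact duplicates only, and all names are illustrative assumptions.

    #include <array>
    #include <cstdint>
    #include <map>
    #include <vector>

    struct OctreeNode {
        std::array<int32_t, 8> child;   // indices into the node array, -1 if empty
    };

    // Bottom-up deduplication: two nodes whose (already deduplicated) children
    // match are mapped to the same output index.
    int32_t dedup(const std::vector<OctreeNode>& nodes, int32_t root,
                  std::map<std::array<int32_t, 8>, int32_t>& unique,
                  std::vector<OctreeNode>& out) {
        if (root < 0) return -1;
        OctreeNode merged;
        for (int i = 0; i < 8; ++i)
            merged.child[i] = dedup(nodes, nodes[root].child[i], unique, out);
        auto it = unique.find(merged.child);
        if (it != unique.end()) return it->second;   // identical subtree already stored
        out.push_back(merged);
        int32_t id = static_cast<int32_t>(out.size()) - 1;
        unique.emplace(merged.child, id);
        return id;
    }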

    Hardware Accelerators for Animated Ray Tracing

    Future graphics processors are likely to incorporate hardware accelerators for real-time ray tracing, in order to render increasingly complex lighting effects in interactive applications. However, ray tracing poses difficulties when drawing scenes with dynamic content, such as animated characters and objects. In dynamic scenes, the spatial data structures used to accelerate ray tracing are invalidated on each animation frame and need to be rapidly updated. Tree update is a complex subtask in its own right, and becomes highly expensive in complex scenes. Both ray tracing and tree update are highly memory-intensive tasks, and rendering systems are increasingly bandwidth-limited, so research on accelerator hardware has focused on architectural techniques to optimize away off-chip memory traffic. Dynamic scene support is further complicated by the recent introduction of compressed trees, which use low-precision numbers for storage and computation. Such compression reduces both the arithmetic and memory bandwidth cost of ray tracing, but adds to the complexity of tree update. This thesis proposes methods to cope with dynamic scenes in hardware-accelerated ray tracing, with a focus on reducing traffic to external memory. Firstly, a hardware architecture is designed for linear bounding volume hierarchy construction, an algorithm which is a basic building block in most state-of-the-art software tree builders. The algorithm is rearranged into a streaming form which reduces traffic to one-third of that of software implementations of the same algorithm. Secondly, an algorithm is proposed for compressing bounding volume hierarchies in a streaming manner as they are output from a hardware builder, instead of performing compression as a post-processing pass. As a result, with the proposed method, compression reduces the overall cost of tree update rather than increasing it. The last main contribution of this thesis is an evaluation of shallow bounding volume hierarchies, common in software ray tracing, for use in hardware pipelines; these are found to be more energy-efficient than binary hierarchies. The results in this thesis both confirm that dynamic scene support may become a bottleneck in real-time ray tracing and advance the state of the art on tree update, in terms of energy efficiency as well as the complexity of scenes that can be handled in real time on resource-constrained platforms.
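
    For context, the first step of linear BVH construction, which the proposed hardware builder accelerates, assigns each primitive a Morton code computed from its quantized centroid; sorting primitives by these codes defines the order in which the tree is built. The sketch below is the standard textbook formulation of that step, not the thesis' streaming hardware pipeline.

    #include <cstdint>

    // Spread the lower 10 bits of v so that two zero bits separate consecutive bits.
    inline uint32_t expandBits(uint32_t v) {
        v = (v * 0x00010001u) & 0xFF0000FFu;
        v = (v * 0x00000101u) & 0x0F00F00Fu;
        v = (v * 0x00000011u) & 0xC30C30C3u;
        v = (v * 0x00000005u) & 0x49249249u;
        return v;
    }

    // 30-bit Morton code for a centroid with coordinates normalized to [0,1].
    inline uint32_t morton3D(float x, float y, float z) {
        auto clamp01 = [](float f) { return f < 0.f ? 0.f : (f > 1.f ? 1.f : f); };
        uint32_t xx = expandBits(static_cast<uint32_t>(clamp01(x) * 1023.0f));
        uint32_t yy = expandBits(static_cast<uint32_t>(clamp01(y) * 1023.0f));
        uint32_t zz = expandBits(static_cast<uint32_t>(clamp01(z) * 1023.0f));
        return (xx << 2) | (yy << 1) | zz;   // interleave x, y, z bits
    }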

    Path manipulation strategies for rendering dynamic environments.

    The current work introduces path manipulation as a tool that extends bidirectional path tracing to reuse paths in the temporal domain. Defined as an apparatus of sampling and reuse strategies, path manipulation reconstructs the subpaths that compose the light transport paths and addresses the restriction to static geometry commonly associated with Monte Carlo light transport simulations. By reconstructing and reusing subpaths, the path manipulation algorithm obviates the regeneration of the entire path collection, reduces the computational load of the original algorithm and supports scene dynamism. Bidirectional path tracing relies on local path sampling techniques to generate the paths of light in a synthetic environment. By using the information localized at path vertices, such as the probability distribution, the sampling techniques construct paths progressively with distinct probability densities. Each probability density corresponds to a particular sampling technique, which accounts for specific illumination effects. Bidirectional path tracing uses multiple importance sampling to combine paths sampled with different techniques into low-variance estimators. The path sampling techniques and multiple importance sampling are the keys to the efficacy of bidirectional path tracing. However, the sampling techniques have gained little attention beyond the generation and evaluation of paths. Bidirectional path tracing was designed for static scenes and thus discards the generated paths immediately after the evaluation of their contributions. Limiting the lifespan of paths to a generation-evaluation cycle imposes a static use of paths and of sampling techniques. The path manipulation algorithm harnesses the potential of the sampling techniques to supplant the static manipulation of paths with a generation-evaluation-reuse cycle. An intra-subpath connectivity strategy was devised to reconnect the segregated chains of the subpaths invalidated by scene alterations. Successful intra-subpath connections generate subpaths in multiple pieces by reusing subpath chains from prior frames. Subpaths are reconstructed generically, regardless of the subpath or scene dynamism type and without the need for predefined animation paths. The result is the extension of bidirectional path tracing to the temporal domain.
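
    For reference, the multiple importance sampling combination that bidirectional path tracing relies on is typically the balance heuristic of Veach; in standard notation (not notation specific to this thesis), a path \bar{x} generated with sampling technique s receives the weight

        w_s(\bar{x}) = \frac{n_s \, p_s(\bar{x})}{\sum_{t} n_t \, p_t(\bar{x})}

    and the combined estimator sums, over techniques, the weighted contributions w_s(\bar{x}) f(\bar{x}) / p_s(\bar{x}) of the sampled paths, where f is the measurement contribution function, p_s is the probability density with which technique s generates the path, and n_s is the number of samples drawn with that technique.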