Faster data structures and graphics hardware techniques for high performance rendering
Computer generated imagery is used in a wide range of disciplines, each with different requirements. For example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements, but is still required to render efficiently on huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce the highest rendering quality possible within the given constraints. The first paper in this thesis presents an analytical hardware model together with a feedback system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware must minimise energy consumption to avoid overheating. The second paper presents experiments and analysis of power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflection and refraction effects has long been considered possible only in offline rendering, where time isn't a constraint. The third paper presents a hybrid approach that combines the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high-quality reflections and refractions in real time. The fourth and fifth papers present improvements in the construction time and quality of Bounding Volume Hierarchies (BVHs).
Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer to a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher-quality BVHs. The fifth paper introduces a triangle-splitting BVH construction approach that builds BVHs with quality on a par with an earlier high-quality splitting algorithm, yet is several times faster to construct.
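The BVH quality that builders such as these optimise for is commonly measured with the surface area heuristic (SAH). As background only (this sketch is not taken from the papers, and the names and cost constants are illustrative assumptions), the SAH cost of one candidate split can be computed like this:

```python
# Illustrative surface area heuristic (SAH) cost for one BVH split.
# Cost constants and function names are assumptions, not from the papers.

def surface_area(lo, hi):
    """Surface area of an axis-aligned box given min/max corners."""
    dx, dy, dz = hi[0] - lo[0], hi[1] - lo[1], hi[2] - lo[2]
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, right, n_left, n_right,
             c_traversal=1.0, c_intersect=1.0):
    """Estimated cost of splitting `parent` into `left` and `right`,
    each box given as a (min_corner, max_corner) pair; the child terms
    are weighted by the probability a ray hitting the parent hits them."""
    sa_parent = surface_area(*parent)
    return (c_traversal
            + c_intersect * n_left * surface_area(*left) / sa_parent
            + c_intersect * n_right * surface_area(*right) / sa_parent)
```

A builder evaluates this cost for many candidate partitions and keeps the cheapest; splitting a triangle can shrink the child boxes and therefore lower this cost, at the price of extra build work.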
Hardware acceleration of photon mapping
The quest for realism in computer-generated graphics has yielded a range of algorithmic
techniques, the most advanced of which are capable of rendering images at close to photorealistic
quality. Due to the realism available, it is now commonplace that computer graphics are used in
the creation of movie sequences, architectural renderings, medical imagery and product
visualisations.
This work concentrates on the photon mapping algorithm [1, 2], a physically based global
illumination rendering algorithm. Photon mapping excels in producing highly realistic, physically
accurate images.
A drawback to photon mapping, however, is its rendering time, which can be significantly longer
than that of other, albeit less realistic, algorithms. Unsurprisingly, this increase in execution time
is associated with a high computational cost. This computation is usually performed using the
general purpose central processing unit (CPU) of a personal computer (PC), with the algorithm
implemented as a software routine. Other options available for processing these algorithms
include desktop PC graphics processing units (GPUs) and custom designed acceleration hardware
devices.
GPUs tend to be efficient when dealing with less realistic rendering solutions such as rasterisation;
however, with their recent drive towards increased programmability they can also be used to
process more realistic algorithms. A drawback to the use of GPUs is that these algorithms often
have to be reworked to make optimal use of the limited resources available.
There are very few custom hardware devices available for acceleration of the photon mapping
algorithm. Ray-tracing is the predecessor to photon mapping, and although not capable of
producing the same physical accuracy and therefore realism, there are similarities between the
algorithms. There have been several hardware prototypes, and at least one commercial offering,
created with the goal of accelerating ray-trace rendering [3]. However, properties making many of
these proposals suitable for the acceleration of ray-tracing are not shared by photon mapping.
There are even fewer proposals for acceleration of the additional functions found only in photon
mapping.
All of these approaches to algorithm acceleration offer limited scalability. GPUs are inherently
difficult to scale, while many of the custom hardware devices available thus far make use of large
processing elements and complex acceleration data structures.
In this work we make use of three novel approaches in the design of highly scalable specialised
hardware structures for the acceleration of the photon mapping algorithm. Increased scalability is
gained through:
⢠The use of a brute-force approach in place of the commonly used smart approach, thus
eliminating much data pre-processing, complex data structures and large processing units
often required.
⢠The use of Logarithmic Number System (LNS) arithmetic computation, which facilitates a
reduction in processing area requirement.
⢠A novel redesign of the photon inclusion test, used within the photon search method of
the photon mapping algorithm. This allows an intelligent memory structure to be used for
the search.
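As background on the first point: a brute-force photon gather simply tests every stored photon against the search sphere, with no kd-tree or other spatial index to maintain. A minimal software sketch of this idea, with illustrative names (the hardware realises the same test in parallel, fixed-function units):

```python
# Sketch of a brute-force photon gather: every stored photon is tested
# against the search sphere. Names and the fixed-radius formulation are
# illustrative assumptions, not taken from the thesis.

def gather_photons(photons, query, radius):
    """Return the photons within `radius` of `query`.
    `photons` is a list of (position, power) pairs."""
    r2 = radius * radius
    hits = []
    for position, power in photons:
        d2 = sum((p - q) ** 2 for p, q in zip(position, query))
        if d2 <= r2:              # the photon inclusion test
            hits.append((position, power))
    return hits
```

Because every photon runs the same independent test, the loop maps naturally onto wide parallel hardware and needs no pre-built acceleration structure.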
The design uses two hardware structures, each of which accelerates one core rendering function.
Renderings produced using field programmable gate array (FPGA) based prototypes are presented,
along with details of 90nm synthesised versions of the designs, which show that close to an
order-of-magnitude speedup over a software implementation is possible. Due to the scalable nature of
the design, it is likely that any advantage can be maintained in the face of improving processor
speeds.
Significantly, due to the brute-force approach adopted, it is possible to eliminate an often-used
software acceleration method. This means that the device can interface almost directly to a front-end
modelling package, minimising much of the pre-processing required by most other proposals.
Hardware Accelerators for Animated Ray Tracing
Future graphics processors are likely to incorporate hardware accelerators for real-time ray tracing, in order to render increasingly complex lighting effects in interactive applications. However, ray tracing poses difficulties when drawing scenes with dynamic content, such as animated characters and objects. In dynamic scenes, the spatial data structures used to accelerate ray tracing are invalidated on each animation frame and need to be rapidly updated. Tree update is a complex subtask in its own right, and becomes highly expensive in complex scenes. Both ray tracing and tree update are highly memory-intensive tasks, and rendering systems are increasingly bandwidth-limited, so research on accelerator hardware has focused on architectural techniques to optimize away off-chip memory traffic. Dynamic scene support is further complicated by the recent introduction of compressed trees, which use low-precision numbers for storage and computation. Such compression reduces both the arithmetic and memory bandwidth cost of ray tracing, but adds to the complexity of tree update. This thesis proposes methods to cope with dynamic scenes in hardware-accelerated ray tracing, with a focus on reducing traffic to external memory. Firstly, a hardware architecture is designed for linear bounding volume hierarchy construction, an algorithm which is a basic building block in most state-of-the-art software tree builders. The algorithm is rearranged into a streaming form which reduces traffic to one-third of that of software implementations of the same algorithm. Secondly, an algorithm is proposed for compressing bounding volume hierarchies in a streaming manner as they are output from a hardware builder, instead of performing compression as a post-processing pass. As a result, with the proposed method, compression reduces the overall cost of tree update rather than increasing it. 
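As background on the linear BVH algorithm mentioned above: it orders primitives along a space-filling curve by sorting them on their Morton codes, which is what makes the build amenable to streaming. A minimal sketch of the standard 30-bit 3D Morton encoding, using the well-known bit-spreading trick (names are illustrative, not from the thesis):

```python
# Sketch of 30-bit 3D Morton encoding, the sort key used by linear BVH
# builders. The bit-spreading masks are the standard ones for 10-bit
# coordinates; function names are illustrative.

def expand_bits(v):
    """Spread the low 10 bits of v so consecutive bits end up 3 apart."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0xFF0000FF
    v = (v | (v << 8)) & 0x0F00F00F
    v = (v | (v << 4)) & 0xC30C30C3
    v = (v | (v << 2)) & 0x09249249
    return v

def morton3d(x, y, z):
    """30-bit Morton code for a point with coordinates in [0, 1]."""
    def quantise(c):                      # clamp to 10 bits
        return min(max(int(c * 1024.0), 0), 1023)
    return ((expand_bits(quantise(x)) << 2)
            | (expand_bits(quantise(y)) << 1)
            | expand_bits(quantise(z)))
```

Sorting primitive centroids by this key groups spatially nearby primitives into contiguous runs, so tree topology can then be derived in a single streaming pass over the sorted array.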
The last main contribution of this thesis is an evaluation of shallow bounding volume hierarchies, common in software ray tracing, for use in hardware pipelines. These are found to be more energy-efficient than binary hierarchies. The results in this thesis both confirm that dynamic scene support may become a bottleneck in real-time ray tracing, and add to the state of the art on tree update in terms of energy efficiency, as well as the complexity of scenes that can be handled in real time on resource-constrained platforms.
Interactive global illumination on the CPU
Computing realistic physically-based global illumination in real-time remains one
of the major goals in the fields of rendering and visualisation; one that has not
yet been achieved due to its inherent computational complexity. This thesis focuses
on CPU-based interactive global illumination approaches with an aim to
develop generalisable, hardware-agnostic algorithms. Interactive ray tracing relies
on spatial and cache coherence to achieve interactive rates, which conflicts
with the needs of global illumination solutions, which require a large number of incoherent
secondary rays to be computed. Methods that reduce the total number of
rays that need to be processed, such as selective rendering, were investigated to
determine how best they can be utilised.
The impact that selective rendering has on interactive ray tracing was analysed
and quantified and two novel global illumination algorithms were developed,
with the structured methodology presented as a framework. Adaptive Interleaved
Sampling is a generalisable approach that combines interleaved sampling
with an adaptive approach, using efficient component-specific adaptive guidance
methods to drive the computation. Results of up to 11 frames per second
were demonstrated for multiple components, including participating media. Temporal Instant Caching is a caching scheme for accelerating the computation of
diffuse interreflections to interactive rates. This approach achieved frame rates
exceeding 9 frames per second for the majority of scenes. Validation of the results
for both approaches showed little perceptual difference when comparing
against a gold-standard path-traced image. Further research into caching led to
the development of a new wait-free data access control mechanism for sharing the
irradiance cache among multiple rendering threads on a shared memory parallel
system. By not serialising accesses to the shared data structure, irradiance
values were shared among all the threads without overhead or contention
when reading and writing simultaneously. This new approach achieved efficiencies
between 77% and 92% for 8 threads when calculating static images and animations.
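As a rough illustration of the wait-free idea (a simplified single-writer, multi-reader sketch with hypothetical names, not the thesis's actual mechanism): each record is fully written before a shared count is advanced, so readers never observe a partially written entry and never take a lock.

```python
# Simplified sketch of lock-free publication of irradiance records:
# the record is written first, then the count that makes it visible.
# Single writer, many readers; class and field names are assumptions.

class IrradianceCache:
    def __init__(self, capacity=1 << 16):
        self._records = [None] * capacity   # preallocated record slots
        self._count = 0                     # number of published records

    def insert(self, position, irradiance):
        slot = self._count
        self._records[slot] = (position, irradiance)  # write the data...
        self._count = slot + 1              # ...then publish it

    def lookup(self, position, radius):
        n = self._count                     # snapshot; readers never block
        r2 = radius * radius
        return [rec for rec in self._records[:n]
                if sum((p - q) ** 2 for p, q in zip(rec[0], position)) <= r2]
```

On real hardware the publish step needs a release store (and the snapshot an acquire load) so the data write cannot be reordered past the count update; extending this to multiple concurrent writers requires the more elaborate scheme the thesis develops.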
This work demonstrates that, due to the flexibility of the CPU, CPU-based
algorithms remain a valid and competitive choice for achieving global illumination
interactively, and an alternative to the generally brute-force, GPU-centric
algorithms.
Accelerating and simulating detected physical interactions
The aim of this doctoral thesis is to present a body of work aimed at improving performance and developing new methods for animating physical interactions using simulation in virtual environments. To this end we develop a number of novel parallel collision detection and fracture simulation algorithms.
Methods for traversing and constructing bounding volume hierarchies (BVH) on graphics processing units (GPU) have been widely successful. In particular, they have been adopted in simulators, libraries and benchmarks, as they allow applications to reach new heights in terms of performance. Even with such development, however, these techniques have not been thoroughly adopted in commercial and practical applications. As a result, parallel collision detection on GPUs remains a relatively niche problem, and a wide range of applications could still benefit significantly from the performance gains it promises.
In fracture simulations, explicit surface tracking methods have a good track record of success. In particular, they have been adopted thoroughly in 3D modelling and animation software such as Houdini [124], as they allow accurate simulation of intricate fracture patterns with complex interactions, generated using physical laws. Even so, existing methods can pose restrictions on the geometries of simulated objects. Further, they often depend tightly on implicit surfaces (e.g. level sets) for representing cracks and performing cutting to produce rigid-body fragments. Due to these restrictions, catering to various geometries can be a challenge, and the memory cost of using implicit surfaces can be detrimental, with no guarantee that sharp features are preserved.
We present our work in four main chapters. We first tackle the problem of accelerating collision detection on the GPU via BVH traversal, one of the most demanding components of collision detection. Secondly, we show the construction of a new representation of the BVH called the ostensibly implicit tree: a layout of nodes in memory which is encoded using the bitwise representation of the number of enclosed objects in the tree (e.g. polygons). Thirdly, we shift paradigm to the task of simulating breaking objects after collision: we show how traditional finite elements can be extended as a way to prevent frequent re-meshing during fracture evolution problems. Finally, we show how the fracture surface, represented as an explicit (e.g. triangulated) surface mesh, is used to generate rigid-body fragments using a novel approach to mesh cutting.
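As background on implicit node layouts (this sketch shows only the classic pointer-free complete binary tree, not the ostensibly implicit tree itself, which generalises the idea to arbitrary leaf counts): children live at fixed index offsets in an array, so no child pointers need to be stored.

```python
# Classic implicit (pointer-free) complete binary tree layout:
# node i's children sit at 2i+1 and 2i+2, so the topology is
# recoverable from indices alone. Shown as background only.

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

def leaf_indices(num_leaves):
    """Indices of the leaves in a complete binary tree over num_leaves
    objects (num_leaves assumed to be a power of two in this sketch):
    the num_leaves - 1 internal nodes occupy the front of the array."""
    first_leaf = num_leaves - 1
    return list(range(first_leaf, 2 * num_leaves - 1))
```

Eliminating pointers in this way saves memory and bandwidth, which is exactly why a hierarchy layout recoverable from object counts is attractive for GPU collision detection.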
Investigating ray tracing algorithms and data structures in the context of visibility.
Ray tracing is a popular rendering method with built-in visibility determination. However, the computational costs are significant. To reduce them, there has been extensive research leading to innovative data structures and algorithms that optimally utilise both object and image coherence. Investigating these in a visibility determination context, without considering further optical effects, is the main motivation of this research. Three methods, one structure and two coherent tree traversal algorithms, are discussed. While the structure aims to increase coherence, the algorithms aim to optimise utilisation of the coherence provided by ray tracing structures (kd-trees, octrees). RBSF trees (Restricted Binary Space Partitioning trees) build upon the research in ray tracing with kd-trees. A higher degree of freedom in split plane selection increases object coherence, implying a reduction in the number of node traversals and triangle intersections for most scenes. Consequently, reduced ray casting times are observed for scenes with predominantly non-axis-aligned triangles. Coherent Rendering is a rendering method that shows improved complexity, but at an absolute performance much slower than packet ray tracing; however, since it led to the creation of the Row Tracing algorithm, it is described briefly. Row Tracing can be considered an adaptation of Coherent Rendering, scanline rendering or packet ray tracing. One row of the image is considered and its pixels are determined. Similar to Coherent Rendering, an adapted version of Hierarchical Occlusion Maps is used to identify and skip occluded nodes. To maximise utilisation of coherence, the method is extended so that several adjacent rows are traversed through the tree. The two versions of Row Tracing demonstrate excellent performance, exceeding that of packet ray tracing. Further, it is shown that for larger models (2 million+ triangles),
Row Tracing and Packet Row Tracing significantly outperform Z-buffer based methods (OpenGL). Row Tracing shows scalability over scene sizes, leading to a rendering method with fast rendering times for both large and small models. In addition, it has excellent parallelisation properties, allowing utilisation of multiple cores with ease. Thus, the Row Tracing and Packet Row Tracing algorithms can be considered the significant contributions of this Ph.D. These data structures and algorithms demonstrate that ray tracing data structures and adaptations of ray tracing algorithms exhibit excellent potential in a visibility context.
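As a rough illustration of the row-level occlusion idea (hypothetical names; the adapted Hierarchical Occlusion Maps described above are more elaborate): a tree node whose projected pixel span on the current row is already fully covered can be skipped without traversal.

```python
# Sketch of a per-row occlusion test for skipping tree nodes during
# row tracing: once a node's projected span on the row is fully
# covered, the node need not be traversed. Names are illustrative.

def row_fully_occluded(coverage, x0, x1):
    """True if pixels x0..x1 (inclusive) of the row are all covered,
    meaning a node projecting onto that span can be skipped."""
    return all(coverage[x0:x1 + 1])

def shade_span(coverage, x0, x1):
    """Mark pixels x0..x1 (inclusive) of the row as covered."""
    for x in range(x0, x1 + 1):
        coverage[x] = True
```

The front-to-back traversal order ensures that by the time a node is tested, any geometry in front of it has already been written into the coverage map.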
Efficient From-Point Visibility for Global Illumination in Virtual Scenes with Participating Media
Visibility determination is one of the fundamental building blocks of photorealistic image synthesis. However, since visibility is extremely expensive to compute, nearly all of the rendering time is spent on it. In this work, we present new methods for storing, computing and approximating visibility in scenes with participating media, which accelerate the computation considerably while still delivering high-quality, artefact-free results.
Interactive molecular docking with haptics and advanced graphics
Biomolecular interactions underpin many of the processes that make up life. Molecular docking is the study of these interactions in silico. Interactive docking applications put the user in control of the docking process, allowing them to use their knowledge and intuition to determine how molecules bind together.
Interactive molecular docking applications often use haptic devices as a method of controlling the docking process. These devices allow the user to easily manipulate the structures in 3D space, whilst feeling the forces that occur in response to their manipulations. As a result of the force refresh rate requirements of haptic devices, haptic-assisted docking applications are often limited, in that they model the interacting proteins as rigid, use low-fidelity visualisations or require expensive proprietary equipment.
The research in this thesis aims to address some of these limitations. Firstly, the development of a visualisation algorithm capable of rendering a depiction of a deforming protein at an interactive refresh rate, with per-pixel shadows and ambient occlusion, is discussed. Then, a novel approach to modelling molecular flexibility whilst maintaining a stable haptic refresh rate is developed.
Together these algorithms are presented within Haptimol FlexiDock, the first haptic-assisted molecular docking application to support receptor flexibility with high-fidelity graphics, whilst also maintaining interactive refresh rates on both the haptic device and visual display. Using Haptimol FlexiDock, docking experiments were performed between two protein-ligand pairs: Maltodextrin Binding Protein and Maltose, and Glutamine Binding Protein and Glucose. When the ligand was placed in its approximate binding site, the direction of over 80% of the intra-molecular movement aligned with that seen in the experimental structures.
Furthermore, over 50% of the expected backbone motion was present in the structures generated with FlexiDock. Calculating the deformation of a biomolecule in real time, whilst maintaining an interactive refresh rate on the haptic device (>500 Hz), is a breakthrough in the field of interactive molecular docking, as previous approaches either model protein flexibility but fail to achieve the required haptic refresh rate, or do not consider biomolecular flexibility at all.