
    Hardware acceleration of photon mapping

    PhD Thesis
    The quest for realism in computer-generated graphics has yielded a range of algorithmic techniques, the most advanced of which can render images at close to photorealistic quality. Owing to this realism, computer graphics are now commonplace in the creation of movie sequences, architectural renderings, medical imagery and product visualisations. This work concentrates on the photon mapping algorithm [1, 2], a physically based global illumination rendering algorithm. Photon mapping excels at producing highly realistic, physically accurate images. A drawback of photon mapping, however, is its rendering time, which can be significantly longer than that of other, albeit less realistic, algorithms. Unsurprisingly, this long execution time reflects a high computational cost. The computation is usually performed by the general-purpose central processing unit (CPU) of a personal computer (PC), with the algorithm implemented as a software routine. Other options for processing these algorithms include desktop PC graphics processing units (GPUs) and custom-designed hardware accelerators. GPUs tend to be efficient for less realistic rendering approaches such as rasterisation, but with their recent drive towards increased programmability they can also be used to process more realistic algorithms. A drawback of using GPUs is that the algorithms often have to be reworked to make optimal use of the limited resources available. Very few custom hardware devices exist for accelerating the photon mapping algorithm. Ray tracing is the predecessor of photon mapping, and although it cannot achieve the same physical accuracy, and therefore realism, the two algorithms share similarities. Several hardware prototypes, and at least one commercial offering, have been created with the goal of accelerating ray-traced rendering [3]. However, the properties that make many of these proposals suitable for accelerating ray tracing are not shared by photon mapping, and there are even fewer proposals for accelerating the additional functions found only in photon mapping. All of these approaches to algorithm acceleration offer limited scalability: GPUs are inherently difficult to scale, while many of the custom hardware devices proposed so far rely on large processing elements and complex acceleration data structures. In this work we use three novel approaches in the design of highly scalable specialised hardware structures for accelerating the photon mapping algorithm. Increased scalability is gained through:
    • The use of a brute-force approach in place of the commonly used smart approach, eliminating much of the data pre-processing, the complex data structures and the large processing units otherwise required.
    • The use of Logarithmic Number System (LNS) arithmetic, which reduces the processing area requirement (sketched after this abstract).
    • A novel redesign of the photon inclusion test used within the photon search method of the photon mapping algorithm, which allows an intelligent memory structure to be used for the search.
    The design uses two hardware structures, each of which accelerates one core rendering function. Renderings produced using field programmable gate array (FPGA) based prototypes are presented, along with details of 90 nm synthesised versions of the designs, which show that close to an order-of-magnitude speedup over a software implementation is possible. Due to the scalable nature of the design, it is likely that this advantage can be maintained in the face of improving processor speeds. Significantly, the brute-force approach makes it possible to eliminate an often-used software acceleration structure, so the device can interface almost directly to a front-end modelling package, avoiding much of the pre-processing required by most other proposals.
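
    As background for the LNS point above: in a Logarithmic Number System, a value is stored as a fixed-point base-2 logarithm, so multiplication and division reduce to integer addition and subtraction, while addition requires evaluating a correction term log2(1 + 2^d), typically read from a small lookup table in hardware. The representation and function names in this minimal C++ sketch are illustrative and not taken from the thesis:

        #include <algorithm>
        #include <cmath>
        #include <cstdint>

        // Illustrative LNS value: a sign plus the base-2 logarithm of the
        // magnitude in Q16.16 fixed point (zero handling omitted for brevity).
        struct Lns {
            bool    neg;
            int32_t log;
        };

        const double kScale = 65536.0;  // Q16.16 scaling factor

        Lns toLns(double x) {
            return { x < 0.0, (int32_t)std::lround(std::log2(std::fabs(x)) * kScale) };
        }

        double fromLns(Lns v) {
            double m = std::exp2(v.log / kScale);
            return v.neg ? -m : m;
        }

        // Multiplication needs only an integer adder: log(ab) = log a + log b.
        Lns mulLns(Lns a, Lns b) { return { a.neg != b.neg, a.log + b.log }; }

        // Same-sign addition needs the correction log2(1 + 2^d) with d <= 0;
        // hardware would read it from a small table, this sketch evaluates it.
        Lns addLns(Lns a, Lns b) {
            int32_t hi = std::max(a.log, b.log);
            int32_t lo = std::min(a.log, b.log);
            double  d  = (lo - hi) / kScale;
            int32_t f  = (int32_t)std::lround(std::log2(1.0 + std::exp2(d)) * kScale);
            return { a.neg, hi + f };
        }

    The area saving cited in the abstract follows from this kind of trade: multipliers become adders, at the cost of a table lookup per addition.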

    Doctor of Philosophy

    Dissertation
    This dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating-point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm with improved visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost of traversing a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers.
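
    One widely used member of the family of direct ray-triangle tests analysed above is the Möller-Trumbore algorithm. The sketch below is a generic textbook version, not the genetically tuned or numerically robust variant the dissertation develops:

        #include <array>
        #include <cmath>
        #include <optional>

        using Vec3 = std::array<double, 3>;

        Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
        double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
        Vec3 cross(const Vec3& a, const Vec3& b) {
            return {a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]};
        }

        // Moeller-Trumbore: solve orig + t*dir = (1-u-v)*v0 + u*v1 + v*v2 and
        // return the hit distance t, or nothing if the ray misses the triangle.
        std::optional<double> intersect(const Vec3& orig, const Vec3& dir,
                                        const Vec3& v0, const Vec3& v1, const Vec3& v2) {
            const double eps = 1e-9;
            Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
            Vec3 p = cross(dir, e2);
            double det = dot(e1, p);
            if (std::fabs(det) < eps) return std::nullopt;  // ray parallel to plane
            double inv = 1.0 / det;
            Vec3 tv = sub(orig, v0);
            double u = dot(tv, p) * inv;
            if (u < 0.0 || u > 1.0) return std::nullopt;    // outside barycentric range
            Vec3 q = cross(tv, e1);
            double v = dot(dir, q) * inv;
            if (v < 0.0 || u + v > 1.0) return std::nullopt;
            double t = dot(e2, q) * inv;
            if (t < eps) return std::nullopt;               // hit is behind the origin
            return t;
        }

    The observation that such direct tests are mathematically equivalent is what makes automatic tuning plausible: a search procedure can reorder and refactor these same dot and cross products for a target architecture without changing the result.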

    Hierarchical N-Body problem on graphics processor unit

    Galactic simulation is an important cosmological computation, and it represents a classical N-body problem suitable for implementation on vector processors. The Barnes-Hut algorithm is a hierarchical N-body method used to simulate such galactic evolution systems. Stream processing architectures expose the data locality and concurrency available in multimedia applications. At the same time, there are numerous compute-intensive scientific and engineering applications that can potentially benefit from such computational and communication models; these applications are traditionally implemented on vector processors. Stream-architecture-based graphics processing units (GPUs) present a novel computational alternative for efficiently implementing such high-performance applications. Rendering on a stream architecture sustains high performance, while user-programmable modules allow complex algorithms to be implemented efficiently. GPUs have evolved over the years from fixed-function pipelines to user-programmable processors. In this thesis, we focus on the implementation of the Barnes-Hut algorithm on typical current-generation programmable GPUs. We exploit the computation and communication requirements of the Barnes-Hut algorithm to show its suitability for user-programmable GPUs. Our implementation of the Barnes-Hut algorithm is formulated as a fragment shader targeting the selected GPU. We discuss implementation details, design issues, results, and challenges encountered in programming the fragment shader.
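
    For context, the heart of the Barnes-Hut method is the opening-angle test: every tree node stores the total mass and centre of mass of its subtree, and a distant node is treated as a single pseudo-particle whenever its cell size s divided by its distance d falls below a threshold theta. The CPU-side sketch below uses illustrative names; the thesis maps this traversal onto a fragment shader instead:

        #include <cmath>
        #include <vector>

        struct Node {
            double mass;              // total mass of the subtree
            double com[3];            // centre of mass of the subtree
            double size;              // side length s of this cell
            std::vector<Node*> kids;  // empty for a leaf holding one body
        };

        // Accumulate the acceleration at `pos` using the s/d < theta criterion
        // (G = 1 and a small softening term, for simplicity).
        void accel(const Node* n, const double pos[3], double theta, double out[3]) {
            double r[3], d2 = 1e-12;
            for (int i = 0; i < 3; ++i) { r[i] = n->com[i] - pos[i]; d2 += r[i] * r[i]; }
            double d = std::sqrt(d2);
            if (n->kids.empty() || n->size / d < theta) {
                double f = n->mass / (d2 * d);   // pseudo-particle approximation
                for (int i = 0; i < 3; ++i) out[i] += f * r[i];
            } else {
                for (const Node* k : n->kids) accel(k, pos, theta, out);
            }
        }

    On a fragment shader of that GPU generation, the recursion and pointer chasing above are the awkward parts; they must be flattened into texture lookups over an array-encoded tree, which is the kind of reformulation the thesis discusses.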

    Estimating performance of a ray-tracing ASIC design

    Journal Article
    Recursive ray tracing is a powerful rendering technique used to compute realistic images by simulating the global light transport in a scene. Algorithmic improvements and FPGA-based hardware implementations of ray tracing have demonstrated real-time performance, but hardware that achieves performance levels comparable to commodity rasterization graphics chips is still not available. This paper describes the architecture and ASIC implementations of the DRPU design (Dynamic Ray Processing Unit), which closes this performance gap. The DRPU supports fully programmable shading and most kinds of dynamic scenes, and thus provides capabilities similar to those of current GPUs. It achieves high efficiency through SIMD processing of floating-point vectors, massive multithreading, synchronous execution of packets of threads, and careful management of caches for scene data. To support dynamic scenes, B-KD trees are used as spatial index structures; they are processed by a custom traversal and intersection unit and modified by an Update Processor on scene changes.
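
    The B-KD tree mentioned above combines a kd-tree's single split axis with one bounding interval per child along that axis; because only these intervals change when geometry moves, the Update Processor can refit them instead of rebuilding the tree. The node layout and traversal helper below are an illustrative reconstruction, not the DRPU's actual register-level format:

        #include <algorithm>

        // Illustrative B-KD tree node: one split axis and, for each child,
        // a [min, max] bound along that axis instead of a single split plane.
        struct BkdNode {
            int   axis;          // 0 = x, 1 = y, 2 = z
            float bounds[2][2];  // bounds[child][0] = min, bounds[child][1] = max
            int   child[2];      // indices of the two children (or leaf refs)
        };

        // Clip the ray's active interval [t0, t1] against child c's slab.
        // `org` and `invDir` are the ray origin and reciprocal direction
        // along n.axis. Returns false if the child can be skipped entirely.
        bool clipChild(const BkdNode& n, int c,
                       float org, float invDir, float& t0, float& t1) {
            float tn = (n.bounds[c][0] - org) * invDir;
            float tf = (n.bounds[c][1] - org) * invDir;
            if (tn > tf) std::swap(tn, tf);
            t0 = std::max(t0, tn);
            t1 = std::min(t1, tf);
            return t0 <= t1;
        }

    Running this interval test for a whole packet of rays in lockstep is roughly what the paper's SIMD, multithreaded traversal unit does in hardware.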

    Software-Based Ray Tracing for Mobile Devices

    Ray tracing is a way to produce realistic images of three-dimensional virtual scenes. Its cost scales with the number of pixels in the image rather than with the amount of detail in the scene, which makes it an interesting application for mobile systems, which in general have smaller screens. Modern high-performance ray tracing depends on special acceleration data structures such as bounding volume hierarchies. Compressing the bounding volume hierarchy reduces memory bandwidth usage, which should be especially beneficial for mobile systems, where memory bandwidth is generally limited. Compression also reduces cache misses and memory usage. Unfortunately, compression reduces the quality of the data structure, leading the ray traversal into unnecessary computations, and it increases the amount of work that needs to be carried out in the performance-critical inner loop. Previous work on bounding volume hierarchy compression concentrates on inferring some coordinates from others or on using different integer precisions. This thesis concentrates on using half-precision floating-point numbers, which are promising because of their greater dynamic range. If the halfs are too inaccurate to be used directly as world coordinates, they can be combined with hierarchical encoding; this restores the quality of the data structure to the original, but it requires even more work in the inner loop. Halfs reduce total memory usage by 7% and cache misses by 16%, and they reduce power usage by 1.7%. Their effect on performance depends heavily on the target hardware's support for half precision: if decompression of the halfs is too slow, they have a negative impact. Compared to integers, halfs perform better in the so-called teapot-in-a-stadium problem.
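
    The hard constraint when quantising bounding volume hierarchy bounds to half precision is conservative rounding: every minimum must round down and every maximum up, so the compressed box still encloses its geometry and compression can cost only traversal efficiency, never correctness. The sketch below is simplified (positive values in half's normal range only, no NaN/infinity/denormal handling, helper names are mine); a production version would use the hardware's native F32-to-F16 conversion:

        #include <bit>      // std::bit_cast (C++20)
        #include <cstdint>
        #include <cstdio>

        // Truncating float -> half conversion; truncation rounds toward zero,
        // which for positive inputs means rounding down.
        uint16_t halfRoundDown(float x) {
            uint32_t u = std::bit_cast<uint32_t>(x);
            uint32_t e = ((u >> 23) & 0xFF) - 127 + 15;      // rebias exponent
            return (uint16_t)((e << 10) | ((u >> 13) & 0x3FF));
        }

        // Round up instead: bump the half pattern if any bits were truncated
        // (a mantissa carry correctly spills into the exponent).
        uint16_t halfRoundUp(float x) {
            uint32_t u = std::bit_cast<uint32_t>(x);
            uint16_t h = halfRoundDown(x);
            return (u & 0x1FFF) ? (uint16_t)(h + 1) : h;
        }

        float halfToFloat(uint16_t h) {
            uint32_t e = ((h >> 10) & 0x1F) - 15 + 127;
            return std::bit_cast<float>((e << 23) | ((uint32_t)(h & 0x3FF) << 13));
        }

        int main() {
            float mn = 1.2345f, mx = 6.789f;   // one axis of a positive AABB
            std::printf("[%f, %f] -> [%f, %f]\n", mn, mx,
                        halfToFloat(halfRoundDown(mn)), halfToFloat(halfRoundUp(mx)));
        }

    The hierarchical encoding mentioned above goes one step further: each child box is stored relative to its parent's box, so the limited half bits cover a much smaller range, which is how the thesis restores the original tree quality at the cost of extra work in the inner loop.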