1,858 research outputs found
Fast Reliable Ray-tracing of Procedurally Defined Implicit Surfaces Using Revised Affine Arithmetic
Fast and reliable rendering of implicit surfaces is an important area in the field of implicit modelling. Direct rendering, namely ray-tracing, is shown to be a suitable technique for obtaining good-quality visualisations of implicit surfaces. We present a technique for reliable ray-tracing of arbitrary procedurally defined implicit surfaces by using a modification of Affine Arithmetic called Revised Affine Arithmetic. A wide range of procedurally defined implicit objects can be rendered using this technique including polynomial surfaces, constructive solids, pseudo-random objects, procedurally defined microstructures, and others. We compare our technique with other reliable techniques based on Interval and Affine Arithmetic to show that our technique provides the fastest, while still reliable, ray-surface intersections and ray-tracing. We also suggest possible modifications for the GPU implementation of this technique for real-time rendering of relatively simple implicit models and for near real-time for complex implicit models
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA
We present the results of gravitational direct -body simulations using the
Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed
for gaming computers. The force evaluation of the -body problem is
implemented in ``Compute Unified Device Architecture'' (CUDA) using the GPU to
speed-up the calculations. We tested the implementation on three different
-body codes: two direct -body integration codes, using the 4th order
predictor-corrector Hermite integrator with block time-steps, and one
Barnes-Hut treecode, which uses a 2nd order leapfrog integration scheme. The
integration of the equations of motions for all codes is performed on the host
CPU.
We find that for particles the GPU outperforms the GRAPE-6Af, if
some softening in the force calculation is accepted. Without softening and for
very small integration time steps the GRAPE still outperforms the GPU. We
conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special
purpose hardware. Using the same time-step criterion, the total energy of the
-body system was conserved better than to one in on the GPU, only
about an order of magnitude worse than obtained with GRAPE-6Af. For N \apgt
10^5 the 8800GTX outperforms the host CPU by a factor of about 100 and runs at
about the same speed as the GRAPE-6Af.Comment: Accepted for publication in New Astronom
Explicit Cache Management for Volume Ray-Casting on Parallel Architectures
A major challenge when designing general purpose graphics hardware is to allow efficient access to texture data. Although different rendering paradigms vary with respect to their data access patterns, there is no flexibility when it comes to data caching provided by the graphics architecture. In this paper we focus on volume ray-casting, and show the benefits of algorithm-aware data caching. Our Marching Caches method exploits inter-ray coherence and thus utilizes the memory layout of the highly parallel processors by allowing them to share data through a cache which marches along with the ray front. By exploiting Marching Caches we can apply higher-order reconstruction and enhancement filters to generate more accurate and enriched renderings with an improved rendering performance. We have tested our Marching Caches with seven different filters, e. g., Catmul-Rom, B- spline, ambient occlusion projection, and could show that a speed up of four times can be achieved compared to using the caching implicitly provided by the graphics hardware, and that the memory bandwidth to global memory can be reduced by orders of magnitude. Throughout the paper, we will introduce the Marching Cache concept, provide implementation details and discuss the performance and memory bandwidth impact when using different filters
Heterogeneous Acceleration for 5G New Radio Channel Modelling Using FPGAs and GPUs
L'abstract è presente nell'allegato / the abstract is in the attachmen
Faster Ray Tracing through Hierarchy Cut Code
We propose a novel ray reordering technique to accelerate the ray tracing
process by encoding and sorting rays prior to traversal. Instead of spatial
coordinates, our method encodes rays according to the cuts of the hierarchical
acceleration structure, which is called the hierarchy cut code. This approach
can better adapt to the acceleration structure and obtain a more reliable
encoding result. We also propose a compression scheme to decrease the sorting
overhead by a shorter sorting key. In addition, based on the phenomenon of
boundary drift, we theoretically explain the reason why existing reordering
methods cannot achieve better performance by using longer sorting keys. The
experiment demonstrates that our method can accelerate secondary ray tracing by
up to 1.81 times, outperforming the existing methods. Such result proves the
effectiveness of hierarchy cut code, and indicate that the reordering technique
can achieve greater performance improvement, which worth further research
The use of primitives in the calculation of radiative view factors
Compilations of radiative view factors (often in closed analytical form) are readily available in the open literature for commonly encountered geometries. For more complex three-dimensional (3D) scenarios, however, the effort required to solve the requisite multi-dimensional integrations needed to estimate a required view factor can be daunting to say the least. In such cases, a combination of finite element methods (where the geometry in question is sub-divided into a large number of uniform, often triangular, elements) and Monte Carlo Ray Tracing (MC-RT) has been developed, although frequently the software implementation is suitable only for a limited set of geometrical scenarios. Driven initially by a need to calculate the radiative heat transfer occurring within an operational fibre-drawing furnace, this research set out to examine options whereby MC-RT could be used to cost-effectively calculate any generic 3D radiative view factor using current vectorisation technologies
DISPATCH: A Numerical Simulation Framework for the Exa-scale Era. I. Fundamentals
We introduce a high-performance simulation framework that permits the
semi-independent, task-based solution of sets of partial differential
equations, typically manifesting as updates to a collection of `patches' in
space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks
are controlled by a rank-local `dispatcher' which selects, from a set of tasks
generally much larger than the number of physical cores (or hardware threads),
tasks that are ready for updating. The definition of a task can vary, for
example, with some solving the equations of ideal magnetohydrodynamics (MHD),
others non-ideal MHD, radiative transfer, or particle motion, and yet others
applying particle-in-cell (PIC) methods. Tasks do not have to be grid-based,
while tasks that are, may use either Cartesian or orthogonal curvilinear
meshes. Patches may be stationary or moving. Mesh refinement can be static or
dynamic. A feature of decisive importance for the overall performance of the
framework is that time steps are determined and applied locally; this allows
potentially large reductions in the total number of updates required in cases
when the signal speed varies greatly across the computational domain, and
therefore a corresponding reduction in computing time. Another feature is a
load balancing algorithm that operates `locally' and aims to simultaneously
minimise load and communication imbalance. The framework generally relies on
already existing solvers, whose performance is augmented when run under the
framework, due to more efficient cache usage, vectorisation, local
time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI
scaling.Comment: 17 pages, 8 figures. Accepted by MNRA
Doctor of Philosophy
dissertationThis dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm that improves the visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost to traverse a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers
Doctor of Philosophy in Computer Science
dissertationRay tracing is becoming more widely adopted in offline rendering systems due to its natural support for high quality lighting. Since quality is also a concern in most real time systems, we believe ray tracing would be a welcome change in the real time world, but is avoided due to insufficient performance. Since power consumption is one of the primary factors limiting the increase of processor performance, it must be addressed as a foremost concern in any future ray tracing system designs. This will require cooperating advances in both algorithms and architecture. In this dissertation I study ray tracing system designs from a data movement perspective, targeting the various memory resources that are the primary consumer of power on a modern processor. The result is high performance, low energy ray tracing architectures
- …