Auto-tuning Interactive Ray Tracing using an Analytical GPU Architecture Model
This paper presents a method for auto-tuning interactive ray tracing on GPUs using a hardware model. Getting full performance from modern GPUs is a challenging task. Workloads which require a guaranteed performance over several runs must select parameters for the worst performance of all runs. Our method uses an analytical GPU performance model to predict the current frame's rendering time using a selected set of parameters. These parameters are then optimised for a selected frame rate performance on the particular GPU architecture. We use auto-tuning to determine parameters such as Phong shading, shadow rays and the number of ambient occlusion rays. We sample a priori information about the current rendering load to estimate the frame workload. A GPU model is run iteratively using this information to tune rendering parameters for a target frame rate. We use the OpenCL API, allowing tuning across different GPU architectures. Our auto-tuning enables the rendering of each frame to execute in a predicted time, so a target frame rate can be achieved even with widely varying scene complexities. Using this method we can select optimal parameters for the current execution, taking into account the current viewpoint and scene, achieving performance improvements over predetermined parameters.
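The iterative tuning loop this abstract describes can be sketched as follows. All cost coefficients and the greedy reduction order are illustrative assumptions, not the paper's actual analytical model:

```python
def predict_frame_time(params, workload):
    """Hypothetical analytical cost model: each feature adds a per-ray cost
    scaled by the sampled workload (all coefficients are illustrative)."""
    cost = workload * 1.0                        # base primary-ray cost (ms)
    if params["shadow_rays"]:
        cost += workload * 0.6                   # shadow-ray cost
    cost += workload * 0.25 * params["ao_rays"]  # ambient-occlusion rays
    return cost

def tune(workload, target_ms):
    """Iteratively drop quality settings until the model predicts the target
    frame time is met (mirrors the paper's model-driven tuning loop)."""
    params = {"shadow_rays": True, "ao_rays": 8}
    while predict_frame_time(params, workload) > target_ms:
        if params["ao_rays"] > 0:
            params["ao_rays"] -= 1               # cheapest quality reduction first
        elif params["shadow_rays"]:
            params["shadow_rays"] = False
        else:
            break                                # nothing left to trade away
    return params
```

A light workload keeps full quality, while a heavy one sheds ambient-occlusion rays until the predicted time fits the budget.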
Efficient Culling Techniques for Interactive Deformable NURBS Surfaces on GPU
NURBS (non-uniform rational B-spline) surfaces are the standard freeform representation in Computer-Aided Design (CAD) applications. Rendering NURBS surfaces accurately while they are interactively manipulated and deformed is a challenging task. To achieve it, eliminating back-facing surfaces or surface pieces from the pipeline at early stages is a key advantage. Furthermore, effective interactive manipulation implies that all culling computations must be performed every frame, since occlusion information can change quickly. In this paper, different interactive culling strategies for NURBS surfaces are presented and analyzed. These culling techniques exploit the geometric properties of a NURBS surface, which make it easy to find screen-space bounds for the surface each frame. The culling overhead of our proposals is small compared to the computational savings, outperforming a proposal without culling. An implementation of these strategies on current GPUs is presented, achieving real-time and interactive rendering rates for complex parametric models.
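One property that makes per-frame screen-space bounds cheap is the convex-hull property: a NURBS patch lies inside the convex hull of its control points. A minimal sketch of a bound-and-cull test built on that property (the `project` callable and viewport layout are assumptions for illustration, not the paper's implementation):

```python
def screen_bounds(control_points, project):
    """A NURBS patch lies inside the convex hull of its control points, so the
    screen-space box of the projected control points bounds the whole patch."""
    pts = [project(p) for p in control_points]
    xs, ys = zip(*pts)
    return min(xs), min(ys), max(xs), max(ys)

def cull_patch(control_points, project, viewport):
    """Discard the patch when its screen-space bound misses the viewport."""
    x0, y0, x1, y1 = screen_bounds(control_points, project)
    vx0, vy0, vx1, vy1 = viewport
    return x1 < vx0 or x0 > vx1 or y1 < vy0 or y0 > vy1   # True => cull
```

Because only the control points are touched, the test costs far less than tessellating and rendering the patch it may eliminate.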
Explicit Cache Management for Volume Ray-Casting on Parallel Architectures
A major challenge when designing general purpose graphics hardware is to allow efficient access to texture data. Although different rendering paradigms vary with respect to their data access patterns, there is no flexibility when it comes to data caching provided by the graphics architecture. In this paper we focus on volume ray-casting, and show the benefits of algorithm-aware data caching. Our Marching Caches method exploits inter-ray coherence and thus utilizes the memory layout of the highly parallel processors by allowing them to share data through a cache which marches along with the ray front. By exploiting Marching Caches we can apply higher-order reconstruction and enhancement filters to generate more accurate and enriched renderings with an improved rendering performance. We have tested our Marching Caches with seven different filters, e.g., Catmull-Rom, B-spline, ambient occlusion projection, and show that a speedup of four times can be achieved compared to using the caching implicitly provided by the graphics hardware, and that the memory bandwidth to global memory can be reduced by orders of magnitude. Throughout the paper, we will introduce the Marching Cache concept, provide implementation details and discuss the performance and memory bandwidth impact when using different filters.
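The bandwidth saving from sharing a cache across a coherent ray front can be illustrated with a simple counting model. This is only a sketch of the idea under simplifying assumptions (1D front, perfectly coherent neighbouring rays), not the paper's GPU implementation:

```python
def march_with_cache(num_rays, depth, footprint):
    """Model a ray front marching through a volume. Per step, neighbouring
    rays share one cached slab of voxels instead of each fetching its own
    filter footprint from global memory (an illustrative model only)."""
    # without a shared cache, every ray fetches its full filter footprint
    # from global memory at every step
    global_fetches_uncached = num_rays * depth * footprint
    global_fetches_cached = 0
    for _ in range(depth):
        # the whole front needs roughly one slab per step: one footprint plus
        # one extra voxel per additional ray, since coherent rays overlap
        global_fetches_cached += footprint + (num_rays - 1)
    return global_fetches_uncached, global_fetches_cached
```

Even this toy model shows global-memory traffic dropping by roughly the filter footprint times the degree of ray overlap, which is why wider reconstruction filters benefit most.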
Photon Splatting Using a View-Sample Cluster Hierarchy
Splatting photons onto primary view samples, rather than gathering from a photon acceleration structure, can be a more efficient approach to evaluating the photon-density estimate in interactive applications, where the number of photons is often low compared to the number of view samples. Most photon splatting approaches struggle with large photon radii or high resolutions due to overdraw and insufficient culling. In this paper, we show how dynamic real-time diffuse interreflection can be achieved by using a full 3D acceleration structure built over the view samples and then splatting photons onto the view samples by traversing this data structure. Full dynamic lighting and scenes are possible by tracing and splatting photons, and rebuilding the acceleration structure every frame. We show that the number of view-sample/photon tests can be significantly reduced and suggest further culling techniques based on the normal cone of each node in the hierarchy. Finally, we present an approximate variant of our algorithm where photon traversal is stopped at a fixed level of our hierarchy, and the incoming radiance is accumulated per node and direction, rather than per view sample. This improves performance significantly with little visible degradation of quality.
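The core traversal, splatting a photon by walking a hierarchy built over view samples, can be sketched as follows. This is a 2D toy with a median-split tree; the paper builds a full 3D structure and adds normal-cone culling on top:

```python
class Node:
    """A node of a hierarchy over view-sample positions (2D for brevity)."""
    def __init__(self, samples):
        xs = [s[0] for s in samples]
        ys = [s[1] for s in samples]
        self.lo, self.hi = (min(xs), min(ys)), (max(xs), max(ys))
        if len(samples) <= 2:                      # leaf: store samples directly
            self.samples, self.children = samples, []
        else:                                      # inner node: median split
            samples = sorted(samples)
            mid = len(samples) // 2
            self.samples = []
            self.children = [Node(samples[:mid]), Node(samples[mid:])]

def splat(node, photon, radius, hits):
    """Splat one photon by traversing the hierarchy, skipping any subtree
    whose bounding box lies outside the photon's radius."""
    px, py = photon
    cx = min(max(px, node.lo[0]), node.hi[0])      # closest box point to photon
    cy = min(max(py, node.lo[1]), node.hi[1])
    if (cx - px) ** 2 + (cy - py) ** 2 > radius ** 2:
        return                                     # culled: whole subtree skipped
    for s in node.samples:
        if (s[0] - px) ** 2 + (s[1] - py) ** 2 <= radius ** 2:
            hits.append(s)                         # photon contributes to sample
    for child in node.children:
        splat(child, photon, radius, hits)
```

The box test prunes entire subtrees of view samples per photon, which is where the reduction in view-sample/photon tests comes from.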
Power Efficiency for Software Algorithms running on Graphics Processors
Power efficiency has become the most important consideration for many modern computing devices. In this paper, we examine power efficiency of a range of graphics algorithms on different GPUs. To measure power consumption, we have built a power measuring device that samples currents at a high frequency. Comparing power efficiency of different graphics algorithms is done by measuring power and performance of three different primary rendering algorithms and three different shadow algorithms. We measure these algorithms' power signatures on a mobile phone, on an integrated CPU and graphics processor, and on high-end discrete GPUs, and then compare power efficiency across both algorithms and GPUs. Our results show that power efficiency is not always proportional to rendering performance and that, for some algorithms, power efficiency varies across different platforms. We also show that for some algorithms, energy efficiency is similar on all platforms.
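Turning high-frequency current samples into a comparable efficiency number reduces to integrating power over time. A minimal sketch, assuming a fixed supply voltage and uniform sample interval (the paper's measuring rig is not described at this level of detail):

```python
def energy_joules(voltage, current_samples, sample_interval_s):
    """Integrate sampled current draw into energy: E = sum(V * I * dt)."""
    return sum(voltage * i * sample_interval_s for i in current_samples)

def frames_per_joule(frames_rendered, voltage, current_samples, dt):
    """An efficiency metric of the kind the paper compares:
    rendering work done per unit of energy consumed."""
    return frames_rendered / energy_joules(voltage, current_samples, dt)
```

Comparing frames per joule rather than frames per second is exactly what separates "fast" from "power-efficient" in the paper's results.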
Decoupled Sampling for Real-Time Graphics Pipelines
We propose decoupled sampling, an approach that decouples shading from visibility sampling in order to enable motion blur and depth-of-field at reduced cost. More generally, it enables extensions of modern real-time graphics pipelines that provide controllable shading rates to trade off quality for performance. It can be thought of as a generalization of GPU-style multisample antialiasing (MSAA) to support unpredictable shading rates, with arbitrary mappings from visibility to shading samples as introduced by motion blur, depth-of-field, and adaptive shading. It is inspired by the Reyes architecture in offline rendering, but targets real-time pipelines by driving shading from visibility samples as in GPUs, and removes the need for micropolygon dicing or rasterization. Decoupled sampling works by defining a many-to-one hash from visibility to shading samples, and using a buffer to memoize shading samples and exploit reuse across visibility samples. We present extensions of two modern GPU pipelines to support decoupled sampling: a GPU-style sort-last fragment architecture, and a Larrabee-style sort-middle pipeline. We study the architectural implications and derive end-to-end performance estimates on real applications through an instrumented functional simulator. We demonstrate high-quality motion blur and depth-of-field, as well as variable and adaptive shading rates.
Decoupled Sampling for Graphics Pipelines
We propose a generalized approach to decoupling shading from visibility sampling in graphics pipelines, which we call decoupled sampling. Decoupled sampling enables stochastic supersampling of motion and defocus blur at reduced shading cost, as well as controllable or adaptive shading rates which trade off shading quality for performance. It can be thought of as a generalization of multisample antialiasing (MSAA) to support complex and dynamic mappings from visibility to shading samples, as introduced by motion and defocus blur and adaptive shading. It works by defining a many-to-one hash from visibility to shading samples, and using a buffer to memoize shading samples and exploit reuse across visibility samples. Decoupled sampling is inspired by the Reyes rendering architecture, but like traditional graphics pipelines, it shades fragments rather than micropolygon vertices, decoupling shading from the geometry sampling rate. Also unlike Reyes, decoupled sampling only shades fragments after precise computation of visibility, reducing overshading.
We present extensions of two modern graphics pipelines to support decoupled sampling: a GPU-style sort-last fragment architecture, and a Larrabee-style sort-middle pipeline. We study the architectural implications of decoupled sampling and blur, and derive end-to-end performance estimates on real applications through an instrumented functional simulator. We demonstrate high-quality motion and defocus blur, as well as variable and adaptive shading rates.
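The mechanism both abstracts describe, a many-to-one map from visibility to shading samples plus a memoization buffer, can be sketched in a few lines. The shader stand-in and the mapping function are illustrative assumptions; the papers' pipelines implement this in hardware-style stages:

```python
def shade(shading_sample):
    """Stand-in shader; in a real pipeline this is the expensive
    per-fragment shading computation."""
    return hash(shading_sample) % 256

def resolve(visibility_samples, to_shading_sample):
    """Decoupled sampling's core idea, sketched: map each visibility sample
    to a shading sample (many-to-one), and memoize shading results so each
    shading sample is computed once and reused across visibility samples."""
    memo = {}
    shaded, invocations = [], 0
    for v in visibility_samples:
        key = to_shading_sample(v)
        if key not in memo:
            memo[key] = shade(key)
            invocations += 1                  # only a memo miss runs the shader
        shaded.append(memo[key])
    return shaded, invocations
```

With stochastic supersampling for motion or defocus blur, many visibility samples land on the same shading sample, so the number of shader invocations stays close to the shading rate rather than the visibility sampling rate.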
Lunar Lander Offloading Operations Using a Heavy-Lift Lunar Surface Manipulator System
This study investigates the feasibility of using a heavy-lift variant of the Lunar Surface Manipulator System (LSMS-H) to lift and handle a 12 metric ton payload. Design challenges and requirements particular to handling heavy cargo were examined. Differences between the previously developed first-generation LSMS and the heavy-lift version are highlighted. An in-depth evaluation of the tip-over risk during LSMS-H operations has been conducted using the Synergistic Engineering Environment and potential methods to mitigate that risk are identified. The study investigated three specific offloading scenarios pertinent to current Lunar Campaign studies. The first involved offloading a large element, such as a habitat or logistics module, onto a mobility chassis with a lander-mounted LSMS-H and offloading that payload from the chassis onto the lunar surface with a surface-mounted LSMS-H. The second scenario involved offloading small pressurized rovers with a lander-mounted LSMS-H. The third scenario involved offloading cargo from a third-party lander, such as the proposed ESA cargo lander, with a chassis-mounted LSMS-H. In all cases, the analyses show that the LSMS-H can perform the required operations safely. However, Chariot-mounted operations require the addition of stabilizing outriggers, and when operating from the lunar surface, LSMS-H functionality is enhanced by adding a simple ground anchoring system.
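The kind of tip-over check behind the outrigger finding reduces, in its simplest quasi-static form, to a moment balance about the tipping edge. All masses and geometry below are illustrative assumptions; the study's actual analysis used the Synergistic Engineering Environment, not this formula:

```python
def tip_over_margin(base_mass_kg, base_cg_to_edge_m,
                    payload_mass_kg, payload_reach_m, g=1.62):
    """Quasi-static moment balance about the tipping edge under lunar
    gravity (g = 1.62 m/s^2). Positive margin means the restoring moment
    of the base exceeds the overturning moment of the suspended payload."""
    restoring = base_mass_kg * g * base_cg_to_edge_m      # N*m holding it down
    overturning = payload_mass_kg * g * payload_reach_m   # N*m tipping it over
    return restoring - overturning

def needs_outriggers(*args):
    """Outriggers (or anchoring) are indicated when the margin is non-positive."""
    return tip_over_margin(*args) <= 0.0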
Molecular Phylogeny of Edge Hill Virus Supports its Position in the Yellow Fever Virus Group and Identifies a New Genetic Variant
Edge Hill virus (EHV) is a mosquito-borne flavivirus isolated throughout Australia during mosquito surveillance programs. While not posing an immediate threat to the human population, EHV is a taxonomically interesting flavivirus since it remains the only member of the yellow fever virus (YFV) sub-group to be detected within Australia. Here we present both an antigenic and genetic investigation of collected isolates, and confirm taxonomic classification of the virus within the YFV-group. Isolates were not clustered based on geographical origin or time of isolation, suggesting that minimal genetic evolution of EHV has occurred over geographic distance or time within the EHV cluster. However, two isolates showed significant differences in antigenic reactivity patterns, and had a much larger divergence from the EHV prototype (19% nucleotide and 6% amino acid divergence), indicating a distinct subtype or variant within the EHV subgroup.
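Divergence figures like "19% nucleotide divergence" are, in their simplest uncorrected form, pairwise p-distances over an alignment. A minimal sketch (the paper's phylogenetic analysis would use proper alignment and substitution models, not this bare calculation):

```python
def p_distance(seq_a, seq_b):
    """Uncorrected pairwise (p-)distance: the fraction of aligned,
    ungapped sites at which two sequences differ. Ignores alignment
    gaps and makes no correction for multiple substitutions."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs / len(pairs)
```

Model-corrected distances (e.g., Jukes-Cantor) would be larger than the raw p-distance for divergent pairs, which is why published figures state which measure was used.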