869 research outputs found

    Perceptually optimized real-time computer graphics

    Get PDF
    Perceptual optimization, the application of human visual perception models to remove imperceptible components in a graphics system, has been proven effective in achieving significant computational speedup. Previous implementations of this technique have focused on spatial level of detail reduction, which typically results in noticeable degradation of image quality. This thesis introduces refresh rate modulation (RRM), a novel perceptual optimization technique that produces better performance enhancement while more effectively preserving image quality and resolving static scene elements in full detail. In order to demonstrate the effectiveness of this technique, a graphics framework has been developed that interfaces with eye tracking hardware to take advantage of user fixation data in real-time. Central to the framework is a high-performance GPGPU ray-tracing engine written in OpenCL. RRM reduces the frequency with which pixels outside of the foveal region are updated by the ray-tracer. A persistent pixel buffer is maintained such that peripheral data from previous frames provides context for the foveal image in the current frame. Traditional optimization techniques have also been incorporated into the ray-tracer for improved performance. Applying the RRM technique to the ray-tracing engine results in a speedup of 2.27 (252 fps vs. 111 fps at 1080p) for the classic Whitted scene with reflection and transmission enabled. A speedup of 3.41 (140 fps vs. 41 fps at 1080p) is observed for a high-polygon scene that depicts the Stanford Bunny. A small pilot study indicates that RRM achieves these results with minimal impact to perceived image quality. A secondary investigation is conducted regarding the performance benefits of increasing physics engine error tolerance for bounding volume hierarchy based collision detection when the scene elements involved are in the user\u27s periphery. The open-source Bullet Physics Library was used to add accurate collision detection to the full resolution ray-tracing engine. For a scene with a static high-polygon model and 50 moving spheres, a speedup of 1.8 was observed for physics calculations. The development and integration of this subsystem demonstrates the extensibility of the graphics framework

    Reducing redundancy of real time computer graphics in mobile systems

    Get PDF
    The goal of this thesis is to propose novel and effective techniques to eliminate redundant computations that waste energy and are performed in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy-efficiency of CPU/GPU systems is not only key to enlarge their battery life, but also allows to increase their performance because, to avoid overheating above thermal limits, SoCs tend to be throttled when the load is high for a large period of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, being the off-chip main memory accesses and the processors inside the GPU the primary energy consumers of the graphics subsystem. First, we focus on reducing redundant fragment processing computations by means of improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. When the GPU realizes that an object or part of it is not going to be visible, all activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in graphics animated applications tend to keep its relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps visibility information of a frame, and uses it to reorder the objects of the following frame. VRO just requires adding a small hardware to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications for which VRO achieves 27% speed-up and 14.8% energy reduction on average. Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD completely executed in the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task of our model comes for free by reusing the image rendering intermediate results. Although not necessarily, such a dramatic time improvement may result in better frames per second if physics simulation stays in the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by a specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8\%). These dramatic benefits are accompanied by a higher fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.El objetivo de esta tesis es proponer técnicas efectivas y originales para eliminar computaciones inútiles que aparecen en aplicaciones gráficas, con especial énfasis en micro-arquitectura de GPUs. Mejorar la eficiencia energética de los sistemas CPU/GPU no es solo clave para alargar la vida de la batería, sino también incrementar su rendimiento. Estudios previos han apuntado que la CPU y especialmente la GPU son los principales consumidores de energía en el sub-sistema gráfico, siendo los accesos a memoria off-chip y los procesadores dentro de la GPU los principales consumidores de energía del sub-sistema gráfico. Primero, nos hemos centrado en reducir computaciones redundantes de la fase de fragment processing mediante la mejora en la eliminación de superficies ocultas. Durante el renderizado de gráficos en tiempo real, los objetos son procesados por la GPU en el orden en el que son enviados por la CPU, y las superficies ocultas son a menudo procesadas incluso si no no acaban formando parte de la imagen final. Cuando la GPU averigua que el objeto o parte de él no es visible, toda la actividad requerida para computar su color y guardarlo ha sido realizada. Proponemos una técnica arquitectónica original para GPUs móviles, Visibility Rendering Order (VRO), la cual reordena los objetos de delante hacia atrás por completo en hardware para maximizar la efectividad del culling de la GPU y así minimizar el overshading, y por lo tanto reducir el tiempo de ejecución y el consumo de energía. VRO explota el hecho de que los objetos de las aplicaciones gráficas animadas tienden a mantener su orden relativo en profundidad a través de frames consecutivos (coherencia temporal) para proveer animaciones con transiciones suaves. Dado que las relaciones de orden en profundidad entre objetos son testeadas en la GPU, VRO introduce costes mínimos en energía. Solo requiere añadir una pequeña unidad hardware para capturar la información de visibilidad. Además, VRO trabaja en paralelo con el pipeline gráfico, por lo que introduce costes insignificantes en tiempo. Ilustramos los beneficios de VRO usango varias aplicaciones 3D comerciales para las cuales VRO consigue un 27% de speed-up y un 14.8% de reducción de energía en media. En segundo lugar, evitamos computaciones redundantes relacionadas con la Detección de Colisiones (CD) en la CPU. Las aplicaciones gráficas animadas como los juegos 3D representan un alto porcentaje de las aplicaciones descargadas en dispositivos móviles y la tendencia es hacia escenas más complejas y realistas con simulaciones físicas 3D precisas. La CD es uno de los algoritmos más importantes entre los kernel de físicas dado que identifica los puntos de contacto entre los objetos de una escena. Sin embargo, una CD en tiempo real y precisa es muy costosa en términos de consumo energético. Proponemos Render Based Collision Detection (RBCD), una técnica energéticamente eficiente y preciso de CD que utiliza resultados intermedios del rendering pipeline para realizar la CD. Comparando RBCD con una CD convencional completamente ejecutada en la CPU, mostramos que el tiempo de ejecución es reducido casi tres órdenes de magnitud (600x speedup), porque la mayoría de la CD de nuestro modelo reusa resultados intermedios del renderizado de la imagen. Aunque no es así necesariamente, esta espectacular en tiempo puede resultar en mejores frames por segundo si la simulación de físicas está en el camino crítico. Sin embargo, la ventaja más importante de nuestra técnica es el enorme ahorro de energía que resulta de eliminar las largas y costosas computaciones en la CPU, sustituyéndolas por unas pocas operaciones ejecutadas en un hardware especializado dentro de la GPU. Nuestros resultados muestran que la energía consumida por la CD es reducidad en media por un factor de 448x. Estos dramáticos beneficios vienen acompañados de una mayor fidelidad en la CD (i.e. con granularidad más fina)Postprint (published version

    Parallel Programming Recipes

    Get PDF
    Parallel programming has become vital for the success of commercial applications since Moore’s Law will now be used to double the processors (or cores) per chip every technology generation. The performance of applications depends on how software executions can be mapped on the multi-core chip, and how efficiently they run the cores. Currently, the increase of parallelism in software development is necessary, not only for taking advantage of multi-core capability, but also for adapting and surviving in the new silicon implementation. This project will provide the performance characteristics of parallelism for some common algorithms or computations using different parallel languages. Based on concrete experiments, where each algorithm is implemented on different languages and the program’s performance is measured, the project provides the recipes for the problem computations. The following are the central problems and algorithms of the project: Arithmetic Algebra: Maclaurin Series Calculation for ex, Dot-Product of Two Vectors: each vector has size n; Sort Algorithms: Bubble sort, Odd-Event sort; Graphics: Graphics rendering. The languages are chosen based on commonality in the current market and ease of use; i.e., OpenMP, MPI, and OpenCL. The purpose of this study is to provide reader a broad knowledge about parallel programming, the comparisons, in terms of performance and implementation cost, across languages and application types. It is hoped to be very useful for programmers/computer-architects to decide which language to use for a certain applications/problems and cost estimations for the projects. Also, it is hoped that the project can be expanded in the future so that more languages/technologies as well as applications can be analyze

    Faster data structures and graphics hardware techniques for high performance rendering

    Get PDF
    Computer generated imagery is used in a wide range of disciplines, each with different requirements. As an example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements but are still required to render efficiently utilizing huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce as high rendering quality as possible given the constraints available. The first paper in this thesis presents an analytical hardware model together with a feed-back system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware are required to minimize energy consumption not to overheat. The second paper presents experiments and analysis which consider power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflections and refractions effects has long been considered available only in offline rendering where time isn’t a constraint. The third paper presents a hybrid approach, utilizing the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high quality reflections and refractions in real-time. The fourth and fifth paper present improvements in construction time and quality of Bounding Volume Hierarchies (BVH). Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer towards a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of a very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher quality BVHs. The fifth paper introduces a triangle splitting BVH construction approach that builds BVHs with quality on a par with an earlier high quality splitting algorithm. However, the method presented in paper five is several times faster in construction time

    Polygon-based hidden surface elimination algorithms: serial and parallel

    Get PDF
    Chapter 1 introduces the need for rapid solutions of hidden surface elimination (HSE) problems in the interactive display of objects and scenes, as used in many application areas such as flight and driving simulators and CAD systems. It reviews the existing approaches to high-performance computer graphics and to parallel computing. It then introduces the central tenet of this thesis: that general purpose parallel computers may be usefully applied to the solution of HSE problems. Finally it introduces a set of metrics for describing sets of scene data, and applies them to the test scenes used in this thesis. Chapter 2 describes variants of several common image space hidden surface elimination algorithms, which solve the HSE problem for scenes described as collections of polygons. Implementations of these HSE algorithms on a traditional, serial, single microprocessor computer are introduced and theoretical estimates of their performance are derived. The algorithms are compared under identical conditions for various sets of test data. The results of this comparison are then placed in context with existing historical results. Chapter 3 examines the application of MIMD style parallelism to accelerate the solution of HSE problems. MIMD parallel implementations of the previously considered HSE algorithms are introduced. Their behaviour under various system configurations and for various data sets is investigated and compared with theoretical estimates. The theoretical estimates are found to match closely the experimental findings. Chapter 4 summarises the conclusions of this thesis, finding that HSE algorithms can be implemented to use an MIMD parallel computer effectively, and that of the HSE algorithms examined the z-buffer algorithm generally proves to be a good compromise solution

    Ambient occlusion and shadows for molecular graphics

    Get PDF
    Computer based visualisations of molecules have been produced as early as the 1950s to aid researchers in their understanding of biomolecular structures. An important consideration for Molecular Graphics software is the ability to visualise the 3D structure of the molecule in a clear manner. Recent advancements in computer graphics have led to improved rendering capabilities of the visualisation tools. The capabilities of current shading languages allow the inclusion of advanced graphic effects such as ambient occlusion and shadows that greatly improve the comprehension of the 3D shapes of the molecules. This thesis focuses on finding improved solutions to the real time rendering of Molecular Graphics on modern day computers. The methods of calculating ambient occlusion and both hard and soft shadows are examined and implemented to give the user a more complete experience when navigating large molecular structures

    Ray Tracing Gems

    Get PDF
    This book is a must-have for anyone serious about rendering in real time. With the announcement of new ray tracing APIs and hardware to support them, developers can easily create real-time applications with ray tracing as a core component. As ray tracing on the GPU becomes faster, it will play a more central role in real-time rendering. Ray Tracing Gems provides key building blocks for developers of games, architectural applications, visualizations, and more. Experts in rendering share their knowledge by explaining everything from nitty-gritty techniques that will improve any ray tracer to mastery of the new capabilities of current and future hardware. What you'll learn: The latest ray tracing techniques for developing real-time applications in multiple domains Guidance, advice, and best practices for rendering applications with Microsoft DirectX Raytracing (DXR) How to implement high-performance graphics for interactive visualizations, games, simulations, and more Who this book is for: Developers who are looking to leverage the latest APIs and GPU technology for real-time rendering and ray tracing Students looking to learn about best practices in these areas Enthusiasts who want to understand and experiment with their new GPU

    Visualization and inspection of the geometry of particle packings

    Get PDF
    Gegenstand dieser Dissertation ist die Entwicklung von effizienten Verfahren zur Visualisierung und Inspektion der Geometrie von Partikelmischungen. Um das Verhalten der Simulation für die Partikelmischung besser zu verstehen und zu überwachen, sollten nicht nur die Partikel selbst, sondern auch spezielle von den Partikeln gebildete Bereiche, die den Simulationsfortschritt und die räumliche Verteilung von Hotspots anzeigen können, visualisiert werden können. Dies sollte auch bei großen Packungen mit Millionen von Partikeln zumindest mit einer interaktiven Darstellungsgeschwindigkeit möglich sein. . Da die Simulation auf der Grafikkarte (GPU) durchgeführt wird, sollten die Visualisierungstechniken die Daten des GPU-Speichers vollständig nutzen. Um die Qualität von trockenen Partikelmischungen wie Beton zu verbessern, wurde der Korngrößenverteilung große Aufmerksamkeit gewidmet, die die Raumfüllungsrate hauptsächlich beeinflusst und daher zwei der wichtigsten Eigenschaften des Betons bestimmt: die strukturelle Robustheit und die Haltbarkeit. Anhand der Korngrößenverteilung kann die Raumfüllungsrate durch Computersimulationen bestimmt werden, die analytischen Ansätzen in der Praxis wegen der breiten Größenverteilung der Partikel oft überlegen sind. Eine der weit verbreiteten Simulationsmethoden ist das Collective Rearrangement, bei dem die Partikel zunächst an zufälligen Positionen innerhalb eines Behälters platziert werden. Später werden Überlappungen zwischen Partikeln aufgelöst, indem überlappende Partikel voneinander weggedrückt werden. Durch geschickte Anpassung der Behältergröße während der Simulation, kann die Collective Rearrangement-Methode am Ende eine ziemlich dichte Partikelpackung generieren. Es ist jedoch sehr schwierig, den gesamten Simulationsprozess ohne ein interaktives Visualisierungstool zu optimieren oder dort Fehler zu finden. Ausgehend von der etablierten rasterisierungsbasierten Methode zum Darstellen einer großen Menge von Kugeln, bietet diese Dissertation zunächst schnelle und pixelgenaue Methoden zur neuartigen Visualisierung der Überlappungen und Freiräume zwischen kugelförmigen Partikeln innerhalb eines Behälters.. Die auf Rasterisierung basierenden Verfahren funktionieren gut für kleinere Partikelpackungen bis ca. eine Million Kugeln. Bei größeren Packungen entstehen Probleme durch die lineare Laufzeit und den Speicherverbrauch. Zur Lösung dieses Problems werden neue Methoden mit Hilfe von Raytracing zusammen mit zwei neuen Arten von Bounding-Volume-Hierarchien (BVHs) bereitgestellt. Diese können den Raytracing-Prozess deutlich beschleunigen --- die erste kann die vorhandene Datenstruktur für die Simulation wiederverwenden und die zweite ist speichereffizienter. Beide BVHs nutzen die Idee des Loose Octree und sind die ersten ihrer Art, die die Größe von Primitiven für interaktives Raytracing mit häufig aktualisierten Beschleunigungsdatenstrukturen berücksichtigen. Darüber hinaus können die Visualisierungstechniken in dieser Dissertation auch angepasst werden, um Eigenschaften wie das Volumen bestimmter Bereiche zu berechnen. All diese Visualisierungstechniken werden dann auf den Fall nicht-sphärischer Partikel erweitert, bei denen ein nicht-sphärisches Partikel durch ein starres System von Kugeln angenähert wird, um die vorhandene kugelbasierte Simulation wiederverwenden zu können. Dazu wird auch eine neue GPU-basierte Methode zum effizienten Füllen eines nicht-kugelförmigen Partikels mit polydispersen überlappenden Kugeln vorgestellt, so dass ein Partikel mit weniger Kugeln gefüllt werden kann, ohne die Raumfüllungsrate zu beeinträchtigen. Dies erleichtert sowohl die Simulation als auch die Visualisierung. Basierend auf den Arbeiten in dieser Dissertation können ausgefeiltere Algorithmen entwickelt werden, um großskalige nicht-sphärische Partikelmischungen effizienter zu visualisieren. Weiterhin kann in Zukunft Hardware-Raytracing neuerer Grafikkarten anstelle des in dieser Dissertation eingesetzten Software-Raytracing verwendet werden. Die neuen Techniken können auch als Grundlage für die interaktive Visualisierung anderer partikelbasierter Simulationen verwendet werden, bei denen spezielle Bereiche wie Freiräume oder Überlappungen zwischen Partikeln relevant sind.The aim of this dissertation is to find efficient techniques for visualizing and inspecting the geometry of particle packings. Simulations of such packings are used e.g. in material sciences to predict properties of granular materials. To better understand and supervise the behavior of these simulations, not only the particles themselves but also special areas formed by the particles that can show the progress of the simulation and spatial distribution of hot spots, should be visualized. This should be possible with a frame rate that allows interaction even for large scale packings with millions of particles. Moreover, given the simulation is conducted in the GPU, the visualization techniques should take full use of the data in the GPU memory. To improve the performance of granular materials like concrete, considerable attention has been paid to the particle size distribution, which is the main determinant for the space filling rate and therefore affects two of the most important properties of the concrete: the structural robustness and the durability. Given the particle size distribution, the space filling rate can be determined by computer simulations, which are often superior to analytical approaches due to irregularities of particles and the wide range of size distribution in practice. One of the widely adopted simulation methods is the collective rearrangement, for which particles are first placed at random positions inside a container, later overlaps between particles will be resolved by letting overlapped particles push away from each other to fill empty space in the container. By cleverly adjusting the size of the container according to the process of the simulation, the collective rearrangement method could get a pretty dense particle packing in the end. However, it is very hard to fine-tune or debug the whole simulation process without an interactive visualization tool. Starting from the well-established rasterization-based method to render spheres, this dissertation first provides new fast and pixel-accurate methods to visualize the overlaps and free spaces between spherical particles inside a container. The rasterization-based techniques perform well for small scale particle packings but deteriorate for large scale packings due to the large memory requirements that are hard to be approximated correctly in advance. To address this problem, new methods based on ray tracing are provided along with two new kinds of bounding volume hierarchies (BVHs) to accelerate the ray tracing process --- the first one can reuse the existing data structure for simulation and the second one is more memory efficient. Both BVHs utilize the idea of loose octree and are the first of their kind to consider the size of primitives for interactive ray tracing with frequently updated acceleration structures. Moreover, the visualization techniques provided in this dissertation can also be adjusted to calculate properties such as volumes of the specific areas. All these visualization techniques are then extended to non-spherical particles, where a non-spherical particle is approximated by a rigid system of spheres to reuse the existing simulation. To this end a new GPU-based method is presented to fill a non-spherical particle with polydisperse possibly overlapping spheres efficiently, so that a particle can be filled with fewer spheres without sacrificing the space filling rate. This eases both simulation and visualization. Based on approaches presented in this dissertation, more sophisticated algorithms can be developed to visualize large scale non-spherical particle mixtures more efficiently. Besides, one can try to exploit the hardware ray tracing of more recent graphic cards instead of maintaining the software ray tracing as in this dissertation. The new techniques can also become the basis for interactively visualizing other particle-based simulations, where special areas such as free space or overlaps between particles are of interest

    Doctor of Philosophy

    Get PDF
    dissertationThis dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm that improves the visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost to traverse a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers
    corecore