60 research outputs found
Reducing redundancy of real time computer graphics in mobile systems
The goal of this thesis is to propose novel and effective techniques to eliminate redundant computations that waste energy and are performed in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy-efficiency of CPU/GPU systems is not only key to enlarge their battery life, but also allows to increase their performance because, to avoid overheating above thermal limits, SoCs tend to be throttled when the load is high for a large period of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, being the off-chip main memory accesses and the processors inside the GPU the primary energy consumers of the graphics subsystem.
First, we focus on reducing redundant fragment processing computations by means of improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. When the GPU realizes that an object or part of it is not going to be visible, all activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in graphics animated applications tend to keep its relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps visibility information of a frame, and uses it to reorder the objects of the following frame. VRO just requires adding a small hardware to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications for which VRO achieves 27% speed-up and 14.8% energy reduction on average.
Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD completely executed in the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task of our model comes for free by reusing the image rendering intermediate results. Although not necessarily, such a dramatic time improvement may result in better frames per second if physics simulation stays in the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by a specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8\%). These dramatic benefits are accompanied by a higher fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.El objetivo de esta tesis es proponer tĂ©cnicas efectivas y originales para eliminar computaciones inĂştiles que aparecen en aplicaciones gráficas, con especial Ă©nfasis en micro-arquitectura de GPUs. Mejorar la eficiencia energĂ©tica de los sistemas CPU/GPU no es solo clave para alargar la vida de la baterĂa, sino tambiĂ©n incrementar su rendimiento. Estudios previos han apuntado que la CPU y especialmente la GPU son los principales consumidores de energĂa en el sub-sistema gráfico, siendo los accesos a memoria off-chip y los procesadores dentro de la GPU los principales consumidores de energĂa del sub-sistema gráfico. Primero, nos hemos centrado en reducir computaciones redundantes de la fase de fragment processing mediante la mejora en la eliminaciĂłn de superficies ocultas. Durante el renderizado de gráficos en tiempo real, los objetos son procesados por la GPU en el orden en el que son enviados por la CPU, y las superficies ocultas son a menudo procesadas incluso si no no acaban formando parte de la imagen final. Cuando la GPU averigua que el objeto o parte de Ă©l no es visible, toda la actividad requerida para computar su color y guardarlo ha sido realizada. Proponemos una tĂ©cnica arquitectĂłnica original para GPUs mĂłviles, Visibility Rendering Order (VRO), la cual reordena los objetos de delante hacia atrás por completo en hardware para maximizar la efectividad del culling de la GPU y asĂ minimizar el overshading, y por lo tanto reducir el tiempo de ejecuciĂłn y el consumo de energĂa. VRO explota el hecho de que los objetos de las aplicaciones gráficas animadas tienden a mantener su orden relativo en profundidad a travĂ©s de frames consecutivos (coherencia temporal) para proveer animaciones con transiciones suaves. Dado que las relaciones de orden en profundidad entre objetos son testeadas en la GPU, VRO introduce costes mĂnimos en energĂa. Solo requiere añadir una pequeña unidad hardware para capturar la informaciĂłn de visibilidad. Además, VRO trabaja en paralelo con el pipeline gráfico, por lo que introduce costes insignificantes en tiempo. Ilustramos los beneficios de VRO usango varias aplicaciones 3D comerciales para las cuales VRO consigue un 27% de speed-up y un 14.8% de reducciĂłn de energĂa en media. En segundo lugar, evitamos computaciones redundantes relacionadas con la DetecciĂłn de Colisiones (CD) en la CPU. Las aplicaciones gráficas animadas como los juegos 3D representan un alto porcentaje de las aplicaciones descargadas en dispositivos mĂłviles y la tendencia es hacia escenas más complejas y realistas con simulaciones fĂsicas 3D precisas. La CD es uno de los algoritmos más importantes entre los kernel de fĂsicas dado que identifica los puntos de contacto entre los objetos de una escena. Sin embargo, una CD en tiempo real y precisa es muy costosa en tĂ©rminos de consumo energĂ©tico. Proponemos Render Based Collision Detection (RBCD), una tĂ©cnica energĂ©ticamente eficiente y preciso de CD que utiliza resultados intermedios del rendering pipeline para realizar la CD. Comparando RBCD con una CD convencional completamente ejecutada en la CPU, mostramos que el tiempo de ejecuciĂłn es reducido casi tres Ăłrdenes de magnitud (600x speedup), porque la mayorĂa de la CD de nuestro modelo reusa resultados intermedios del renderizado de la imagen. Aunque no es asĂ necesariamente, esta espectacular en tiempo puede resultar en mejores frames por segundo si la simulaciĂłn de fĂsicas está en el camino crĂtico. Sin embargo, la ventaja más importante de nuestra tĂ©cnica es el enorme ahorro de energĂa que resulta de eliminar las largas y costosas computaciones en la CPU, sustituyĂ©ndolas por unas pocas operaciones ejecutadas en un hardware especializado dentro de la GPU. Nuestros resultados muestran que la energĂa consumida por la CD es reducidad en media por un factor de 448x. Estos dramáticos beneficios vienen acompañados de una mayor fidelidad en la CD (i.e. con granularidad más fina)Postprint (published version
Visibility rendering order: Improving energy efficiency on mobile GPUs through frame coherence
During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image, thus wasting precious time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. However, to be effective it requires that opaque objects are processed in a front-to-back order. Depth sorting and other occlusion culling techniques at the object level incur overheads that are only offset for applications having substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that the objects in graphics animated applications tend to keep its relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overheads because it just requires adding a small hardware to capture that information and use it later to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications for which VRO achieves 27% speed-up and 14.8% energy reduction on average over a state-of-the-art mobile GPU.Peer ReviewedPostprint (author's final draft
High-Level GPU Programming: Domain-Specific Optimization and Inference
When writing computer software one is often forced to balance the need for high run-time performance with high programmer productivity. By using a high-level language it is often possible to cut development times, but this typically comes at the cost of reduced run-time performance. Using a lower-level language, programs can be made very efficient but at the cost of increased development time. Real-time computer graphics is an area where there are very high demands on both performance and visual quality. Typically, large portions of such applications are written in lower-level languages and also rely on dedicated hardware, in the form of programmable graphics processing units (GPUs), for handling computationally demanding rendering algorithms. These GPUs are parallel stream processors, specialized towards computer graphics, that have computational performance more than a magnitude higher than corresponding CPUs. This has revolutionized computer graphics and also led to GPUs being used to solve more general numerical problems, such as fluid and physics simulation, protein folding, image processing, and databases. Unfortunately, the highly specialized nature of GPUs has also made them difficult to program. In this dissertation we show that GPUs can be programmed at a higher level, while maintaining performance, compared to current lower-level languages. By constructing a domain-specific language (DSL), which provides appropriate domain-specific abstractions and user-annotations, it is possible to write programs in a more abstract and modular manner. Using knowledge of the domain it is possible for the DSL compiler to generate very efficient code. We show that, by experiment, the performance of our DSLs is equal to that of GPU programs written by hand using current low-level languages. Also, control over the trade-offs between visual quality and performance is retained. In the papers included in this dissertation, we present domain-specific languages targeted at numerical processing and computer graphics, respectively. These DSL have been implemented as embedded languages in Python, a dynamic programming language that provide a rich set of high-level features. In this dissertation we show how these features can be used to facilitate the construction of embedded languages
Exploiting frame coherence in real-time rendering for energy-efficient GPUs
The computation capabilities of mobile GPUs have greatly evolved in the last generations, allowing real-time rendering of realistic scenes. However, the desire for processing complex environments clashes with the battery-operated nature of smartphones, for which users expect long operating times per charge and a low-enough temperature to comfortably hold them. Consequently, improving the energy-efficiency of mobile GPUs is paramount to fulfill both performance and low-power goals. The work of the processors from within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet most of this energy is spent in redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images.
The goal of this thesis is to improve the energy-efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence in order to reduce the redundant computations and memory accesses inherent in graphics applications.
First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are independently rendered in on-chip buffers. It is common that more than 80% of the tiles produce exactly the same output between consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately determines such occurrences by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile across consecutive frames are the same, the colors computed in the preceding frame are reused, saving all computations and memory accesses associated to the rendering of the tile. We show that RE vastly outperforms related schemes found in the literature, achieving a reduction of energy consumption of 37% and execution time of 33% with minimal overheads.
Next, we focus on reducing redundant computations of fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which usually causes that the results of previously-computed objects are overwritten by new objects that turn occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility based on information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded, and processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination’s equal tile detection capabilities by not adding primitives predicted to be occluded in the signature. With minor hardware costs, EVR is shown to provide a reduction of energy consumption of 43% and execution time of 39%.
Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations on each sampling location. However, most screen regions do not include sufficient detail to require high sampling rates, leading to a significant amount of energy wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate, a mechanism that analyzes the spatial frequencies of tiles and determines the best sampling rate for them, which is applied in the following frame. Results show that Dynamic Sampling Rate significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.La capacitat de cĂ lcul de les GPU mòbils ha augmentat en gran mesura en les darreres generacions, permetent el renderitzat de paisatges complexos en temps real. Nogensmenys, el desig de processar escenes cada vegada mĂ©s realistes xoca amb el fet que aquests dispositius funcionen amb bateries, i els usuaris n’esperen llargues durades i una temperatura prou baixa com per a ser agafats còmodament. En conseqüència, millorar l’eficiència energètica de les GPU mòbils Ă©s essencial per a aconseguir els objectius de rendiment i baix consum. Els processadors de la GPU i els seus accessos a memòria sĂłn els principals consumidors d’energia en cĂ rregues grĂ fiques, però molt d’aquest consum Ă©s malbaratat en cĂ lculs redundants, ja que les animacions produĂŻdes sÂżaconsegueixen renderitzant una seqüència d’imatges molt similars. L’objectiu d’aquesta tesi Ă©s millorar l’eficiència energètica de les GPU mòbils mitjançant el disseny de mecanismes microarquitectònics que aprofitin la coherència entre imatges per a reduir els cĂ lculs i accessos redundants inherents a les aplicacions grĂ fiques. Primerament, ens centrem en reduir cĂ lculs redundants de colors. A les GPU mòbils, sovint s'empra una arquitectura anomenada Tile-Based Rendering, en què la pantalla es divideix en regions que es processen independentment dins del xip. És habitual que mĂ©s del 80% de les regions de pantalla produeixin els mateixos colors entre imatges consecutives. Proposem Rendering Elimination (RE), un mecanisme que determina acuradament aquests casos computant una signatura de les entrades de totes les regions. Si les signatures de dues imatges sĂłn iguals, es reutilitzen els colors calculats a la imatge anterior, el que estalvia tots els cĂ lculs i accessos a memòria de la regiĂł. RE supera Ă mpliament propostes relacionades de la literatura, aconseguint una reducciĂł del consum energètic del 37% i del temps d’execuciĂł del 33%. Seguidament, ens centrem en reduir cĂ lculs redundants en fragments que eventualment no seran visibles. En aplicacions grĂ fiques, els objectes es processen en l’ordre en què son enviats a la GPU, el que sovint causa que resultats ja processats siguin sobreescrits per nous objectes que els oclouen. Per tant, no se sap si un objecte serĂ visible o no fins que tota l’escena ha estat processada. Fonamentats en el fet que la visibilitat tendeix a ser constant entre imatges, proposem Early Visibility Resolution (EVR), un mecanisme que prediu la visibilitat basat en informaciĂł obtinguda a la imatge anterior. EVR computa i emmagatzema la profunditat del punt visible mĂ©s llunyĂ desprĂ©s de processar cada regiĂł de pantalla. Quan es processa una regiĂł a la imatge segĂĽent, es prediu que les primitives mĂ©s llunyanes a el punt guardat seran ocloses i es processen desprĂ©s de les que es prediuen que seran visibles. Addicionalment, aquest esquema de predicciĂł s’empra en millorar la detecciĂł de regions redundants de RE al no afegir les primitives que es prediu que seran ocloses a les signatures. Amb un cost de maquinari mĂnim, EVR aconsegueix una millora del consum energètic del 43% i del temps d’execuciĂł del 39%. Finalment, ens centrem a reduir cĂ lculs en regions de pantalla amb poca freqüència espacial. Les GPU actuals produeixen colors mostrejant els triangles una vegada per cada pĂxel i fent cĂ lculs a cada localitzaciĂł mostrejada. Però la majoria de regions no tenen suficient detall per a necessitar altes freqüències de mostreig, el que implica un malbaratament d’energia en el cĂ lcul del mateix color en pĂxels adjacents. Com les freqüències tendeixen a mantenir-se en el temps, proposem Dynamic Sampling Rate (DSR)¸ un mecanisme que analitza les freqüències de les regions una vegada han estat renderitzades i en determina la menor freqüència de mostreig a la que es poden processar, que s’aplica a la segĂĽent imatge..
Exploiting frame coherence in real-time rendering for energy-efficient GPUs
The computation capabilities of mobile GPUs have greatly evolved in the last generations, allowing real-time rendering of realistic scenes. However, the desire for processing complex environments clashes with the battery-operated nature of smartphones, for which users expect long operating times per charge and a low-enough temperature to comfortably hold them. Consequently, improving the energy-efficiency of mobile GPUs is paramount to fulfill both performance and low-power goals. The work of the processors from within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet most of this energy is spent in redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images.
The goal of this thesis is to improve the energy-efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence in order to reduce the redundant computations and memory accesses inherent in graphics applications.
First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are independently rendered in on-chip buffers. It is common that more than 80% of the tiles produce exactly the same output between consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately determines such occurrences by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile across consecutive frames are the same, the colors computed in the preceding frame are reused, saving all computations and memory accesses associated to the rendering of the tile. We show that RE vastly outperforms related schemes found in the literature, achieving a reduction of energy consumption of 37% and execution time of 33% with minimal overheads.
Next, we focus on reducing redundant computations of fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which usually causes that the results of previously-computed objects are overwritten by new objects that turn occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility based on information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded, and processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination’s equal tile detection capabilities by not adding primitives predicted to be occluded in the signature. With minor hardware costs, EVR is shown to provide a reduction of energy consumption of 43% and execution time of 39%.
Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations on each sampling location. However, most screen regions do not include sufficient detail to require high sampling rates, leading to a significant amount of energy wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate, a mechanism that analyzes the spatial frequencies of tiles and determines the best sampling rate for them, which is applied in the following frame. Results show that Dynamic Sampling Rate significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.La capacitat de cĂ lcul de les GPU mòbils ha augmentat en gran mesura en les darreres generacions, permetent el renderitzat de paisatges complexos en temps real. Nogensmenys, el desig de processar escenes cada vegada mĂ©s realistes xoca amb el fet que aquests dispositius funcionen amb bateries, i els usuaris n’esperen llargues durades i una temperatura prou baixa com per a ser agafats còmodament. En conseqüència, millorar l’eficiència energètica de les GPU mòbils Ă©s essencial per a aconseguir els objectius de rendiment i baix consum. Els processadors de la GPU i els seus accessos a memòria sĂłn els principals consumidors d’energia en cĂ rregues grĂ fiques, però molt d’aquest consum Ă©s malbaratat en cĂ lculs redundants, ja que les animacions produĂŻdes sÂżaconsegueixen renderitzant una seqüència d’imatges molt similars. L’objectiu d’aquesta tesi Ă©s millorar l’eficiència energètica de les GPU mòbils mitjançant el disseny de mecanismes microarquitectònics que aprofitin la coherència entre imatges per a reduir els cĂ lculs i accessos redundants inherents a les aplicacions grĂ fiques. Primerament, ens centrem en reduir cĂ lculs redundants de colors. A les GPU mòbils, sovint s'empra una arquitectura anomenada Tile-Based Rendering, en què la pantalla es divideix en regions que es processen independentment dins del xip. És habitual que mĂ©s del 80% de les regions de pantalla produeixin els mateixos colors entre imatges consecutives. Proposem Rendering Elimination (RE), un mecanisme que determina acuradament aquests casos computant una signatura de les entrades de totes les regions. Si les signatures de dues imatges sĂłn iguals, es reutilitzen els colors calculats a la imatge anterior, el que estalvia tots els cĂ lculs i accessos a memòria de la regiĂł. RE supera Ă mpliament propostes relacionades de la literatura, aconseguint una reducciĂł del consum energètic del 37% i del temps d’execuciĂł del 33%. Seguidament, ens centrem en reduir cĂ lculs redundants en fragments que eventualment no seran visibles. En aplicacions grĂ fiques, els objectes es processen en l’ordre en què son enviats a la GPU, el que sovint causa que resultats ja processats siguin sobreescrits per nous objectes que els oclouen. Per tant, no se sap si un objecte serĂ visible o no fins que tota l’escena ha estat processada. Fonamentats en el fet que la visibilitat tendeix a ser constant entre imatges, proposem Early Visibility Resolution (EVR), un mecanisme que prediu la visibilitat basat en informaciĂł obtinguda a la imatge anterior. EVR computa i emmagatzema la profunditat del punt visible mĂ©s llunyĂ desprĂ©s de processar cada regiĂł de pantalla. Quan es processa una regiĂł a la imatge segĂĽent, es prediu que les primitives mĂ©s llunyanes a el punt guardat seran ocloses i es processen desprĂ©s de les que es prediuen que seran visibles. Addicionalment, aquest esquema de predicciĂł s’empra en millorar la detecciĂł de regions redundants de RE al no afegir les primitives que es prediu que seran ocloses a les signatures. Amb un cost de maquinari mĂnim, EVR aconsegueix una millora del consum energètic del 43% i del temps d’execuciĂł del 39%. Finalment, ens centrem a reduir cĂ lculs en regions de pantalla amb poca freqüència espacial. Les GPU actuals produeixen colors mostrejant els triangles una vegada per cada pĂxel i fent cĂ lculs a cada localitzaciĂł mostrejada. Però la majoria de regions no tenen suficient detall per a necessitar altes freqüències de mostreig, el que implica un malbaratament d’energia en el cĂ lcul del mateix color en pĂxels adjacents. Com les freqüències tendeixen a mantenir-se en el temps, proposem Dynamic Sampling Rate (DSR)¸ un mecanisme que analitza les freqüències de les regions una vegada han estat renderitzades i en determina la menor freqüència de mostreig a la que es poden processar, que s’aplica a la segĂĽent imatge...Postprint (published version
3D graphics platforms and tools for mobile applications
The objective of the thesis is the investigation of mobile 3D graphics platforms and tools. This is important topic since 3D graphics is increasingly used in mobile devices. In the thesis platforms and tools specific for 3D graphics are analysed. Platforms are device- and operating system independent foundations for dealing with 3D graphics. This includes platforms based on 3D graphic languages: Open GL, Open GL ES, Open CL, Web GL and Web CL. Based on the platforms, there are specific tools which facili-tate applications development. In the thesis the Maya software environment for the modelling and Unity 3D game engine for animation are described. Workflow for using these tools is demonstrated on an example of application relying on 3D graphics. This application is fitting of clothes for Web fashion shop. Buying clothes from Web shops is complicated since it is impossible to check how the cloth will actually fit. This problem can be attempted to solve by using 3D model of the user and cloth items. The difficulty is dynamic visualization of model and cloth interaction. Example of this process is shown in the thesis. The conclusion is that new mobile devices will soon have graphics capabilities approaching requirements for sophisticated 3D graphics allowing to develop new types of applications. More work in the mobile 3D graphics area is needed still for creating more real and less cartoon-like models
Efficient multi-bounce lightmap creation using GPU forward mapping
Computer graphics can nowadays produce images in realtime that are hard to distinguish from photos of a real scene. One of the most important aspects to achieve this is the interaction of light with materials in the virtual scene. The lighting computation can be separated in two different parts. The first part is concerned with the direct illumination that is applied to all surfaces lit by a light source; algorithms related to this have been greatly improved over the last decades and together with the improvements of the graphics hardware can now produce realistic effects. The second aspect is about the indirect illumination which describes the multiple reflections of light from each surface. In reality, light that hits a surface is never fully absorbed, but instead reflected back into the scene. And even this reflected light is then reflected again and again until its energy is depleted. These multiple reflections make indirect illumination very computationally expensive. The first problem regarding indirect illumination is therefore, how it can be simplified to compute it faster. Another question concerning indirect illumination is, where to compute it. It can either be computed in the fixed image that is created when rendering the scene or it can be stored in a light map. The drawback of the first approach is, that the results need to be recomputed for every frame in which the camera changed. The second approach, on the other hand, is already used for a long time. Once a static scene has been set up, the lighting situation is computed regardless of the time it takes and the result is then stored into a light map. This is a texture atlas for the scene in which each surface point in the virtual scene has exactly one surface point in the 2D texture atlas. When displaying the scene with this approach, the indirect illumination does not need to be recomputed, but is simply sampled from the light map. The main contribution of this thesis is the development of a technique that computes the indirect illumination solution for a scene at interactive rates and stores the result into a light atlas for visualizing it. To achieve this, we overcome two main obstacles. First, we need to be able to quickly project data from any given camera configuration into the parts of the texture that are currently used for visualizing the 3D scene. Since our approach for computing and storing indirect illumination requires a huge amount of these projections, it needs to be as fast as possible. Therefore, we introduce a technique that does this projection entirely on the graphics card with a single draw call. Second, the reflections of light into the scene need to be computed quickly. Therefore, we separate the computation into two steps, one that quickly approximates the spreading of the light into the scene and a second one that computes the visually smooth final result using the aforementioned projection technique. The final technique computes the indirect illumination at interactive rates even for big scenes. It is furthermore very flexible to let the user choose between high quality results or fast computations. This allows the method to be used for quickly editing the lighting situation with high speed previews and then computing the final result in perfect quality at still interactive rates. The technique introduced for projecting data into the texture atlas is in itself highly flexible and also allows for fast painting onto objects and projecting data onto it, considering all perspective distortions and self-occlusions
Accelerated Graphical User Interfaces
Tato práce je zaměřena na multiplatformnĂ grafická uĹľivatelskĂ© rozhranĂ a jejich hardwarovou akceleraci. Popisuje, co to uĹľivatelskĂ© rozhranĂ jsou a srovnává nástroje na jejich tvorbu  a zpĹŻsoby jejich realizace. HlavnĂm bodem je vlastnĂ návrh a implementace nástroje na tvorbu multiplatformnĂch hardwarovÄ› akcelerovanĂ˝ch grafickĂ˝ch uĹľivatelskĂ˝ch rozhranĂ. Srovnává vlastnĂ koncept s existujĂcĂmi Ĺ™ešenĂmi, a uvádĂ ho do praxe na projektu s externĂ firmou.This thesis is focused on a multi-platform graphical user interface and its hardware acceleration. It describes what the user interfaces are, it compares the tools used for their creation, and the methods of their realization. The main focus is a custom design and implementation of tools used for creating a cross-platform hardware accelerated graphical user interface. It compares my own concept with existing solutions, and places it into practice on a project with an external company.
Doctor of Philosophy
dissertationThis dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm that improves the visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost to traverse a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers
- …