Exploiting frame coherence in real-time rendering for energy-efficient GPUs
The computation capabilities of mobile GPUs have greatly evolved in the last generations, allowing real-time rendering of realistic scenes. However, the desire to process complex environments clashes with the battery-operated nature of smartphones, whose users expect long operating times per charge and a temperature low enough to hold them comfortably. Consequently, improving the energy efficiency of mobile GPUs is paramount to meeting both performance and low-power goals. The processors within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet most of this energy is spent on redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images.
The goal of this thesis is to improve the energy efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence to reduce the redundant computations and memory accesses inherent in graphics applications.
First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are independently rendered in on-chip buffers. It is common for more than 80% of the tiles to produce exactly the same output in consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately detects such cases by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile are the same across consecutive frames, the colors computed in the preceding frame are reused, saving all the computations and memory accesses associated with rendering the tile. We show that RE vastly outperforms related schemes found in the literature, reducing energy consumption by 37% and execution time by 33% with minimal overheads.
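The reuse decision RE makes per tile can be illustrated in software. The following is a minimal sketch, not the thesis's hardware design, and it rests on several assumptions. Tile inputs are modeled as plain tuples. `hashlib.sha256` stands in for whatever cheap signature the hardware computes. The names `tile_signature`, `RenderingEliminationSketch`, and `toy_shade` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical tile input: everything binned to the tile that can affect its
# colors (vertex attributes, shader/texture identifiers, ...), as plain tuples.
TileInputs = Sequence[Tuple]
Colors = List[int]


def tile_signature(inputs: TileInputs) -> bytes:
    """Hash all inputs of a tile; SHA-256 stands in for a cheap hardware signature."""
    h = hashlib.sha256()
    for item in inputs:
        h.update(repr(item).encode())
    return h.digest()


class RenderingEliminationSketch:
    """Reuses a tile's previous colors when its input signature is unchanged."""

    def __init__(self) -> None:
        self.prev_sigs: Dict[int, bytes] = {}
        self.prev_colors: Dict[int, Colors] = {}

    def render_frame(
        self, tiles: Dict[int, TileInputs], shade: Callable[[TileInputs], Colors]
    ) -> Tuple[Dict[int, Colors], int]:
        """Return per-tile colors and the number of tiles that were actually shaded."""
        colors: Dict[int, Colors] = {}
        sigs: Dict[int, bytes] = {}
        shaded = 0
        for tile_id, inputs in tiles.items():
            sig = tile_signature(inputs)
            sigs[tile_id] = sig
            if self.prev_sigs.get(tile_id) == sig:
                # Same inputs as last frame: skip rasterization and shading entirely.
                colors[tile_id] = self.prev_colors[tile_id]
            else:
                colors[tile_id] = shade(inputs)
                shaded += 1
        self.prev_sigs, self.prev_colors = sigs, colors
        return colors, shaded


def toy_shade(inputs: TileInputs) -> Colors:
    """Placeholder shader: a 2x2 tile filled with the number of primitives."""
    return [len(inputs)] * 4


re_unit = RenderingEliminationSketch()
_, shaded_1 = re_unit.render_frame({0: [("tri", 1)], 1: [("tri", 2)]}, toy_shade)
_, shaded_2 = re_unit.render_frame({0: [("tri", 1)], 1: [("tri", 3)]}, toy_shade)
print(shaded_1, shaded_2)  # 2 1
```

In the second frame only tile 1 changed, so only one tile is shaded. A real signature is much smaller than SHA-256, so collision handling is a design question this sketch ignores.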
Next, we focus on reducing redundant computations of fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which often causes the results of previously computed objects to be overwritten by new objects that in turn occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility based on information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded and are processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination's equal-tile detection by excluding primitives predicted to be occluded from the signature. With minor hardware costs, EVR is shown to reduce energy consumption by 43% and execution time by 39%.
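The per-tile prediction step can be sketched in a few lines. This is a simplified illustration, not the actual EVR hardware, and it makes three assumptions:

- Depth grows away from the viewer, with 0.0 at the near plane.
- Each primitive is summarized by its nearest depth inside the tile.
- Primitives keep their submission order inside each group.

`Primitive`, `evr_order`, and `render_order_and_signature_ids` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Primitive:
    prim_id: int
    min_depth: float  # nearest depth of the primitive inside the tile (0.0 = near plane)


def farthest_visible_depth(tile_depth_buffer: Sequence[float]) -> float:
    """After rendering a tile, keep the depth of its farthest visible point."""
    return max(tile_depth_buffer)


def evr_order(
    prims: Sequence[Primitive], stored_depth: Optional[float]
) -> Tuple[List[Primitive], List[Primitive]]:
    """Split a tile's primitives into predicted-visible and predicted-occluded lists.

    Primitives entirely farther than last frame's farthest visible depth are
    predicted occluded; submission order is preserved inside each list.
    """
    if stored_depth is None:  # no history for this tile (e.g. the first frame)
        return list(prims), []
    visible = [p for p in prims if p.min_depth <= stored_depth]
    occluded = [p for p in prims if p.min_depth > stored_depth]
    return visible, occluded


def render_order_and_signature_ids(
    prims: Sequence[Primitive], stored_depth: Optional[float]
) -> Tuple[List[int], List[int]]:
    visible, occluded = evr_order(prims, stored_depth)
    order = [p.prim_id for p in visible + occluded]  # predicted-visible rendered first
    signature_ids = [p.prim_id for p in visible]  # predicted-occluded left out of the RE signature
    return order, signature_ids


stored = farthest_visible_depth([0.2, 0.5, 0.6, 0.4])  # recorded in frame N
prims = [Primitive(0, 0.9), Primitive(1, 0.2), Primitive(2, 0.5)]
print(render_order_and_signature_ids(prims, stored))  # ([1, 2, 0], [1, 2])
```

A misprediction only costs efficiency here: predicted-occluded primitives are still rendered, just later.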
Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations at each sampling location. However, most screen regions do not contain enough detail to require high sampling rates, so a significant amount of energy is wasted computing the same color for neighboring pixels. Given that spatial frequencies tend to be maintained across frames, we propose Dynamic Sampling Rate (DSR), a mechanism that analyzes the spatial frequencies of each tile once it has been rendered and determines the lowest sampling rate at which it can be processed, which is then applied in the following frame. Results show that DSR significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.
Postprint (published version)
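The idea of choosing the lowest sampling rate a tile can tolerate can be illustrated as follows. This is a hypothetical sketch, not the analysis used in the thesis. It assumes a tile is a grid of luminance values. It treats "one sample per k x k block" as reusing each block's top-left value, with an arbitrary error threshold of 0.02. `block_error` and `choose_block_size` are illustrative names.

```python
from typing import Sequence

Tile = Sequence[Sequence[float]]  # rendered luminance values, rows x columns


def block_error(tile: Tile, k: int) -> float:
    """Worst error if every k x k pixel block reused the color of its top-left sample."""
    h, w = len(tile), len(tile[0])
    worst = 0.0
    for by in range(0, h, k):
        for bx in range(0, w, k):
            ref = tile[by][bx]
            for y in range(by, min(by + k, h)):
                for x in range(bx, min(bx + k, w)):
                    worst = max(worst, abs(tile[y][x] - ref))
    return worst


def choose_block_size(tile: Tile, candidates=(4, 2, 1), threshold: float = 0.02) -> int:
    """Pick the largest block (lowest sampling rate) whose error stays under threshold.

    The result would be applied when this tile is rendered in the next frame.
    """
    for k in sorted(candidates, reverse=True):
        if block_error(tile, k) <= threshold:
            return k
    return 1  # full rate: one sample per pixel


flat = [[0.5] * 8 for _ in range(8)]
ramp = [[0.01 * x for x in range(8)] for _ in range(8)]
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
print(choose_block_size(flat), choose_block_size(ramp), choose_block_size(checker))  # 4 2 1
```

A flat tile can be shaded once per 4x4 block, a gentle gradient once per 2x2 block, and a checkerboard needs full rate.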
Synchronized-tracing of implicit surfaces
Implicit surfaces are known for their ability to represent smooth objects of arbitrary topology thanks to hierarchical combinations of primitives using a structure called a blobtree. We present a new tile-based rendering pipeline well suited for modeling scenarios, i.e., no preprocessing is required when primitive parameters are updated. When using approximate signed distance fields, we rely on compact, smooth CSG operators (extended from standard bounded operators) to compute a tight volume of interest for all primitives of the blobtree. The pipeline relies on a low-resolution A-buffer storing the primitives of interest of a given screen tile. The A-buffer is then used during ray processing to synchronize threads within a subfrustum, which allows coherent field evaluation within workgroups. We use a sparse bottom-up tree traversal to prune the blobtree on the fly, which allows us to decorrelate field evaluation complexity from the full blobtree size. The ray processing itself is done using the sphere-tracing algorithm. The pipeline scales well to surfaces consisting of thousands of primitives.
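The sphere-tracing step mentioned above marches along each ray by the field's distance bound until it gets close enough to the surface. Below is a minimal sketch of plain sphere tracing over a union of two spheres. The combination uses `min`, which preserves a valid distance bound. The paper's compact smooth blobtree operators, A-buffer, and thread synchronization are not reproduced. `sphere_sdf`, `union`, and `sphere_trace` are illustrative names.

```python
import math
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]
SDF = Callable[[Vec3], float]


def sphere_sdf(center: Vec3, radius: float) -> SDF:
    return lambda p: math.dist(p, center) - radius


def union(*fields: SDF) -> SDF:
    # Plain min keeps a valid distance bound; the paper's smooth, compact
    # blobtree operators are not reproduced here.
    return lambda p: min(f(p) for f in fields)


def sphere_trace(sdf: SDF, origin: Vec3, direction: Vec3,
                 max_steps: int = 128, eps: float = 1e-4,
                 t_max: float = 100.0) -> Optional[float]:
    """Step along a unit-length ray by the field's distance bound; return hit t or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t  # close enough to the surface: report a hit
        t += dist  # safe step: no surface lies within `dist` of p
        if t > t_max:
            break
    return None


scene = union(sphere_sdf((0.0, 0.0, 5.0), 1.0), sphere_sdf((3.0, 0.0, 5.0), 1.0))
print(sphere_trace(scene, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # 4.0
print(sphere_trace(scene, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # None
```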
Visibility rendering order: Improving energy efficiency on mobile GPUs through frame coherence
During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image, thus wasting precious time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. However, to be effective, this test requires opaque objects to be processed in front-to-back order. Depth sorting and other object-level occlusion culling techniques incur overheads that are only offset for applications with substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overheads: it only requires adding a small hardware unit to capture that information and use it later to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average over a state-of-the-art mobile GPU. Peer Reviewed. Postprint (author's final draft)
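One way to picture how depth-test outcomes from one frame could guide the next frame's order is as a graph of "drawn before" constraints. The sketch below is a software illustration under assumptions, not VRO's actual hardware.

- Each depth-test event is modeled as a hypothetical `(front_obj, back_obj)` pair.
- The next frame's order is a topological sort of these pairs.
- On cycles, such as interpenetrating objects, the sketch falls back to submission order.

It uses Python's standard `graphlib` module (3.9+).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


def record_occlusions(depth_test_events: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Collect 'front occludes back' pairs seen by the depth test in frame N.

    Each event is (front_obj, back_obj): at some pixel a fragment of front_obj
    was found nearer than a fragment of back_obj.
    """
    must_precede: Dict[int, Set[int]] = {}
    for front, back in depth_test_events:
        if front != back:
            must_precede.setdefault(back, set()).add(front)
    return must_precede


def next_frame_order(submission: List[int], must_precede: Dict[int, Set[int]]) -> List[int]:
    """Order frame N+1's objects so that recorded occluders are drawn first."""
    present = set(submission)
    ts: TopologicalSorter = TopologicalSorter()
    for obj in submission:
        ts.add(obj, *(p for p in must_precede.get(obj, ()) if p in present))
    try:
        return list(ts.static_order())
    except CycleError:
        # Mutual occlusion: no consistent front-to-back order, keep submission order.
        return list(submission)


order = next_frame_order([0, 1, 2], record_occlusions([(2, 0), (1, 0), (2, 1)]))
print(order)  # [2, 1, 0]
```

Object 2 occluded both others and object 1 occluded object 0, so the only consistent order is 2, 1, 0.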
TCOR: a tile cache with optimal replacement
Cache replacement policies are known to have an important impact on hit rates. The OPT replacement policy [27] has been formally proven optimal for minimizing misses. Because it needs to look far ahead at future memory accesses, it is often reduced to a yardstick for measuring the efficacy of other, practical caches. In this paper, we bring OPT to life in architectures for mobile GPUs, for which energy efficiency is of great consequence. We also adapt other factors in the memory hierarchy to enhance its impact. The end results are a 13.8% decrease in memory hierarchy energy consumption and increased throughput in the Tiling Engine. We also observe a 5.5% decrease in total GPU energy and a 3.7% increase in frames per second (FPS). This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program and the AGAUR grant 2020-FISDU-00287. We would also like to thank the anonymous reviewers for their valuable comments. Peer Reviewed. Postprint (author's final draft)
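OPT (Belady's policy) evicts the resident block whose next use lies farthest in the future. The sketch below is a small simulation, not TCOR's hardware. It assumes a fully associative cache and a fully known access trace, the condition that normally makes OPT impractical. It compares OPT against LRU on a classic textbook trace. `opt_misses` and `lru_misses` are illustrative names.

```python
from collections import OrderedDict, defaultdict, deque
from typing import Deque, Dict, Hashable, Sequence


def opt_misses(trace: Sequence[Hashable], capacity: int) -> int:
    """Belady's OPT: on a miss with a full cache, evict the block reused farthest ahead."""
    future: Dict[Hashable, Deque[int]] = defaultdict(deque)
    for i, block in enumerate(trace):
        future[block].append(i)  # positions where each block will be accessed
    cache = set()
    misses = 0
    for block in trace:
        future[block].popleft()  # consume the current access
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # A block with no future use is the ideal victim (distance = infinity).
            victim = max(cache, key=lambda b: future[b][0] if future[b] else float("inf"))
            cache.remove(victim)
        cache.add(block)
    return misses


def lru_misses(trace: Sequence[Hashable], capacity: int) -> int:
    """Least-recently-used baseline for comparison."""
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # drop the least recently used block
        cache[block] = None
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_misses(trace, 3), lru_misses(trace, 3))  # 7 10
```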
The Hierarchical Ray Engine
Due to the success of texture-based approaches, ray casting has lately been confined to performing preprocessing in real-time applications. Although GPU-based ray casting implementations now outperform the CPU, they either do not scale well to higher primitive counts or require the costly construction of spatial hierarchies. We present an improved algorithm based on the Ray Engine approach, which builds a hierarchy of rays instead of objects, entirely on the graphics card. By exploiting the coherence between rays when displaying refractive objects or computing caustics, real-time frame rates are achieved without preprocessing. Thus, the method fills a gap in the real-time rendering repertoire.
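A hierarchy of rays lets a whole bundle of coherent rays reject a primitive with one test before any per-ray work. The sketch below illustrates that idea under assumptions that are not taken from the paper.

- Rays in a bundle share one origin and have unit-length directions.
- The bundle is bounded by a cone around their mean direction.
- Primitives are represented by bounding spheres.

`bounding_cone`, `cone_may_hit_sphere`, and `candidate_primitives` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def bounding_cone(directions: Sequence[Vec3]) -> Tuple[Vec3, float]:
    """Axis = normalized mean direction; half-angle = widest ray from the axis.

    Assumes coherent rays (their mean direction is not zero).
    """
    axis = normalize(tuple(sum(d[i] for d in directions) for i in range(3)))
    half_angle = max(math.acos(max(-1.0, min(1.0, dot(axis, d)))) for d in directions)
    return axis, half_angle


def cone_may_hit_sphere(apex: Vec3, axis: Vec3, half_angle: float,
                        center: Vec3, radius: float) -> bool:
    """Can any ray inside the cone reach the sphere? (angular-distance test)"""
    to_c = tuple(c - a for c, a in zip(center, apex))
    dist = math.sqrt(dot(to_c, to_c))
    if dist <= radius:
        return True  # the apex is inside the sphere
    angle_to_center = math.acos(max(-1.0, min(1.0, dot(axis, to_c) / dist)))
    return angle_to_center - math.asin(radius / dist) <= half_angle


def candidate_primitives(apex: Vec3, directions: Sequence[Vec3],
                         spheres: Sequence[Tuple[Vec3, float]]) -> List[int]:
    """Cull primitives once per ray bundle; survivors still need per-ray tests."""
    axis, half_angle = bounding_cone(directions)
    return [i for i, (c, r) in enumerate(spheres)
            if cone_may_hit_sphere(apex, axis, half_angle, c, r)]


demo_dirs = [normalize((0.1, 0.0, 1.0)), normalize((-0.1, 0.0, 1.0)), normalize((0.0, 0.1, 1.0))]
spheres = [((0.0, 0.0, 10.0), 1.0), ((10.0, 0.0, 0.0), 1.0), ((0.0, 0.0, -10.0), 1.0)]
print(candidate_primitives((0.0, 0.0, 0.0), demo_dirs, spheres))  # [0]
```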
Content addressable memory project
A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the content addressable memory (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.
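For readers unfamiliar with content addressable memory, the essential operation is a search by content: every stored word is compared against a key at once, and the matching addresses are reported. The sketch below models only that generic behavior in software, including a per-bit mask as in ternary CAMs. It does not model the Rutgers tree/leaf cell design. `cam_search` and `cam_first_match` are illustrative names.

```python
from typing import List, Optional, Sequence


def cam_search(entries: Sequence[int], key: int, mask: int) -> List[int]:
    """Compare key against every stored word; mask selects the bits that must match.

    Hardware performs all comparisons in parallel; this loop models the result only.
    """
    return [addr for addr, word in enumerate(entries) if ((word ^ key) & mask) == 0]


def cam_first_match(entries: Sequence[int], key: int, mask: int) -> Optional[int]:
    """Priority-encoder view: lowest matching address, or None if nothing matches."""
    hits = cam_search(entries, key, mask)
    return hits[0] if hits else None


words = [0b1010, 0b1100, 0b1011]
print(cam_search(words, 0b1010, 0b1111))  # [0]
print(cam_search(words, 0b1010, 0b1110))  # [0, 2]
```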
Triangle Dropping: An occluded-geometry predictor for energy-efficient mobile GPUs
This article proposes a novel micro-architectural approach for mobile GPUs aimed at removing occluded geometry early by leveraging frame-to-frame coherence, thus reducing overall energy consumption. Mobile GPUs commonly implement a Tile-Based Rendering (TBR) architecture that distinguishes two main phases: the Geometry Pipeline, where all the geometry of a scene is processed, and the Raster Pipeline, where primitives are rendered into a framebuffer. After the Geometry Pipeline, only non-culled primitives inside the camera's frustum are stored in the Parameter Buffer, a data structure stored in DRAM. However, a significant fraction of these non-culled primitives are rendered but not visible at all, resulting in useless computations: on average, 60% of those primitives are completely occluded in our benchmarks. Although TBR architectures use on-chip caches for the Parameter Buffer, about 46% of the DRAM traffic still comes from accesses to this buffer. The proposed Triangle Dropping technique leverages the visibility information computed along the Raster Pipeline to predict the primitives' visibility in the next frame and discard early those that will be totally occluded, drastically reducing Parameter Buffer accesses. On average, our approach achieves 14.5% energy savings, 28.2% energy-delay product savings, and a speedup of 20.2%. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant no. 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00 (AEI/FEDER, EU), and the ICREA Academia program. D. Corbalán-Navarro has also been supported by a PhD research fellowship from the University of Murcia's “Plan Propio de Investigación”. Peer Reviewed. Postprint (author's final draft)
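The prediction step of Triangle Dropping can be pictured as a filter applied before primitives are written to the Parameter Buffer. The sketch below is a simplified illustration, not the article's design, and it assumes the following:

- Each primitive has a stable identifier across frames, here a hypothetical `(draw_call, triangle_index)` pair.
- Frame N's final image records which primitive owns each pixel.
- Primitives with no history are always kept.

It does not model how mispredictions, such as a dropped triangle that becomes visible, are detected or repaired.

```python
from typing import Iterable, List, Optional, Set, Tuple

PrimKey = Tuple[int, int]  # hypothetical stable id: (draw_call, triangle_index)


def visible_primitives(final_id_buffer: Iterable[Optional[PrimKey]]) -> Set[PrimKey]:
    """Primitives owning at least one pixel of frame N's final image."""
    return {pid for pid in final_id_buffer if pid is not None}


def filter_for_parameter_buffer(prims: Iterable[PrimKey], prev_visible: Set[PrimKey],
                                prev_seen: Set[PrimKey]) -> List[PrimKey]:
    """Keep primitives visible last frame or without history; drop predicted-occluded ones."""
    return [p for p in prims if p in prev_visible or p not in prev_seen]


prev_visible = visible_primitives([(0, 1), (0, 1), None, (0, 3)])
prev_seen = {(0, 1), (0, 2), (0, 3)}  # primitives rasterized in frame N
kept = filter_for_parameter_buffer([(0, 1), (0, 2), (0, 3), (0, 4)], prev_visible, prev_seen)
print(kept)  # [(0, 1), (0, 3), (0, 4)]
```

Primitive (0, 2) was rasterized in frame N but owned no final pixel, so it is dropped. The new primitive (0, 4) is kept because there is no evidence about it.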
Reducing redundancy of real time computer graphics in mobile systems
The goal of this thesis is to propose novel and effective techniques to eliminate the redundant, energy-wasting computations performed in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is key not only to extending their battery life but also to increasing their performance, because SoCs tend to be throttled when the load stays high for a long period, to avoid overheating beyond thermal limits. Prior studies have pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with off-chip main memory accesses and the processors inside the GPU being the primary consumers.
First, we focus on reducing redundant fragment processing computations by improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. By the time the GPU realizes that an object or part of it is not going to be visible, all the activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transitions. VRO keeps the visibility information of one frame and uses it to reorder the objects of the following frame. Since depth-order relationships between objects are already tested in the GPU, VRO only requires a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average.
Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel, since it identifies the contact points between the objects of a scene and determines when they collide. However, accurate real-time CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient, high-fidelity CD scheme that leverages intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done only once. Compared with a conventional CD executed entirely on the CPU, RBCD reduces execution time by almost three orders of magnitude (600x speedup), because most of the CD work in our model comes for free by reusing intermediate results of image rendering. Such a dramatic time improvement may, although not necessarily, translate into more frames per second if physics simulation is on the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.
Postprint (published version)
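To make the idea of reusing rasterization results for CD concrete, the sketch below detects collisions from per-pixel depth intervals. It is a hypothetical illustration, not RBCD's actual hardware.

- Rasterization is assumed to give, at each pixel, an entry depth and an exit depth for each object covering it. This presumes objects are convex along each view ray.
- Two objects are reported as colliding when their intervals overlap at some pixel.
- Contact along directions not covered by any pixel is outside this sketch.

`collisions_at_pixel` and `detect_collisions` are illustrative names.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, float, float]  # (object_id, z_enter, z_exit) at one pixel
Pixel = Tuple[int, int]


def collisions_at_pixel(intervals: Iterable[Interval]) -> Set[Tuple[int, int]]:
    """Pairs of objects whose depth intervals overlap along this pixel's view ray."""
    pairs: Set[Tuple[int, int]] = set()
    for (a, a0, a1), (b, b0, b1) in combinations(intervals, 2):
        if a != b and a0 <= b1 and b0 <= a1:  # closed-interval overlap test
            pairs.add((min(a, b), max(a, b)))
    return pairs


def detect_collisions(pixel_lists: Dict[Pixel, Iterable[Interval]]) -> Set[Tuple[int, int]]:
    """Union of per-pixel overlaps over the whole screen."""
    found: Set[Tuple[int, int]] = set()
    for intervals in pixel_lists.values():
        found |= collisions_at_pixel(intervals)
    return found


frame = {
    (0, 0): [(1, 0.2, 0.5), (2, 0.4, 0.7)],  # objects 1 and 2 interpenetrate here
    (1, 0): [(1, 0.1, 0.2), (3, 0.3, 0.4)],  # objects 1 and 3 are separated in depth
}
print(detect_collisions(frame))  # {(1, 2)}
```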