Exploiting frame coherence in real-time rendering for energy-efficient GPUs
The computational capabilities of mobile GPUs have evolved greatly over recent generations, enabling real-time rendering of realistic scenes. However, the desire to process complex environments clashes with the battery-operated nature of smartphones, whose users expect long operating times per charge and a temperature low enough to hold them comfortably. Consequently, improving the energy efficiency of mobile GPUs is paramount to meeting both performance and low-power goals. The work performed by the processors within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet most of this energy is spent on redundant computations, because the frame rate required to produce animations results in a sequence of extremely similar images.
The goal of this thesis is to improve the energy efficiency of mobile GPUs by designing microarchitectural mechanisms that leverage frame coherence to reduce the redundant computations and memory accesses inherent in graphics applications.
First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are rendered independently in on-chip buffers. It is common for more than 80% of the tiles to produce exactly the same output in consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately detects such cases by computing and storing signatures of the inputs of every tile in a frame. If a tile's signature is the same in two consecutive frames, the colors computed in the preceding frame are reused, saving all the computations and memory accesses associated with rendering the tile. We show that RE vastly outperforms related schemes in the literature, reducing energy consumption by 37% and execution time by 33% with minimal overheads.
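The signature check behind Rendering Elimination can be illustrated in software. The sketch below is not the thesis's hardware design: it assumes each primitive's inputs are already serialized to bytes and uses CRC32 (Python's `zlib.crc32`) as the signature. The names `tile_signature`, `RenderingEliminator` and `render_fn` are hypothetical.

```python
import zlib
from typing import Callable, Dict, List, Sequence

Primitive = bytes  # hypothetical: a primitive's inputs serialized to bytes


def tile_signature(primitives: Sequence[Primitive], state: bytes = b"") -> int:
    """Hash everything assumed to influence a tile's colors: an optional blob of
    render state plus the inputs of every primitive binned to the tile."""
    crc = zlib.crc32(state)
    for prim in primitives:
        # Length prefix so that different splits of the same bytes differ.
        crc = zlib.crc32(len(prim).to_bytes(4, "little"), crc)
        crc = zlib.crc32(prim, crc)
    return crc


class RenderingEliminator:
    """Keeps one signature and one color buffer per tile from the previous frame."""

    def __init__(self, render_fn: Callable[[Sequence[Primitive]], List[int]]):
        self.render_fn = render_fn  # hypothetical stand-in for shading a tile
        self.prev_sig: Dict[int, int] = {}
        self.prev_colors: Dict[int, List[int]] = {}
        self.skipped = 0

    def render_frame(self, tiles: Dict[int, Sequence[Primitive]]) -> Dict[int, List[int]]:
        out: Dict[int, List[int]] = {}
        for tile_id, prims in tiles.items():
            sig = tile_signature(prims)
            if self.prev_sig.get(tile_id) == sig:
                # Same inputs as last frame: reuse the stored colors, skip shading.
                out[tile_id] = self.prev_colors[tile_id]
                self.skipped += 1
            else:
                out[tile_id] = self.render_fn(prims)
            self.prev_sig[tile_id] = sig
            self.prev_colors[tile_id] = out[tile_id]
        return out
```

A software hash like this can, in principle, collide and wrongly reuse a tile; how the actual mechanism sizes and computes its signatures is described in the thesis itself, not here.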
Next, we focus on reducing redundant computations of fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which often causes the results of previously computed objects to be overwritten by new objects that in turn occlude them. Consequently, whether a particular object will be occluded is not known until the entire scene has been processed. Based on the observation that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility using information obtained in the preceding frame. After rendering each tile, EVR computes and stores the depth of the farthest visible point. When the tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded and are processed after those predicted to be visible. Additionally, this visibility prediction scheme improves Rendering Elimination's ability to detect equal tiles by excluding primitives predicted to be occluded from the signature. With minor hardware costs, EVR is shown to reduce energy consumption by 43% and execution time by 39%.
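The prediction step described above can be sketched as a per-tile partition. This is a simplified software model, not the hardware: it assumes depths in [0, 1] with 0 at the near plane, and that each primitive carries the nearest depth it reaches inside the tile (`min_depth`). `Prim` and `VisibilityPredictor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Prim:
    name: str
    min_depth: float  # assumed: nearest depth of the primitive within the tile


class VisibilityPredictor:
    def __init__(self) -> None:
        # tile id -> farthest visible depth recorded in the previous frame
        self.far_visible: Dict[int, float] = {}

    def order(self, tile_id: int, prims: Sequence[Prim]) -> Tuple[List[Prim], List[Prim]]:
        """Split primitives into (predicted visible, predicted occluded),
        keeping submission order inside each group."""
        limit = self.far_visible.get(tile_id)
        if limit is None:  # no history for this tile: predict everything visible
            return list(prims), []
        visible = [p for p in prims if p.min_depth <= limit]
        occluded = [p for p in prims if p.min_depth > limit]
        return visible, occluded

    def update(self, tile_id: int, depth_buffer: Sequence[float]) -> None:
        """After the tile is rendered, store the depth of its farthest visible point."""
        self.far_visible[tile_id] = max(depth_buffer)
```

A renderer would shade the first list and then the second; following the abstract, only the predicted-visible list would feed an RE-style signature. Uncovered pixels keep the clear depth (1.0 here), so a tile with any background predicts nothing as occluded, which errs on the safe side.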
Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations at each sampling location. However, most screen regions do not contain enough detail to require high sampling rates, so a significant amount of energy is wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate (DSR), a mechanism that analyzes the spatial frequencies of tiles once they have been rendered and determines the lowest sampling rate at which they can be processed, which is then applied in the following frame. Results show that DSR significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.
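To make the idea of picking "the lowest sampling rate" concrete, here is a minimal sketch. It assumes a crude stand-in for spatial-frequency analysis: a block of pixels may share one shading sample if its luminance range stays under a tolerance. The block sizes, the tolerance and the function names are hypothetical, not DSR's actual analysis.

```python
from typing import Sequence

Tile = Sequence[Sequence[float]]  # luminance per pixel, assumed in [0, 1]


def block_is_flat(tile: Tile, y0: int, x0: int, size: int, tol: float) -> bool:
    """True if all pixels in the size x size block differ by at most tol."""
    vals = [tile[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    return max(vals) - min(vals) <= tol


def choose_shading_rate(tile: Tile, candidates: Sequence[int] = (4, 2), tol: float = 0.02) -> int:
    """Return the coarsest block side length whose blocks are all nearly uniform,
    i.e. how many pixels per side could share one sample next frame.
    Falls back to 1 (one sample per pixel)."""
    h, w = len(tile), len(tile[0])
    for size in candidates:  # coarsest candidate first
        if h % size or w % size:
            continue
        if all(block_is_flat(tile, y, x, size, tol)
               for y in range(0, h, size) for x in range(0, w, size)):
            return size
    return 1
```

The value computed from frame N's rendered tile would be applied when shading the same tile in frame N+1, relying on the frame coherence the abstract describes.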
Postprint (published version)
An Assessment and Evaluation of Acidic Cleaning Methods on Unglazed Terracotta Using Accelerated Weathering Test Protocols
According to the published literature, there has been very little quantitative evaluation of the short- or long-term effects of cleaning terra cotta, other than visual assessment, in which success is judged by the degree of soiling removed.
Very little work (only 3% of the sources in our literature review) has attempted to measure the effects of various cleaning methods on terra cotta.
Nevertheless, 80% of terracotta cleaning today still relies on chemical products, most of them acid-based.
This research evaluates the effects of acidic cleaners on unglazed terracotta in order to verify their potential for damage through accelerated weathering testing. This investigation continues previous studies (Matero et al. 1996), whose findings showed that using a hydrofluoric acid-based commercial cleaning system increased the porosity of unglazed terra cotta. The question remains whether this physical alteration will lead to accelerated weathering and material damage.
In the first phase of this research, a literature review of past and current terra cotta cleaning was completed, together with a survey of professionals involved in terra cotta restoration. In the second phase, two commercial chemical cleaners are being tested in two applications on new unglazed red and tan terracotta samples: Prosoco Heavy Duty Restoration cleaner, based on HF (1:3), and Prosoco Enviro Klean, based on ammonium bifluoride (generally applied as a concentrate). The samples are now undergoing accelerated weathering based on the RILEM salt test (V.1B) and a QUV weatherometer (ASTM G154-12) to assess the effects of acid cleaning on performance.
Several methods of assessment were used to evaluate the tiles before and after testing: optical microscopy, scanning electron microscopy, porosity by liquid nitrogen immersion, color change, and texture mapping imaging.
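The abstract lists color change among the assessment methods without saying how it is quantified. Assuming readings are taken in CIELAB and compared with the CIE 1976 formula (an assumption, not stated in the source; CIEDE2000 is a common alternative), the before/after difference reduces to a Euclidean distance. The function names and the paired-readings protocol below are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

Lab = Tuple[float, float, float]  # one (L*, a*, b*) reading


def delta_e_cie76(before: Lab, after: Lab) -> float:
    """CIE 1976 color difference: Euclidean distance in CIELAB space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(before, after)))


def mean_delta_e(before: Sequence[Lab], after: Sequence[Lab]) -> float:
    """Average difference over readings paired by position on the sample
    (hypothetical protocol: same spots measured before and after weathering)."""
    if len(before) != len(after):
        raise ValueError("readings must be paired")
    return mean(delta_e_cie76(b, a) for b, a in zip(before, after))
```

For example, `delta_e_cie76((50.0, 10.0, 20.0), (53.0, 14.0, 20.0))` returns 5.0; these values are illustrative, not measurements from this study.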
By examining physical changes and their response to accelerated weathering across two typical terra cotta clay bodies, it is hoped that better cleaning methods will be considered in practice and that parameters for measuring potential damage, as well as cleaning efficacy, will become established.
Adaptive image processing technique for quality control in ceramic tile production
Automation of visual inspection for quality control in the production of
materials with textures (tiles, textiles, leather, etc.) is not widely implemented
in practice. A sophisticated image acquisition system, as well as a fast and
efficient texture analysis procedure, is needed for this purpose. In this paper,
the Surface Failure Detection (SFD) algorithm for quality control in ceramic
tile production is presented. It is based on the Discrete Wavelet Transform
(DWT) and Probabilistic Neural Networks (PNN) with radial basis functions. The
DWT provides a multi-resolution analysis that mimics the behavior of the human
visual system, and it extracts from the tile image the features relevant to
failure detection. Neural networks are used to classify the tiles according to
the presence of defects. Classification efficiency depends mainly on the proper
choice of training vectors for the neural networks. To prepare the neural
networks, we propose an automated adaptive technique based on statistics of
the tile defect textures. This technique enables fast adaptation of the SFD
algorithm to different textures, which is important for automated visual
inspection in the production of a new tile type.
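The two building blocks named above can be sketched in pure Python: a multi-level 2-D Haar wavelet transform whose detail-band energies serve as features, and a probabilistic neural network that sums a Gaussian (radial basis) kernel over each class's training vectors. The Haar wavelet, the energy features, `sigma` and all function names are assumptions for illustration; the paper's wavelet choice and its adaptive selection of training vectors are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[float]]


def haar_level(img: Image):
    """One level of the orthonormal 2-D Haar DWT on an even-sized image.
    Returns (approximation, [detail along x, detail along y, diagonal])."""
    bands: List[Image] = [[], [], [], []]
    for y in range(0, len(img) - 1, 2):
        rows: List[List[float]] = [[], [], [], []]
        for x in range(0, len(img[0]) - 1, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        for band, row in zip(bands, rows):
            band.append(row)
    return bands[0], bands[1:]


def wavelet_features(img: Image, levels: int = 2) -> List[float]:
    """Mean energy of each detail sub-band at each level (multi-resolution)."""
    feats = []
    for _ in range(levels):
        img, details = haar_level(img)
        for band in details:
            n = sum(len(r) for r in band)
            feats.append(sum(v * v for r in band for v in r) / n)
    return feats


def pnn_classify(x: Sequence[float], train: Dict[str, List[Sequence[float]]],
                 sigma: float = 0.1) -> str:
    """PNN decision: average Gaussian kernel activation per class; largest wins."""
    def score(vectors: List[Sequence[float]]) -> float:
        return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * sigma ** 2))
                   for t in vectors) / len(vectors)
    return max(train, key=lambda label: score(train[label]))
```

A uniform tile yields all-zero detail energies, while a scratch-like line raises them, which is what lets the classifier separate the two classes.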
Potters' Norms: Examining the Social Organization of Ceramic Production of Panamanian Majolica and Criolla Wares in Panama la Vieja (1519-1673)
During the 16th and 17th century, the colonial city of Asunción de Panamá (now known as Panamá la Vieja) rose to regional prominence as a strategic geopolitical and commercial port due to its pivotal role along a transcontinental commercial network that connected Spain and its South American colonies. In the 154 years it was occupied by residents from diverse cultural backgrounds, contemporary but technologically- and compositionally-distinct ceramic industries developed and flourished in this city. Panamá la Vieja’s ceramic record presents a unique opportunity to examine how coexisting but seemingly distinct potting communities organized their craft and to explore whether their social structures were maintained over time in a colonial context.
This thesis analyzes a sample composed of two locally produced wares—one characterized by high-fired, wheel-thrown, and tin-glazed vessels known as Panamanian Majolica and the other by low-fired, handmade, and coarse-textured utilitarian vessels known as Criolla—that were recovered from two chronologically-distinct contexts in Panamá la Vieja. Through the application of macroscopic and microscopic characterizations, this study seeks to determine if the organization of each ceramic ware reflects the existence of discrete potting units or communities and whether diachronic change or continuity is observed in each ware’s production organization.
The results indicate that the production of Panamanian Majolica and Criolla differed greatly, not only in the technological choices employed but, most notably, in the way each craft was organized and transmitted. In the case of the former, a centralized system was in place in which social control was exerted by an established hierarchy inside the workshop that ensured adherence to a set of established production norms. That control is reflected in the low degree of compositional and technological variability of the Panamanian Majolica sample. In the case of the latter, production was decentralized, and each potter appears to have been free to produce pots following his or her own chaîne opératoire without being subject to any form of political, social, or economic control. This decentralization is reflected in the high variability of the Criolla fabrics identified in this study.
Visibility rendering order: Improving energy efficiency on mobile GPUs through frame coherence
During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will not be part of the final image, wasting precious time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. However, to be effective, this test requires opaque objects to be processed in front-to-back order. Depth sorting and other object-level occlusion culling techniques incur overheads that are offset only in applications with substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overheads: it only requires a small hardware unit that captures this information and later uses it to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average over a state-of-the-art mobile GPU.
Peer Reviewed. Postprint (author's final draft)
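The reordering idea can be modeled in software as follows. This is a deliberate simplification, not VRO's hardware: instead of recording pairwise order relationships from the Depth Test, it assumes each opaque object's nearest fragment depth seen in frame N is a good enough proxy for sorting frame N+1. `VisibilityOrder`, `observe` and `end_frame` are hypothetical names, and object ids are assumed to be stable strings across frames.

```python
from typing import Dict, List, Sequence


class VisibilityOrder:
    """Remember each object's nearest depth from the previous frame and use it
    to issue opaque objects front-to-back in the next frame."""

    def __init__(self) -> None:
        self.prev_depth: Dict[str, float] = {}
        self.cur_depth: Dict[str, float] = {}

    def observe(self, obj_id: str, fragment_depth: float) -> None:
        # Called for each fragment reaching the depth test; keep the nearest.
        old = self.cur_depth.get(obj_id)
        if old is None or fragment_depth < old:
            self.cur_depth[obj_id] = fragment_depth

    def end_frame(self) -> None:
        # The depths gathered this frame become the history for the next one.
        self.prev_depth, self.cur_depth = self.cur_depth, {}

    def reorder(self, submitted: Sequence[str]) -> List[str]:
        """Objects seen last frame go first, nearest first; objects without
        history keep their submission order at the end."""
        seen = [o for o in submitted if o in self.prev_depth]
        unseen = [o for o in submitted if o not in self.prev_depth]
        return sorted(seen, key=lambda o: self.prev_depth[o]) + unseen
```

If the prediction is wrong, the image is still correct because the depth test still runs; only the Early-Depth culling becomes less effective. Translucent objects would have to keep their submission order and are outside this sketch.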