1,615 research outputs found

    Exploiting frame coherence in real-time rendering for energy-efficient GPUs

    The computational capabilities of mobile GPUs have evolved greatly over the last generations, allowing real-time rendering of realistic scenes. However, the desire to process complex environments clashes with the battery-operated nature of smartphones, for which users expect long operating times per charge and a temperature low enough to hold them comfortably. Consequently, improving the energy efficiency of mobile GPUs is paramount to meeting both performance and low-power goals. The work of the processors within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet most of this energy is spent on redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images. The goal of this thesis is to improve the energy efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence to reduce the redundant computations and memory accesses inherent in graphics applications. First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are rendered independently in on-chip buffers. It is common for more than 80% of the tiles to produce exactly the same output in consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately detects such cases by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile are the same across consecutive frames, the colors computed in the preceding frame are reused, saving all the computations and memory accesses associated with rendering the tile. We show that RE vastly outperforms related schemes found in the literature, reducing energy consumption by 37% and execution time by 33% with minimal overheads.
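The signature check behind Rendering Elimination can be illustrated with a small software sketch. It is not the hardware mechanism proposed in the thesis: the use of CRC32 as the signature, the byte-string encoding of tile inputs, and the names `tile_signature`, `RenderingElimination` and `render_tile` are all assumptions made for illustration.

```python
import zlib


def tile_signature(tile_inputs):
    """Fold every input of a tile (here: byte strings) into one CRC32 value.

    CRC32 is only a stand-in signature chosen for this sketch.
    """
    sig = 0
    for chunk in tile_inputs:
        sig = zlib.crc32(chunk, sig)
    return sig


class RenderingElimination:
    """Reuses the previous frame's colors for tiles whose inputs did not change."""

    def __init__(self):
        self.prev_signatures = {}  # tile_id -> signature from the preceding frame
        self.prev_colors = {}      # tile_id -> colors from the preceding frame
        self.skipped = 0           # number of tile renders avoided so far

    def render_frame(self, tiles, render_tile):
        """tiles maps tile_id -> list of input byte strings.

        render_tile is a hypothetical callable that does the expensive work.
        """
        frame = {}
        for tile_id, inputs in tiles.items():
            sig = tile_signature(inputs)
            if tile_id in self.prev_colors and self.prev_signatures[tile_id] == sig:
                colors = self.prev_colors[tile_id]  # same inputs: reuse old colors
                self.skipped += 1
            else:
                colors = render_tile(inputs)        # inputs changed: render again
            self.prev_signatures[tile_id] = sig
            self.prev_colors[tile_id] = colors
            frame[tile_id] = colors
        return frame
```

In this sketch only one frame of history is kept per tile, which matches the comparison between consecutive frames described in the abstract.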
Next, we focus on reducing redundant computations for fragments that will eventually not be visible. In real-time rendering, objects are processed in the order in which they are submitted to the GPU, so the results of previously computed objects are often overwritten by new objects that occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility from information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded and are processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination's detection of equal tiles by excluding primitives predicted to be occluded from the signature. With minor hardware costs, EVR is shown to reduce energy consumption by 43% and execution time by 39%. Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations at each sampling location. However, most screen regions do not contain enough detail to require high sampling rates, so a significant amount of energy is wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate (DSR), a mechanism that analyzes the spatial frequencies of tiles once they have been rendered and determines the lowest sampling rate at which they can be processed, which is applied in the following frame.
Results show that Dynamic Sampling Rate significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.
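The visibility prediction used by Early Visibility Resolution can be sketched as follows. This is a simplified software illustration rather than the hardware described in the thesis. It assumes that larger depth values are farther from the observer and that each primitive is summarised by the depth of its nearest point. The function names and the tuple layout are hypothetical.

```python
def farthest_visible_depth(depth_buffer):
    """Depth of the farthest visible sample in a tile's final depth buffer.

    Assumes larger values are farther from the observer.
    """
    return max(depth_buffer)


def reorder_by_predicted_visibility(primitives, prev_farthest_depth):
    """Process primitives predicted visible first, predicted occluded last.

    primitives is a list of (name, nearest_depth) tuples in submission order.
    prev_farthest_depth is the value stored for this tile in the preceding
    frame, or None when no history exists yet.
    """
    if prev_farthest_depth is None:
        return list(primitives)  # no prediction possible: keep submission order
    visible = [p for p in primitives if p[1] <= prev_farthest_depth]
    occluded = [p for p in primitives if p[1] > prev_farthest_depth]
    return visible + occluded    # submission order is preserved inside each group
```

A primitive that is wrongly predicted to be occluded is still processed, only later, so a misprediction in this sketch costs efficiency rather than correctness.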

    Postprint (published version)

    An Assessment and Evaluation of Acidic Cleaning Methods on Unglazed Terracotta Using Accelerated Weathering Test Protocols

    According to the published literature, there has been very little quantitative evaluation of the short- or long-term effects of cleaning terracotta, other than visual assessment in which success is judged by the degree of soiling removed. Very little work (only 3% of the sources in our literature review) has attempted to measure the effects of various cleaning methods on terracotta. Nevertheless, 80% of terracotta cleaning today still relies on chemical products, most of them acid-based. This research evaluates the effects of acidic cleaners on unglazed terracotta, using accelerated weathering tests to verify the potential for damage. The investigation continues previous studies (Matero et al. 1996), which found that a hydrofluoric acid-based commercial cleaning system increased the porosity of unglazed terracotta. The question remains whether this physical alteration will lead to accelerated weathering and material damage. In the first phase of this research, a literature review of past and current terracotta cleaning practice was completed, together with a survey of professionals involved in terracotta restoration. In the second phase, two commercial chemical cleaners are being tested in two applications on new unglazed red and tan terracotta samples: Prosoco Heavy Duty Restoration Cleaner, based on HF (1:3), and Prosoco Enviro Klean, based on ammonium bifluoride (generally applied as a concentrate). The samples are now undergoing accelerated weathering based on the RILEM salt test (V.1B) and a QUV weatherometer (ASTM G154-12) to assess the effects of acid cleaning on performance. Several methods of assessment were used to evaluate the tiles before and after testing: optical microscopy, scanning electron microscopy, porosity by liquid nitrogen immersion, color change, and texture mapping imaging. By examining physical changes and their response to accelerated weathering across two typical terracotta clay bodies, it is hoped that better cleaning methods will be considered in practice and that parameters for measuring potential damage, as well as cleaning efficacy, will become established.

    Adaptive image processing technique for quality control in ceramic tile production (Adaptivna tehnika obrade slike za kontrolu kvalitete u proizvodnji keramičkih pločica)

    Automated visual inspection for quality control in the production of textured materials (tiles, textiles, leather, etc.) is not widely implemented. It requires a sophisticated image acquisition system as well as a fast and efficient procedure for texture analysis. This paper presents the Surface Failure Detection (SFD) algorithm for quality control in ceramic tile production. It is based on the Discrete Wavelet Transform (DWT) and Probabilistic Neural Networks (PNN) with radial basis functions. The DWT provides a multi-resolution analysis that mimics the behavior of the human visual system and extracts from the tile image the features that are important for failure detection. Neural networks are used to classify the tiles according to the presence of defects. Classification efficiency depends mainly on the proper choice of training vectors for the neural networks. To prepare the neural networks, we propose an automated adaptive technique based on the statistics of tile defect textures. This technique enables fast adaptation of the SFD algorithm to different textures, which is especially important for automated visual inspection in the production of a new tile type.
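The two building blocks named in the abstract, wavelet features and a probabilistic neural network, can be sketched in plain Python. This is not the SFD algorithm itself. The single-level Haar wavelet, the sub-band energies used as features, the Gaussian kernel width `sigma`, and all function names are assumptions chosen for illustration. The adaptive selection of training vectors proposed in the paper is not reproduced here.

```python
import math


def haar_dwt2(img):
    """One level of a 2D Haar transform on a list-of-lists image with even sides.

    Returns four sub-bands: approximation, row difference, column difference
    and diagonal difference.
    """
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, len(img), 2):
        bands = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            bands[0].append((a + b + d + e) / 2)
            bands[1].append((a + b - d - e) / 2)
            bands[2].append((a - b + d - e) / 2)
            bands[3].append((a - b - d + e) / 2)
        for out, band in zip((approx, row_diff, col_diff, diag), bands):
            out.append(band)
    return approx, row_diff, col_diff, diag


def detail_energies(img):
    """Feature vector: mean energy of the three detail sub-bands."""
    details = haar_dwt2(img)[1:]
    return [
        sum(x * x for row in band for x in row) / (len(band) * len(band[0]))
        for band in details
    ]


def pnn_classify(x, training, sigma=1.0):
    """Return the class whose training vectors give the largest mean Gaussian response.

    training maps a class label to a list of feature vectors.
    """
    def score(vectors):
        return sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2 * sigma ** 2))
            for v in vectors
        ) / len(vectors)

    return max(training, key=lambda label: score(training[label]))
```

A uniform tile has zero detail energy in this sketch, while a local intensity change raises the energy of the detail sub-bands, which is what lets the classifier separate the two cases.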

    Potters' Norms: Examining the Social Organization of Ceramic Production of Panamanian Majolica and Criolla Wares in Panama la Vieja (1519-1673)

    During the 16th and 17th centuries, the colonial city of Asunción de Panamá (now known as Panamá la Vieja) rose to regional prominence as a strategic geopolitical and commercial port, owing to its pivotal role in a transcontinental commercial network that connected Spain and its South American colonies. In the 154 years during which it was occupied by residents from diverse cultural backgrounds, contemporary but technologically and compositionally distinct ceramic industries developed and flourished in the city. Panamá la Vieja's ceramic record presents a unique opportunity to examine how coexisting but seemingly distinct potting communities organized their craft and to explore whether their social structures were maintained over time in a colonial context. This thesis analyzes a sample composed of two locally produced wares recovered from two chronologically distinct contexts in Panamá la Vieja: Panamanian Majolica, characterized by high-fired, wheel-thrown, tin-glazed vessels, and Criolla, characterized by low-fired, handmade, coarse-textured utilitarian vessels. Through macroscopic and microscopic characterization, this study seeks to determine whether the organization of each ceramic ware reflects the existence of discrete potting units or communities, and whether diachronic change or continuity is observed in the production organization of each ware. The results indicate that the production of Panamanian Majolica and Criolla differed greatly, not just in the technological choices employed but most notably in the way each craft was organized and transmitted. In the case of Panamanian Majolica, a centralized system was in place in which an established social hierarchy inside the workshop exerted social control and ensured adherence to a set of established production norms. That control is reflected in the low degree of compositional and technological variability of the Panamanian Majolica sample. In the case of Criolla, production was decentralized, and each potter appears to have been free to produce pots following his or her own chaîne opératoire without being subject to any form of political, social, or economic control. This decentralization is reflected in the high variability of the Criolla fabrics identified in this study.

    Visibility rendering order: Improving energy efficiency on mobile GPUs through frame coherence

    During real-time graphics rendering, objects are processed by the GPU in the order in which they are submitted by the CPU, and occluded surfaces are often processed even though they will not be part of the final image, wasting time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. However, to be effective, this test requires opaque objects to be processed in front-to-back order. Depth sorting and other object-level occlusion culling techniques incur overheads that are only offset in applications with substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overheads: it only requires adding a small hardware unit that captures that information and uses it later to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average over a state-of-the-art mobile GPU. Peer Reviewed. Postprint (author's final draft)
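The idea of capturing order relationships from the depth test and using them to reorder the next frame can be sketched in software. This is only an illustration of the principle, not the VRO hardware unit: the per-pixel fragment tuples, the convention that smaller depth means nearer, the topological ordering and the function names are all assumptions.

```python
from collections import defaultdict


def record_depth_tests(fragments):
    """Collect (front, behind) object pairs observed by a per-pixel depth test.

    fragments is an iterable of (pixel, object_id, depth) in submission order.
    Smaller depth means nearer to the observer in this sketch.
    """
    nearest = {}          # pixel -> (depth, object_id) currently in the depth buffer
    in_front_of = set()
    for pixel, obj, depth in fragments:
        if pixel in nearest:
            old_depth, old_obj = nearest[pixel]
            if old_obj != obj:
                pair = (obj, old_obj) if depth < old_depth else (old_obj, obj)
                in_front_of.add(pair)
            if depth < old_depth:
                nearest[pixel] = (depth, obj)
        else:
            nearest[pixel] = (depth, obj)
    return in_front_of


def front_to_back_order(submission_order, in_front_of):
    """Order objects so that each one comes before the objects it was seen in front of.

    Ties keep submission order. Objects caught in contradictory relationships
    are appended at the end in submission order.
    """
    behind = defaultdict(set)
    blockers = {obj: 0 for obj in submission_order}
    for front, back in in_front_of:
        if front in blockers and back in blockers and back not in behind[front]:
            behind[front].add(back)
            blockers[back] += 1
    order, placed = [], set()
    progress = True
    while progress:
        progress = False
        for obj in submission_order:
            if obj not in placed and blockers[obj] == 0:
                order.append(obj)
                placed.add(obj)
                for back in behind[obj]:
                    blockers[back] -= 1
                progress = True
                break  # rescan from the start to respect submission order
    order += [obj for obj in submission_order if obj not in placed]
    return order
```

The relationships recorded while rendering one frame would be used to order the objects of the following frame, relying on the temporal coherence described in the abstract.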