667 research outputs found

    Reducing redundancy of real time computer graphics in mobile systems

    Get PDF
    The goal of this thesis is to propose novel and effective techniques to eliminate redundant computations that waste energy and are performed in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy-efficiency of CPU/GPU systems is not only key to enlarge their battery life, but also allows to increase their performance because, to avoid overheating above thermal limits, SoCs tend to be throttled when the load is high for a large period of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, being the off-chip main memory accesses and the processors inside the GPU the primary energy consumers of the graphics subsystem. First, we focus on reducing redundant fragment processing computations by means of improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. When the GPU realizes that an object or part of it is not going to be visible, all activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in graphics animated applications tend to keep its relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps visibility information of a frame, and uses it to reorder the objects of the following frame. VRO just requires adding a small hardware to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications for which VRO achieves 27% speed-up and 14.8% energy reduction on average. Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD completely executed in the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task of our model comes for free by reusing the image rendering intermediate results. Although not necessarily, such a dramatic time improvement may result in better frames per second if physics simulation stays in the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by a specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8\%). These dramatic benefits are accompanied by a higher fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.El objetivo de esta tesis es proponer técnicas efectivas y originales para eliminar computaciones inútiles que aparecen en aplicaciones gráficas, con especial énfasis en micro-arquitectura de GPUs. Mejorar la eficiencia energética de los sistemas CPU/GPU no es solo clave para alargar la vida de la batería, sino también incrementar su rendimiento. Estudios previos han apuntado que la CPU y especialmente la GPU son los principales consumidores de energía en el sub-sistema gráfico, siendo los accesos a memoria off-chip y los procesadores dentro de la GPU los principales consumidores de energía del sub-sistema gráfico. Primero, nos hemos centrado en reducir computaciones redundantes de la fase de fragment processing mediante la mejora en la eliminación de superficies ocultas. Durante el renderizado de gráficos en tiempo real, los objetos son procesados por la GPU en el orden en el que son enviados por la CPU, y las superficies ocultas son a menudo procesadas incluso si no no acaban formando parte de la imagen final. Cuando la GPU averigua que el objeto o parte de él no es visible, toda la actividad requerida para computar su color y guardarlo ha sido realizada. Proponemos una técnica arquitectónica original para GPUs móviles, Visibility Rendering Order (VRO), la cual reordena los objetos de delante hacia atrás por completo en hardware para maximizar la efectividad del culling de la GPU y así minimizar el overshading, y por lo tanto reducir el tiempo de ejecución y el consumo de energía. VRO explota el hecho de que los objetos de las aplicaciones gráficas animadas tienden a mantener su orden relativo en profundidad a través de frames consecutivos (coherencia temporal) para proveer animaciones con transiciones suaves. Dado que las relaciones de orden en profundidad entre objetos son testeadas en la GPU, VRO introduce costes mínimos en energía. Solo requiere añadir una pequeña unidad hardware para capturar la información de visibilidad. Además, VRO trabaja en paralelo con el pipeline gráfico, por lo que introduce costes insignificantes en tiempo. Ilustramos los beneficios de VRO usango varias aplicaciones 3D comerciales para las cuales VRO consigue un 27% de speed-up y un 14.8% de reducción de energía en media. En segundo lugar, evitamos computaciones redundantes relacionadas con la Detección de Colisiones (CD) en la CPU. Las aplicaciones gráficas animadas como los juegos 3D representan un alto porcentaje de las aplicaciones descargadas en dispositivos móviles y la tendencia es hacia escenas más complejas y realistas con simulaciones físicas 3D precisas. La CD es uno de los algoritmos más importantes entre los kernel de físicas dado que identifica los puntos de contacto entre los objetos de una escena. Sin embargo, una CD en tiempo real y precisa es muy costosa en términos de consumo energético. Proponemos Render Based Collision Detection (RBCD), una técnica energéticamente eficiente y preciso de CD que utiliza resultados intermedios del rendering pipeline para realizar la CD. Comparando RBCD con una CD convencional completamente ejecutada en la CPU, mostramos que el tiempo de ejecución es reducido casi tres órdenes de magnitud (600x speedup), porque la mayoría de la CD de nuestro modelo reusa resultados intermedios del renderizado de la imagen. Aunque no es así necesariamente, esta espectacular en tiempo puede resultar en mejores frames por segundo si la simulación de físicas está en el camino crítico. Sin embargo, la ventaja más importante de nuestra técnica es el enorme ahorro de energía que resulta de eliminar las largas y costosas computaciones en la CPU, sustituyéndolas por unas pocas operaciones ejecutadas en un hardware especializado dentro de la GPU. Nuestros resultados muestran que la energía consumida por la CD es reducidad en media por un factor de 448x. Estos dramáticos beneficios vienen acompañados de una mayor fidelidad en la CD (i.e. con granularidad más fina)Postprint (published version

    A Technical Solution to Allow Off-line Mobile Map Querying of Discrete and Continuous Geographic Attribute Data

    Full text link
    In this article, a technique towards the generation of hybrid raster-attributes map for use in mobile devices is described. Our solution is based on coding the map attributes within an image using RGB values. The designed coding method enables the simultaneous storage of discrete thematic attributes and continuous quantitative attributes. This approach offers a wide range of possible uses. Small memory storage requirements and the simplicity of the software enable this coding method to be used efficiently in mobile devices without Internet connection. This article describes the basic fundamentals of the coding technique, as well as the operation and limitations regarding the volume of information. Two specific applications are presented: a topographic map used for recreational activities, and a visitor map of a university campus.Palomar-Vázquez, J.; Pardo Pascual, JE.; Sebastiá Tarín, L.; Recio Recio, JA. (2012). A Technical Solution to Allow Off-line Mobile Map Querying of Discrete and Continuous Geographic Attribute Data. Cartographic Journal. 49(2):143-152. doi:10.1179/1743277411Y.0000000029S14315249

    Object based manipulation with 3D scenes in mobile environment

    Get PDF
    The increasing power and display resolution of mobile devices allow the user nowadays to work with 3D information in mobile environment. The use of this new technology brings some new problems that need an urgent solution. One of them has its roots in the fact that common users are not trained to work in 3D graphical environment in general. The main obstacle for a common user is the fact that 3D environment offers too much freedom for object manipulation in comparison with situation in 2D environment. There are various solutions to this problem but usually they do not handle efficiently the problem of navigation in 3D environment in context of handling appropriate information density on a small screen on a mobile device. Our approach is based on transformation of a 3D scene in 2D representation in order to decrease freedom of movement during navigation in the scene. The navigation (and other types of interaction) is performed in 2D environment - the information acquired during interaction is transformed back into 3D form (into 3D object space). In such a way the user can manipulate with single objects in various modes (e.g. the user can change focus on various objects what might e.g. result in their temporary elimination from the scene). The work in 2D in mobile environment requires some special features that are not considered in standard environment. Especially critical is zooming. Zooming in raster form could change the picture on a small screen into completely unusable representation (pixels will be transformed in tiles that will be difficult to perceive). Solution to this (and to other related problems) is the use of SVG format to handle 2D scene representation. The main advantage of SVG is that it is object oriented and supports zooming, interaction with objects, scene annotation and manipulations with objects. The process of scene transformation and manipulation with objects is distributed between the server and mobile device. The transformation is performed on the server side. The resulting 2D representation of the scene is sent to mobile device where the visualization is performed. The user can interact with the picture on the mobile device with interaction techniques available on mobile devices. As the SVG is XML based format, the Cascading Style Sheets can be used to style all graphical elements. Therefore each object can be visualized in different mode (solid, transparent, semitransparent). These modes could be changed interactively. This feature (interactive change of modes) substitute in certain sense some functions available for avatars during their walkthroughs when performed in standard 3D environment (e.g. "having look behind corner" etc.). The user interaction in 2D environment is rather limited in comparison with standard 3D interaction but it is less demanding for the user. On the other hand it brings better picture quality due to SVG features applied e.g. during the process of zooming

    Lightweight Representation of Multi-origin Web Content

    Get PDF
    Certain features such as browser tab previews, low-memory frame representations, etc., use low-fidelity snapshots of webpages. Low-fidelity snapshots can be created by rasterizing the webpage or by converting it to PDF format. Screenshots can use substantial memory, lose the hyperlinks on a page, and do not provide the ability to scroll and zoom without pixelation artifacts. Typical print-to-PDF applications do not support scrolling screenshots and sub-frames. This disclosure describes techniques to record content of the frames of a webpage, including multi-origin iframes, such that each frame (or sub-frame) of a webpage can be scrolled independently. For security, hyperlink information, e.g., the hit regions and the URLs, are stored separately from webpage recordings, and frame recording and rasterization are carried out using sandboxed processes. The techniques produce a low-memory representation of a webpage with minimal pixelation effects, with full text fidelity regardless of zoom level, and active hyperlinks and scrollable sub-frames

    Microarchitectural-level simulator for parallel tile rendering on mobile GPUs

    Get PDF
    Mobile devices have led the boom in the technological segment in the recent years. They have witnessed a tremendous improvement in screen resolution and high-quality graphics because of the growing demand for playing games and other animated graphics applications. However, the demand for rendering more realistic scenes brings with it a significant increase in computation and memory bandwidth. This inevitably translates to an increase in energy consumption. Since GPUs are battery operated, energy-efficiency is an important design factor as it dictates their autonomy. In this work, we present a novel technique which we term Parallel Tile Rendering (PTR), which aims to exploit new sources of parallelism in a GPU. Under PTR, we rasterize multiple tiles in parallel using two different rasterization lanes, called Raster Units, in architectures for mobile GPUs. In this way, we dramatically reduce the required cycles for rasterization, which has been seen to be the most time-demanding process when rendering images. Experimental results show that PTR can achieve an average speedup of 83% for a wide range of different benchmarks, each of them with different characteristics. In fact, it is much more effective than having the same amount of computing resources but in a single Raster Unit, with an increase in performance of 8.3% on average. Moreover, PTR provides significant energy savings with an average decrease of 9.86%

    Faster data structures and graphics hardware techniques for high performance rendering

    Get PDF
    Computer generated imagery is used in a wide range of disciplines, each with different requirements. As an example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements but are still required to render efficiently utilizing huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce as high rendering quality as possible given the constraints available. The first paper in this thesis presents an analytical hardware model together with a feed-back system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware are required to minimize energy consumption not to overheat. The second paper presents experiments and analysis which consider power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflections and refractions effects has long been considered available only in offline rendering where time isn’t a constraint. The third paper presents a hybrid approach, utilizing the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high quality reflections and refractions in real-time. The fourth and fifth paper present improvements in construction time and quality of Bounding Volume Hierarchies (BVH). Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer towards a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of a very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher quality BVHs. The fifth paper introduces a triangle splitting BVH construction approach that builds BVHs with quality on a par with an earlier high quality splitting algorithm. However, the method presented in paper five is several times faster in construction time

    Efficient Algorithms for Coastal Geographic Problems

    Get PDF
    The increasing performance of computers has made it possible to solve algorithmically problems for which manual and possibly inaccurate methods have been previously used. Nevertheless, one must still pay attention to the performance of an algorithm if huge datasets are used or if the problem iscomputationally difficult. Two geographic problems are studied in the articles included in this thesis. In the first problem the goal is to determine distances from points, called study points, to shorelines in predefined directions. Together with other in-formation, mainly related to wind, these distances can be used to estimate wave exposure at different areas. In the second problem the input consists of a set of sites where water quality observations have been made and of the results of the measurements at the different sites. The goal is to select a subset of the observational sites in such a manner that water quality is still measured in a sufficient accuracy when monitoring at the other sites is stopped to reduce economic cost. Most of the thesis concentrates on the first problem, known as the fetch length problem. The main challenge is that the two-dimensional map is represented as a set of polygons with millions of vertices in total and the distances may also be computed for millions of study points in several directions. Efficient algorithms are developed for the problem, one of them approximate and the others exact except for rounding errors. The solutions also differ in that three of them are targeted for serial operation or for a small number of CPU cores whereas one, together with its further developments, is suitable also for parallel machines such as GPUs.Tietokoneiden suorituskyvyn kasvaminen on tehnyt mahdolliseksi ratkaista algoritmisesti ongelmia, joita on aiemmin tarkasteltu paljon ihmistyötä vaativilla, mahdollisesti epätarkoilla, menetelmillä. Algoritmien suorituskykyyn on kuitenkin toisinaan edelleen kiinnitettävä huomiota lähtömateriaalin suuren määrän tai ongelman laskennallisen vaikeuden takia. Väitöskirjaansisältyvissäartikkeleissatarkastellaankahtamaantieteellistä ongelmaa. Ensimmäisessä näistä on määritettävä etäisyyksiä merellä olevista pisteistä lähimpään rantaviivaan ennalta määrätyissä suunnissa. Etäisyyksiä ja tuulen voimakkuutta koskevien tietojen avulla on mahdollista arvioida esimerkiksi aallokon voimakkuutta. Toisessa ongelmista annettuna on joukko tarkkailuasemia ja niiltä aiemmin kerättyä tietoa erilaisista vedenlaatua kuvaavista parametreista kuten sameudesta ja ravinteiden määristä. Tehtävänä on valita asemajoukosta sellainen osa joukko, että vedenlaatua voidaan edelleen tarkkailla riittävällä tarkkuudella, kun mittausten tekeminen muilla havaintopaikoilla lopetetaan kustannusten säästämiseksi. Väitöskirja keskittyy pääosin ensimmäisen ongelman, suunnattujen etäisyyksien, ratkaisemiseen. Haasteena on se, että tarkasteltava kaksiulotteinen kartta kuvaa rantaviivan tyypillisesti miljoonista kärkipisteistä koostuvana joukkonapolygonejajaetäisyyksiäonlaskettavamiljoonilletarkastelupisteille kymmenissä eri suunnissa. Ongelmalle kehitetään tehokkaita ratkaisutapoja, joista yksi on likimääräinen, muut pyöristysvirheitä lukuun ottamatta tarkkoja. Ratkaisut eroavat toisistaan myös siinä, että kolme menetelmistä on suunniteltu ajettavaksi sarjamuotoisesti tai pienellä määrällä suoritinytimiä, kun taas yksi menetelmistä ja siihen tehdyt parannukset soveltuvat myös voimakkaasti rinnakkaisille laitteille kuten GPU:lle. Vedenlaatuongelmassa annetulla asemajoukolla on suuri määrä mahdollisia osajoukkoja. Lisäksi tehtävässä käytetään aikaa vaativia operaatioita kuten lineaarista regressiota, mikä entisestään rajoittaa sitä, kuinka monta osajoukkoa voidaan tutkia. Ratkaisussa käytetäänkin heuristiikkoja, jotkaeivät välttämättä tuota optimaalista lopputulosta.Siirretty Doriast

    MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

    Full text link
    Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views. However, they rely upon specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can synthesize novel images efficiently with standard rendering pipelines. The NeRF is represented as a set of polygons with textures representing binary opacities and feature vectors. Traditional rendering of the polygons with a z-buffer yields an image with features at every pixel, which are interpreted by a small, view-dependent MLP running in a fragment shader to produce a final pixel color. This approach enables NeRFs to be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism, achieving interactive frame rates on a wide range of compute platforms, including mobile phones.Comment: CVPR 2023. Project page: https://mobile-nerf.github.io, code: https://github.com/google-research/jax3d/tree/main/jax3d/projects/mobilener
    corecore