82 research outputs found

    Techniques for an image space occlusion culling engine

    In this work we present several techniques applied to implement an Image Space Software Occlusion Culling Engine that increases the speed of rendering general dynamic scenes with high depth complexity. This conservative culling method is based on a tiled Occlusion Map that is updated only when needed, deferring and even avoiding the expensive per-pixel rasterization process. We show how the tiles become a useful way to increase the speed of visibility tests. Finally, we describe how different parts of the engine were parallelized using OpenMP directives and SIMD instructions. Track: Workshop Computación gráfica, imágenes y visualización (WCGIV), Red de Universidades con Carreras en Informática (RedUNCI).
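As a rough illustration of why tiles speed up visibility tests, the Python sketch below keeps a per-tile maximum depth that is refreshed lazily, so a test touches a handful of tiles instead of every covered pixel. The class and method names, the 8-pixel tile size and the rectangle-only occluders are hypothetical; this is not the engine from the paper, which additionally defers occluder rasterization itself and parallelizes with OpenMP and SIMD.

```python
import math


class TiledOcclusionMap:
    """Hypothetical tiled occlusion map: a depth buffer split into square tiles,
    each caching the farthest (largest) depth stored in it. Depth 1.0 = far plane."""

    def __init__(self, width, height, tile=8, far=1.0):
        self.w, self.h, self.tile = width, height, tile
        self.depth = [[far] * width for _ in range(height)]
        tiles_x, tiles_y = math.ceil(width / tile), math.ceil(height / tile)
        self.tile_max = [[far] * tiles_x for _ in range(tiles_y)]
        self.dirty = set()  # tiles whose cached maximum is stale

    def draw_occluder_rect(self, x0, y0, x1, y1, z):
        """Write an axis-aligned occluder covering [x0, x1) x [y0, y1) at depth z."""
        for y in range(max(0, y0), min(self.h, y1)):
            for x in range(max(0, x0), min(self.w, x1)):
                if z < self.depth[y][x]:
                    self.depth[y][x] = z
                    self.dirty.add((x // self.tile, y // self.tile))

    def _tile_max(self, tx, ty):
        # Recompute a tile's maximum depth lazily, only when a test needs it.
        if (tx, ty) in self.dirty:
            ys = range(ty * self.tile, min(self.h, (ty + 1) * self.tile))
            xs = range(tx * self.tile, min(self.w, (tx + 1) * self.tile))
            self.tile_max[ty][tx] = max(self.depth[y][x] for y in ys for x in xs)
            self.dirty.discard((tx, ty))
        return self.tile_max[ty][tx]

    def is_occluded(self, x0, y0, x1, y1, zmin):
        """Conservative test for an object with screen bounds [x0, x1) x [y0, y1)
        and nearest depth zmin: reported hidden only if zmin lies behind the
        farthest stored depth of every tile the bounds touch."""
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(self.w, x1), min(self.h, y1)
        if x0 >= x1 or y0 >= y1:
            return True  # bounds entirely off-screen: nothing to draw
        for ty in range(y0 // self.tile, (y1 - 1) // self.tile + 1):
            for tx in range(x0 // self.tile, (x1 - 1) // self.tile + 1):
                if zmin <= self._tile_max(tx, ty):
                    return False  # might be visible in this tile
        return True
```

Because a tile that is not fully covered still holds far-plane pixels, its maximum stays at 1.0 and the test cannot report a false occlusion.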

    Scalable Realtime Rendering and Interaction with Digital Surface Models of Landscapes and Cities

    Interactive, realistic rendering of landscapes and cities differs substantially from classical terrain rendering. Due to the sheer size and detail of the data that must be processed, realtime rendering (i.e. more than 25 images per second) is only feasible with level of detail (LOD) models. Even the design and implementation of efficient, automatic LOD generation is challenging for such out-of-core datasets, considering the large number of scales covered in a single view and the need to maintain screen-space accuracy for realistic representation. Moreover, users want to interact with the model based on semantic information, which needs to be linked to the LOD model. In this thesis I present LOD schemes for the efficient rendering of 2.5D digital surface models (DSMs) and 3D point clouds, a method for the automatic derivation of city models from raw DSMs, and an approach allowing semantic interaction with complex LOD models. The hierarchical LOD model for digital surface models is based on a quadtree of precomputed, simplified triangle mesh approximations. The proposed model is proven to allow real-time rendering of very large and complex models with pixel-accurate detail. Moreover, the necessary preprocessing is scalable and fast. For 3D point clouds, I introduce an LOD scheme based on an octree of hybrid plane-polygon representations. For each LOD, the algorithm detects planar regions in an adequately subsampled point cloud and models them as textured rectangles. Rendering the resulting hybrid model is an order of magnitude faster than comparable point-based LOD schemes. To automatically derive a city model from a DSM, I propose a constrained mesh simplification. Apart from the geometric distance between the simplified and the original model, it evaluates constraints based on detected planar structures and their mutual topological relations. 
The resulting models are much less complex than the original DSM but still represent the characteristic building structures faithfully. Finally, I present a method to combine semantic information with complex geometric models. My approach links the semantic entities to the geometric entities on the fly via coarser proxy geometries that carry the semantic information. Thus, semantic information can be layered on top of complex LOD models without an explicit attribution step. All findings are supported by experimental results that demonstrate the practical applicability and efficiency of the methods.
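A common way to drive a quadtree LOD of this kind is a top-down traversal that refines a node while its geometric error, projected to the screen, exceeds a pixel tolerance. The sketch below assumes each node stores a precomputed maximum deviation (`error`) and that tiles lie flat at height 0; the names, the 1-pixel tolerance and the error-halving test hierarchy are hypothetical, and the thesis's actual error metric and out-of-core paging are not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuadNode:
    cx: float     # tile centre x (world units); tile assumed to lie in the plane z = 0
    cy: float     # tile centre y
    size: float   # tile edge length
    error: float  # max deviation of this node's simplified mesh from the full DSM
    children: List["QuadNode"] = field(default_factory=list)


def build_uniform(cx, cy, size, error, depth):
    """Hypothetical full quadtree whose geometric error halves at each finer level."""
    node = QuadNode(cx, cy, size, error)
    if depth > 0:
        q = size / 4
        node.children = [build_uniform(cx + sx * q, cy + sy * q, size / 2, error / 2, depth - 1)
                         for sx in (-1, 1) for sy in (-1, 1)]
    return node


def screen_space_error(node, eye, viewport_h, fov_y):
    """Project the node's error to pixels, using the closest possible eye-to-tile
    distance so the estimate errs on the high (safe) side."""
    dx, dy, dz = node.cx - eye[0], node.cy - eye[1], 0.0 - eye[2]
    half_diag = node.size * math.sqrt(2) / 2
    dist = max(math.sqrt(dx * dx + dy * dy + dz * dz) - half_diag, 1e-6)
    focal_px = viewport_h / (2 * math.tan(fov_y / 2))  # focal length in pixels
    return node.error * focal_px / dist


def select_lod(node, eye, viewport_h=1080, fov_y=math.radians(60), tol_px=1.0, out=None):
    """Refine top-down until each selected node's projected error is at most
    tol_px; leaves are accepted as they are."""
    out = [] if out is None else out
    if node.children and screen_space_error(node, eye, viewport_h, fov_y) > tol_px:
        for child in node.children:
            select_lod(child, eye, viewport_h, fov_y, tol_px, out)
    else:
        out.append(node)
    return out
```

The selected nodes always form a cut through the tree, so they tile the terrain exactly once regardless of the viewpoint.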

    Point based graphics rendering with unified scalability solutions

    Standard real-time 3D graphics rendering algorithms use brute-force polygon rendering, with complexity linear in the number of polygons and little regard for limiting processing to data that contributes to the image. Modern hardware can now render smaller scenes to pixel levels of detail, relaxing surface connectivity requirements. Sub-linear scalability optimizations are typically self-contained, requiring specific data structures and sharing neither functions nor data. A new point-based rendering algorithm, 'Canopy', is investigated that combines multiple, typically sub-linear, scalability solutions using a small core of data structures. Specifically, locale management, hierarchical view volume culling, backface culling, occlusion culling, level of detail and depth ordering are addressed. To demonstrate versatility further, shadows and collision detection are examined. Polygon models are voxelized with interpolated attributes to provide points. A scene tree is constructed, based on a BSP tree of points, with compressed attributes. The scene tree is embedded in a compressed, partitioned, procedurally based scene graph architecture that mimics conventional systems with groups, instancing, inlines and basic read-on-demand rendering from backing store. Hierarchical scene tree refinement constructs an image tree, the image-space equivalent of the scene tree, by projecting the object-space points of scene nodes to form corresponding image nodes. An image graph of image nodes is maintained, describing image-space and object-space occlusion relationships; it is refined hierarchically in front-to-back order down to a specified threshold while occlusion culling with occluder fusion is performed. Visible nodes at medium levels of detail are refined further to rasterization scales. Occlusion culling defines a set of visible nodes that can support caching for temporal coherence. Occlusion culling is approximate and may therefore not suit critical applications. Quality and performance are tested against standard rendering. 
Although the algorithm has an O(f) upper bound in the scene size f, it is shown to scale sub-linearly in practice. Scenes that would conventionally comprise several hundred billion polygons are rendered at interactive frame rates with minimal graphics hardware support.
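Front-to-back ordering from a BSP tree of points follows a simple rule: at each node, visit the half-space containing the eye first. The Python sketch below uses a hypothetical axis-aligned (k-d style) split at the median of the widest axis; Canopy's compressed scene tree, attributes and image graph are not modelled, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class BSPNode:
    axis: int = 0                         # splitting plane: coordinate[axis] == split
    split: float = 0.0
    lo: Optional["BSPNode"] = None         # points with coordinate[axis] <= split
    hi: Optional["BSPNode"] = None         # points with coordinate[axis] >= split
    points: Optional[List[Point]] = None   # set only on leaves


def build_bsp(points: List[Point], leaf_size: int = 1) -> BSPNode:
    """Axis-aligned BSP: split at the median point along the widest axis."""
    if len(points) <= max(leaf_size, 1):
        return BSPNode(points=list(points))
    axis = max(range(3), key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return BSPNode(axis, pts[mid][axis],
                   build_bsp(pts[:mid], leaf_size), build_bsp(pts[mid:], leaf_size))


def front_to_back(node: BSPNode, eye: Point, out: Optional[List[Point]] = None) -> List[Point]:
    """Emit points so that, at every splitting plane, the side holding the eye comes first."""
    out = [] if out is None else out
    if node.points is not None:
        out.extend(node.points)
        return out
    near, far = (node.lo, node.hi) if eye[node.axis] < node.split else (node.hi, node.lo)
    front_to_back(near, eye, out)
    front_to_back(far, eye, out)
    return out
```

For collinear points this yields a distance-sorted order; in general 3D it yields an order in which nothing later can occlude anything earlier across a splitting plane, which is what occlusion culling with front-to-back refinement needs.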

    Exploiting frame coherence in real-time rendering for energy-efficient GPUs

    The computation capabilities of mobile GPUs have greatly evolved in the last generations, allowing real-time rendering of realistic scenes. However, the desire for processing complex environments clashes with the battery-operated nature of smartphones, for which users expect long operating times per charge and a low enough temperature to hold them comfortably. Consequently, improving the energy efficiency of mobile GPUs is paramount to fulfill both performance and low-power goals. The work of the processors within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet most of this energy is spent on redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images. The goal of this thesis is to improve the energy efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence in order to reduce the redundant computations and memory accesses inherent in graphics applications. First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are independently rendered in on-chip buffers. It is common for more than 80% of the tiles to produce exactly the same output in consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately detects such cases by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile in consecutive frames are the same, the colors computed in the preceding frame are reused, saving all computations and memory accesses associated with rendering the tile. We show that RE vastly outperforms related schemes found in the literature, achieving a reduction of energy consumption of 37% and of execution time of 33% with minimal overheads. 
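The signature comparison at the heart of Rendering Elimination can be sketched in a few lines. This is a software illustration only, since RE is a hardware mechanism: the choice of CRC32 (via Python's `zlib.crc32`), the 32-bit float encoding of inputs, and the `RenderingEliminator` and `render_tile` names are assumptions. A matching signature is taken to mean identical inputs, so a hash collision would wrongly reuse stale colors.

```python
import struct
import zlib


def tile_signature(prims, constants):
    """Hash the inputs that determine a tile's colors: the shader constants in
    use plus the attributes of every primitive binned to the tile.
    prims: list of tuples of floats; constants: tuple of floats."""
    crc = zlib.crc32(struct.pack(f"{len(constants)}f", *constants))
    for p in prims:
        crc = zlib.crc32(struct.pack(f"{len(p)}f", *p), crc)
    return crc


class RenderingEliminator:
    """Hypothetical per-frame driver that skips tiles whose inputs are unchanged."""

    def __init__(self):
        self.prev = {}  # tile id -> signature from the preceding frame

    def frame(self, tiles, constants, render_tile):
        """tiles: {tile_id: [prim, ...]}. Returns the ids of tiles actually rendered."""
        rendered, cur = [], {}
        for tid, prims in tiles.items():
            sig = tile_signature(prims, constants)
            cur[tid] = sig
            if self.prev.get(tid) != sig:
                render_tile(tid, prims)  # shade the tile and write it back
                rendered.append(tid)
            # otherwise the colors from the previous frame are assumed still available and reused
        self.prev = cur
        return rendered
```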
Next, we focus on reducing redundant computations of fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which often causes the results of previously computed objects to be overwritten by newer objects that turn out to occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility based on information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded and are processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination's detection of equal tiles by excluding primitives predicted to be occluded from the signature. With minor hardware costs, EVR is shown to reduce energy consumption by 43% and execution time by 39%. Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations at each sampling location. However, most screen regions do not include sufficient detail to require high sampling rates, so a significant amount of energy is wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate (DSR), a mechanism that analyzes the spatial frequencies of tiles and determines the best sampling rate for them, which is applied in the following frame. 
Results show that Dynamic Sampling Rate significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.
Postprint (published version)
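The Early Visibility Resolution idea described in this abstract can be illustrated with a toy tile in which every primitive is opaque with a single constant depth. The functions below are hypothetical and model only the ordering decision, not the hardware: primitives beyond the previous frame's farthest visible depth are deferred, which reduces the number of fragments shaded while the depth test still guarantees the same final depth buffer.

```python
def render_tile(prims, n_pixels):
    """Toy tile renderer. Each primitive is (name, depth, pixel_indices) and is
    treated as opaque with a constant depth. Returns the final depth buffer and
    the number of fragments shaded (fragments that passed the depth test)."""
    depth = [1.0] * n_pixels
    shaded = 0
    for _name, z, pixels in prims:
        for i in pixels:
            if z < depth[i]:  # depth test: keep the nearer fragment
                depth[i] = z
                shaded += 1
    return depth, shaded


def farthest_visible_depth(depth_buffer):
    """Depth of the farthest visible point in the tile, stored for the next frame."""
    return max(depth_buffer)


def evr_reorder(prims, prev_far_depth):
    """Predict visibility from the previous frame: primitives whose depth lies
    beyond the stored farthest visible depth are processed last.
    Submission order is kept within each group."""
    predicted_visible = [p for p in prims if p[1] <= prev_far_depth]
    predicted_occluded = [p for p in prims if p[1] > prev_far_depth]
    return predicted_visible + predicted_occluded
```

With a floor (depth 0.9) submitted before a wall (depth 0.3) that covers the same four pixels, in-order rendering shades 8 fragments; after reordering with the stored depth 0.3, only 4 are shaded and the depth buffer is identical. A wrong prediction costs only efficiency, not correctness.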
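The abstract does not specify how Dynamic Sampling Rate measures spatial frequency, so the sketch below substitutes a simple, hypothetical proxy: after a square tile is rendered, choose the largest block size b such that shading one sample per b × b block would change no pixel of this frame's tile by more than a tolerance, and apply that granularity in the next frame. The function names, the candidate sizes (4 and 2) and the tolerance 0.02 are illustrative assumptions.

```python
def block_range_ok(tile, b, tol):
    """True if every b x b block of the square tile varies by at most tol."""
    n = len(tile)
    for by in range(0, n, b):
        for bx in range(0, n, b):
            vals = [tile[y][x] for y in range(by, by + b) for x in range(bx, bx + b)]
            if max(vals) - min(vals) > tol:
                return False
    return True


def choose_block_size(tile, tol=0.02, candidates=(4, 2)):
    """Pick the coarsest shading granularity for the next frame: the largest
    block size b (one shaded sample per b x b pixels) whose blocks all vary by
    at most tol in this frame. Returns 1 for full per-pixel rate."""
    for b in candidates:
        if len(tile) % b == 0 and block_range_ok(tile, b, tol):
            return b
    return 1
```

A flat 8 × 8 tile selects one sample per 4 × 4 block, a gentle horizontal ramp selects 2 × 2, and a checkerboard keeps full rate.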

    Interactive ray tracing of massive and deformable models

    Ray tracing is a fundamental algorithm used in many applications, such as computer graphics, geometric simulation, collision detection and line-of-sight computation. Even though ray tracing performance scales well with model complexity, the high memory requirements and the use of static hierarchical structures pose problems for massive models and dynamic datasets. We present several approaches to address these problems based on new acceleration structures and traversal algorithms. We introduce a compact representation of the model and hierarchy for ray tracing triangle meshes that can reduce the memory footprint by up to 80% while maintaining high performance. As a result, we can ray trace massive models with hundreds of millions of triangles on workstations with a few gigabytes of memory. We also show how to use bounding volume hierarchies (BVHs) to ray trace complex models with interactive performance. To handle dynamic scenes, we use refitting algorithms and also present highly parallel GPU-based algorithms to reconstruct the hierarchies. In practice, our method can construct hierarchies for models with hundreds of thousands of triangles at interactive speeds. Finally, we demonstrate several applications enabled by these algorithms. Using deformable BVHs and fast data-parallel techniques, we introduce a geometric sound propagation algorithm that runs interactively on complex deformable scenes and is orders of magnitude faster than comparable previous approaches. In addition, we use these hierarchical algorithms for fast collision detection between deformable models and for GPU rendering of shadows on massive models, employing our compact representations for hybrid ray tracing and rasterization.
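Refitting, one of the techniques mentioned for dynamic scenes, keeps the BVH topology fixed and only recomputes the axis-aligned bounding boxes bottom-up after the vertices move. The sketch below is a plain-Python, CPU-side illustration with hypothetical names; it does not show the thesis's compact representation or its GPU-parallel hierarchy reconstruction, which becomes necessary when heavy deformation makes refitted boxes loose.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BVHNode:
    lo: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    hi: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    left: Optional["BVHNode"] = None
    right: Optional["BVHNode"] = None
    tris: List[int] = field(default_factory=list)  # triangle indices (leaves only)


def build_bvh(tri_ids, vertices, triangles, leaf_size=2):
    """Build topology only: median split of triangle centroids along the widest axis."""
    node = BVHNode()
    if len(tri_ids) <= max(leaf_size, 1):
        node.tris = list(tri_ids)
        return node
    cent = {t: [sum(vertices[v][a] for v in triangles[t]) / 3 for a in range(3)] for t in tri_ids}
    axis = max(range(3), key=lambda a: max(c[a] for c in cent.values()) - min(c[a] for c in cent.values()))
    ids = sorted(tri_ids, key=lambda t: cent[t][axis])
    mid = len(ids) // 2
    node.left = build_bvh(ids[:mid], vertices, triangles, leaf_size)
    node.right = build_bvh(ids[mid:], vertices, triangles, leaf_size)
    return node


def refit(node, vertices, triangles):
    """Recompute every bounding box bottom-up after the vertices move; the tree
    topology is kept unchanged, which is what makes refitting cheap."""
    if node.left is None:
        pts = [vertices[v] for t in node.tris for v in triangles[t]]
    else:
        refit(node.left, vertices, triangles)
        refit(node.right, vertices, triangles)
        pts = [node.left.lo, node.left.hi, node.right.lo, node.right.hi]
    node.lo = [min(p[a] for p in pts) for a in range(3)]
    node.hi = [max(p[a] for p in pts) for a in range(3)]
```

Typical use: build once, call `refit` on the root each frame after the mesh deforms, and rebuild only when traversal performance degrades.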