
    Map Generation from Large Scale Incomplete and Inaccurate Data Labels

    Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulatory compliance monitoring, and natural disaster response management, among others. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high-resolution aerial images. Unlike previous studies, most of which use datasets that are available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures such as the U-Net and the CycleGAN to incrementally generate maps with increasingly accurate and complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we then adopted an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up. Comment: This paper is accepted by KDD 202
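    The paper's training scheme distributes asynchronous SGD across a GPU cluster; as a toy, single-GPU illustration of the lock-free update idea only (not the paper's actual system; all names are hypothetical), the CUDA sketch below lets many threads apply gradients of a shared linear model via atomics, with no synchronization and tolerating stale reads:

```cuda
// Lock-free asynchronous SGD in miniature: each thread processes one
// sample and updates the shared weights with atomicAdd.
#include <cstdio>
#include <cuda_runtime.h>

#define N_FEATURES 4

__global__ void asyncSgdStep(const float* x, const float* y, float* w,
                             int nSamples, float lr) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nSamples) return;
    const float* xi = x + i * N_FEATURES;
    float pred = 0.f;
    for (int f = 0; f < N_FEATURES; ++f) pred += w[f] * xi[f];
    float err = pred - y[i];                      // squared-error gradient
    for (int f = 0; f < N_FEATURES; ++f)
        atomicAdd(&w[f], -lr * err * xi[f]);      // unsynchronized update
}

int main() {
    const int n = 1024;
    float *x, *y, *w;
    cudaMallocManaged(&x, n * N_FEATURES * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&w, N_FEATURES * sizeof(float));
    for (int i = 0; i < n * N_FEATURES; ++i) x[i] = 1.f;
    for (int i = 0; i < n; ++i) y[i] = 2.f;       // target: weights sum to 2
    for (int f = 0; f < N_FEATURES; ++f) w[f] = 0.f;
    for (int epoch = 0; epoch < 100; ++epoch)
        asyncSgdStep<<<(n + 255) / 256, 256>>>(x, y, w, n, 1e-4f);
    cudaDeviceSynchronize();
    printf("w[0] = %f\n", w[0]);                  // converges toward 0.5
    return 0;
}
```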

    Scalable Real-Time Rendering for Extremely Complex 3D Environments Using Multiple GPUs

    In 3D visualization, real-time rendering of high-quality meshes in complex 3D environments is still one of the major challenges in computer graphics. New data acquisition techniques like 3D modeling and scanning have drastically increased the requirement for more complex models and the demand for higher display resolutions in recent years. Most existing acceleration techniques that use a single GPU for rendering suffer from the limited GPU memory budget, time-consuming sequential execution, and the finite display resolution. Recently, people have started building commodity workstations with multiple GPUs and multiple displays. As a result, more GPU memory is available across a distributed cluster of GPUs, more computational power is provided through the combination of multiple GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor (resulting in a tiled large-display configuration). However, using a multi-GPU workstation may not always give the desired rendering performance, due to imbalanced rendering workloads among GPUs and overheads caused by inter-GPU communication. In this dissertation, I contribute a multi-GPU multi-display parallel rendering approach for complex 3D environments. The approach has the capability to support high-performance, high-quality rendering of static and dynamic 3D environments. A novel parallel load-balancing algorithm is developed based on a screen-partitioning strategy to dynamically balance the number of vertices and triangles rendered by each GPU. The overhead of inter-GPU communication is minimized by transferring only a small amount of image pixels rather than chunks of 3D primitives, using a novel frame-exchanging algorithm. State-of-the-art parallel mesh simplification and GPU out-of-core techniques are integrated into the multi-GPU multi-display system to accelerate the rendering process.
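    One simple way to realize such a screen-partitioning balance is to resize each GPU's strip of the screen in proportion to the throughput it achieved on the previous frame. The host-side sketch below is a simplification of that idea, not the dissertation's algorithm; all names are hypothetical:

```cuda
// Rebalance horizontal screen strips across GPUs from measured
// per-frame render times. Speed is estimated as height / time.
#include <cstdio>
#include <vector>

void rebalanceStrips(std::vector<int>& heights,
                     const std::vector<double>& times,
                     int screenHeight) {
    size_t n = heights.size();
    std::vector<double> speed(n);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        speed[i] = heights[i] / times[i];
        total += speed[i];
    }
    int assigned = 0;
    for (size_t i = 0; i + 1 < n; ++i) {
        heights[i] = (int)(screenHeight * speed[i] / total);
        assigned += heights[i];
    }
    heights[n - 1] = screenHeight - assigned;  // remainder to the last GPU
}

int main() {
    std::vector<int> h = {540, 540};
    std::vector<double> t = {0.020, 0.010};    // GPU 1 is twice as fast
    rebalanceStrips(h, t, 1080);
    printf("strip heights: %d %d\n", h[0], h[1]);  // 360 720
    return 0;
}
```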

    A graphics processing unit based method for dynamic real-time global illumination

    Real-time realistic image synthesis for virtual environments has been one of the most actively researched areas in computer graphics for over a decade. Images that display physically correct illumination of an environment can be simulated by evaluating a multi-dimensional integral equation, called the rendering equation, over the surfaces of the environment. Many global illumination algorithms such as path tracing, photon mapping and distributed ray tracing can produce realistic images but are generally unable to cope with dynamic lighting and objects at interactive rates. It remains one of the most challenging problems to simulate physically correctly illuminated dynamic environments without a substantial preprocessing step. In this thesis we present a rendering system for dynamic environments by implementing a customized rasterizer for global illumination entirely on the graphics hardware, the Graphics Processing Unit (GPU). Our research focuses on a parameterization of a discrete visibility field for efficient indirect illumination computation. In order to generate the visibility field, we propose a CUDA-based (Compute Unified Device Architecture) rasterizer which builds Layered Hit Buffers (LHB) by rasterizing polygons into multi-layered structural buffers in parallel. The LHB provides a fast visibility function for any direction at any point. We propose a cone approximation solution to resolve an aliasing problem due to limited directional discretization. We also demonstrate how to remove structured noise by adopting an interleaved sampling scheme and a discontinuity buffer. We show that a gathering method amortized with a multi-level quasi-Monte Carlo method can evaluate the rendering equation in real time. The method enables real-time walk-throughs of a complex virtual environment with a mixture of diffuse and glossy reflection, computing multiple indirect bounces on the fly. We show that our method is capable of simulating fully dynamic environments, including changes of view, materials, lighting and objects, at interactive rates on commodity-level graphics hardware.
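    A layered buffer of this kind can be pictured as a fixed number of depth layers per pixel, filled by a parallel scatter pass. The CUDA sketch below is a minimal approximation of that idea, assuming fragments have already been projected to pixel coordinates; it is not the thesis's rasterizer, and all names are hypothetical:

```cuda
// One thread per fragment: claim the pixel's next free layer slot with
// an atomic counter, then write the fragment's depth into that layer.
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_LAYERS 8

struct Fragment { int x, y; float depth; };

__global__ void buildLHB(const Fragment* frags, int nFrags,
                         float* layers, int* count, int width) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nFrags) return;
    int pixel = frags[i].y * width + frags[i].x;
    int layer = atomicAdd(&count[pixel], 1);
    if (layer < MAX_LAYERS)  // overflowing fragments are dropped here
        layers[pixel * MAX_LAYERS + layer] = frags[i].depth;
}

int main() {
    const int w = 4, h = 4, n = 3;
    Fragment* f; float* layers; int* count;
    cudaMallocManaged(&f, n * sizeof(Fragment));
    cudaMallocManaged(&layers, w * h * MAX_LAYERS * sizeof(float));
    cudaMallocManaged(&count, w * h * sizeof(int));
    cudaMemset(count, 0, w * h * sizeof(int));
    f[0] = {1, 1, 0.25f}; f[1] = {1, 1, 0.75f}; f[2] = {2, 3, 0.5f};
    buildLHB<<<1, 32>>>(f, n, layers, count, w);
    cudaDeviceSynchronize();
    printf("layers at (1,1): %d\n", count[1 * w + 1]);  // 2
    return 0;
}
```

    A subsequent per-pixel sort of the layers by depth would then turn visibility queries into simple buffer lookups.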

    Hierarchical and Adaptive Filter and Refinement Algorithms for Geometric Intersection Computations on GPU

    Geometric intersection algorithms are fundamental to spatial analysis in Geographic Information Systems (GIS). This dissertation explores a high-performance computing solution for geometric intersection over large amounts of spatial data using the Graphics Processing Unit (GPU). We have developed a hierarchical filter and refinement system for parallel geometric intersection operations involving large polygons and polylines by extending the classical filter-and-refine algorithm with efficient filters that leverage GPU computing. The inputs are two layers of large polygonal datasets, and the computations are spatial intersections on pairs of cross-layer polygons. These intersections are the compute-intensive spatial data analytic kernels in spatial join and map overlay operations in spatial databases and GIS. Efficient filters, such as PolySketch, PolySketch++ and point-in-polygon filters, have been developed to reduce the refinement workload on GPUs. We also show the application of such filters in speeding up line-segment intersection and point-in-polygon tests. Programming models like CUDA and OpenACC have been used to implement the different versions of the Hierarchical Filter and Refine (HiFiRe) system. Experimental results show good performance of our filter and refinement algorithms. Compared to the standard R-tree filter, our filter technique can, on average, discard a further 76% of polygon pairs that have no segment intersection points. The PolySketch filter reduces on average 99.77% of the workload of finding line-segment intersections. Compared to the existing Common Minimum Bounding Rectangle (CMBR) filter applied to each cross-layer candidate pair, the workload after using the PolySketch-based CMBR filter is on average 98% smaller. The execution time of our HiFiRe system on two shapefiles, namely USA Water Bodies (464K polygons) and USA Block Group Boundaries (220K polygons), is about 3.38 seconds using an Nvidia Titan V GPU.
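    The filter-and-refine pattern is easy to see in miniature: a cheap GPU test discards candidate pairs before the exact intersection stage ever runs. The CUDA sketch below implements only the simplest single-MBR overlap filter; PolySketch tiles a polygon's boundary into many small MBRs, which this sketch does not attempt, and all names are hypothetical:

```cuda
// One thread per cross-layer candidate pair: pairs whose bounding
// boxes do not overlap are discarded before the refinement stage.
#include <cstdio>
#include <cuda_runtime.h>

struct MBR { float xmin, ymin, xmax, ymax; };

__device__ bool overlaps(const MBR& a, const MBR& b) {
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}

__global__ void mbrFilter(const MBR* layerA, const MBR* layerB,
                          const int2* pairs, int nPairs, int* keep) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPairs) return;
    keep[i] = overlaps(layerA[pairs[i].x], layerB[pairs[i].y]) ? 1 : 0;
}

int main() {
    MBR *a, *b; int2* p; int* keep;
    cudaMallocManaged(&a, sizeof(MBR));
    cudaMallocManaged(&b, 2 * sizeof(MBR));
    cudaMallocManaged(&p, 2 * sizeof(int2));
    cudaMallocManaged(&keep, 2 * sizeof(int));
    a[0] = {0, 0, 2, 2};
    b[0] = {1, 1, 3, 3};   // overlaps a[0]
    b[1] = {5, 5, 6, 6};   // disjoint from a[0]
    p[0] = {0, 0}; p[1] = {0, 1};
    mbrFilter<<<1, 32>>>(a, b, p, 2, keep);
    cudaDeviceSynchronize();
    printf("keep: %d %d\n", keep[0], keep[1]);  // 1 0
    return 0;
}
```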

    Reducing redundancy of real time computer graphics in mobile systems

    The goal of this thesis is to propose novel and effective techniques to eliminate redundant computations that waste energy in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is key not only to extending their battery life, but also to increasing their performance, because, to avoid overheating beyond thermal limits, SoCs tend to be throttled when the load is high for long periods of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with off-chip main memory accesses and the processors inside the GPU consuming most of that energy.
    First, we focus on reducing redundant fragment processing computations by improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will not end up being part of the final image. By the time the GPU determines that an object, or part of it, is not going to be visible, all the activity required to compute and store its color has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transitions. VRO keeps the visibility information of a frame and uses it to reorder the objects of the following frame; it requires only a small amount of additional hardware to capture that visibility information and use it later to guide the rendering of the next frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average.
    Second, we focus on avoiding redundant computations related to Collision Detection (CD) on the CPU. Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel, since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD executed entirely on the CPU, we show that its execution time is reduced by almost three orders of magnitude (a 600x speedup), because most of the CD task in our model comes for free by reusing intermediate results of image rendering. Although not guaranteed, such a dramatic time improvement may translate into a higher frame rate if the physics simulation is on the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.
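    The ordering policy at the heart of VRO can be summarized in a few lines: render this frame's objects in the depth order they had in the previous frame. The host-side sketch below shows only that policy; VRO itself performs the reordering in dedicated hardware, in parallel with the graphics pipeline, and all names here are hypothetical:

```cuda
// Sort draw commands by each object's depth in the previous frame.
// Temporal coherence makes this order nearly correct, so the early
// depth test rejects most occluded fragments (less overshading).
#include <algorithm>
#include <cstdio>
#include <vector>

struct DrawCmd { int objectId; float lastFrameDepth; };

void reorderFrontToBack(std::vector<DrawCmd>& cmds) {
    std::sort(cmds.begin(), cmds.end(),
              [](const DrawCmd& a, const DrawCmd& b) {
                  return a.lastFrameDepth < b.lastFrameDepth;
              });
}

int main() {
    std::vector<DrawCmd> cmds = {{7, 0.9f}, {3, 0.2f}, {5, 0.6f}};
    reorderFrontToBack(cmds);
    for (const DrawCmd& c : cmds) printf("%d ", c.objectId);  // 3 5 7
    printf("\n");
    return 0;
}
```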

    Perceptually optimized real-time computer graphics

    Perceptual optimization, the application of human visual perception models to remove imperceptible components in a graphics system, has been proven effective in achieving significant computational speedup. Previous implementations of this technique have focused on spatial level-of-detail reduction, which typically results in noticeable degradation of image quality. This thesis introduces refresh rate modulation (RRM), a novel perceptual optimization technique that produces better performance enhancement while more effectively preserving image quality and resolving static scene elements in full detail. In order to demonstrate the effectiveness of this technique, a graphics framework has been developed that interfaces with eye-tracking hardware to take advantage of user fixation data in real time. Central to the framework is a high-performance GPGPU ray-tracing engine written in OpenCL. RRM reduces the frequency with which pixels outside of the foveal region are updated by the ray-tracer. A persistent pixel buffer is maintained such that peripheral data from previous frames provides context for the foveal image in the current frame. Traditional optimization techniques have also been incorporated into the ray-tracer for improved performance. Applying the RRM technique to the ray-tracing engine results in a speedup of 2.27 (252 fps vs. 111 fps at 1080p) for the classic Whitted scene with reflection and transmission enabled. A speedup of 3.41 (140 fps vs. 41 fps at 1080p) is observed for a high-polygon scene that depicts the Stanford Bunny. A small pilot study indicates that RRM achieves these results with minimal impact on perceived image quality. A secondary investigation is conducted regarding the performance benefits of increasing physics engine error tolerance for bounding-volume-hierarchy-based collision detection when the scene elements involved are in the user's periphery. The open-source Bullet Physics Library was used to add accurate collision detection to the full-resolution ray-tracing engine. For a scene with a static high-polygon model and 50 moving spheres, a speedup of 1.8 was observed for physics calculations. The development and integration of this subsystem demonstrates the extensibility of the graphics framework.
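    The core of refresh rate modulation is a per-pixel decision: re-trace foveal pixels every frame, and peripheral pixels only every few frames, otherwise reusing the persistent buffer. The thesis engine is written in OpenCL; the CUDA sketch below is an illustrative stand-in with hypothetical names, and it staggers peripheral updates so that whole bands of the screen never refresh in the same frame:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for the ray-tracing engine described in the thesis.
__device__ float3 traceRay(int x, int y) {
    return make_float3(x * 0.001f, y * 0.001f, 0.f);
}

// Foveal pixels are re-traced every frame; peripheral pixels only
// every periphRate-th frame, keeping their persistent value otherwise.
__global__ void rrmShade(float3* persistent, int width, int height,
                         int fovX, int fovY, float fovealRadius,
                         int frame, int periphRate) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float dx = (float)(x - fovX), dy = (float)(y - fovY);
    bool foveal = dx * dx + dy * dy < fovealRadius * fovealRadius;
    if (foveal || (frame + x + y) % periphRate == 0)
        persistent[y * width + x] = traceRay(x, y);
}

int main() {
    int w = 1920, h = 1080;
    float3* buf;
    cudaMallocManaged(&buf, w * h * sizeof(float3));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    for (int frame = 0; frame < 4; ++frame)
        rrmShade<<<grid, block>>>(buf, w, h, w / 2, h / 2, 200.f, frame, 4);
    cudaDeviceSynchronize();
    printf("done\n");
    return 0;
}
```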

    Hardware Acceleration of Progressive Refinement Radiosity using Nvidia RTX

    A vital component of photo-realistic image synthesis is the simulation of indirect diffuse reflections, which remains a quintessential hurdle that modern rendering engines struggle to overcome. Real-time applications typically pre-generate diffuse lighting information offline using radiosity to avoid performing costly computations at run-time. In this thesis we present a variant of progressive refinement radiosity that utilizes Nvidia's RTX technology to accelerate the process of form-factor computation without compromising on visual fidelity. Through a modern implementation built on DirectX 12 we demonstrate that offloading radiosity's visibility component to RT cores significantly improves the lightmap generation process and potentially propels it into the domain of real-time. Comment: 114 pages
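    Progressive refinement radiosity repeatedly shoots the unshot energy of the brightest patch to all receivers, attenuated by form factor and visibility; the visibility term is the component the thesis offloads to RT cores. The sketch below shows one shooting step with geometry and visibility reduced to stubs, and is only a hedged illustration of the general algorithm, not the thesis's implementation; names are hypothetical:

```cuda
#include <cstdio>
#include <vector>

struct Patch {
    float radiosity = 0.f;     // accumulated brightness
    float unshot = 0.f;        // energy not yet distributed
    float area = 1.f;
    float reflectance = 0.5f;
};

// Stubs: the visibility term is what RTX ray queries accelerate.
float formFactor(const Patch&, const Patch&) { return 0.01f; }
float visibility(const Patch&, const Patch&) { return 1.f; }

void shootOnce(std::vector<Patch>& patches) {
    // Select the patch with the most unshot energy.
    size_t s = 0;
    for (size_t i = 1; i < patches.size(); ++i)
        if (patches[i].unshot * patches[i].area >
            patches[s].unshot * patches[s].area)
            s = i;
    float shot = patches[s].unshot;
    patches[s].unshot = 0.f;
    // Distribute it to every receiver; received energy is itself
    // re-shot in later iterations until it converges.
    for (size_t r = 0; r < patches.size(); ++r) {
        if (r == s) continue;
        float dB = patches[r].reflectance * shot *
                   formFactor(patches[s], patches[r]) *
                   visibility(patches[s], patches[r]) *
                   patches[s].area / patches[r].area;
        patches[r].radiosity += dB;
        patches[r].unshot += dB;
    }
}

int main() {
    std::vector<Patch> scene(3);
    scene[0].unshot = 100.f;                    // a single emitter
    for (int i = 0; i < 4; ++i) shootOnce(scene);
    printf("patch 1 radiosity: %f\n", scene[1].radiosity);
    return 0;
}
```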

    Foundations and Methods for GPU based Image Synthesis

    Effects such as global illumination, caustics, defocus and motion blur are an integral part of generating images that are perceived as realistic pictures and cannot be distinguished from photographs. In general, two different approaches exist to render images: ray tracing and rasterization. Ray tracing is a widely used technique for production-quality rendering of images, where image quality and physical correctness are more important than the time needed for rendering. Generating these effects is a very compute- and memory-intensive process and can take minutes to hours for a single camera shot. Rasterization, on the other hand, is used to render images when real-time constraints have to be met (e.g. computer games). Often specialized algorithms are used to approximate these complex effects, achieving plausible results while sacrificing image quality for performance. This thesis is split into two parts. In the first part we look at algorithms and load-balancing schemes for general-purpose computing on graphics processing units (GPUs). Most ray-tracing-related algorithms (e.g. KD-tree construction or bidirectional path tracing) have unpredictable memory requirements. Dynamic memory allocation on GPUs suffers from the global synchronization required to keep the state of current allocations. We present a method to reduce this overhead on massively parallel hardware architectures. In particular, we merge small parallel allocation requests from different threads that can occur while exploiting SIMD-style parallelism. We speed up dynamic allocation using a set of constraints that can be applied to a large class of parallel algorithms. To achieve the image quality needed for feature films, GPU clusters are often used to cope with the amount of computation needed. We present a framework that employs a dynamic load-balancing approach and applies fair scheduling to minimize the average execution time of spawned computational tasks. The load-balancing capabilities are shown by handling irregular workloads: a bidirectional path tracer allowing renderings of complex effects at near-interactive frame rates. In the second part of the thesis we try to reduce the image-quality gap between production and real-time rendering. To this end, an adaptive acceleration structure for screen-space ray tracing is presented that represents the scene geometry by planar approximations. The benefit is a fast method to skip empty space and compute exact intersection points based on the planar approximation. This technique allows simulating complex phenomena including depth-of-field rendering and ray-traced reflections at real-time frame rates. To handle motion blur in combination with transparent objects we present a unified rendering approach that decouples space and time sampling, achieving interactive frame rates by reusing fragments during the sampling step. The scene geometry that is potentially visible at any point in time for the duration of a frame is rendered in a rasterization step and stored in temporally varying fragments. We perform spatial sampling to determine all temporally varying fragments that intersect with a specific viewing ray at any point in time. Viewing rays can be sampled according to the lens uv-sampling to incorporate depth of field. In a final temporal sampling step, we evaluate the pre-determined viewing ray/fragment intersections for one or multiple points in time. This allows incorporating standard shading effects and results in physically plausible motion and defocus blur for transparent and opaque objects.
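    The allocation-merging idea from the first part above can be sketched with warp intrinsics: the active lanes compute a prefix over their request sizes, one leader lane performs a single atomic on the global cursor, and the resulting base offset is broadcast back so each lane carves out its own slice. The CUDA sketch below shows this warp-aggregated pattern over a toy bump allocator, as an illustration of the general technique rather than the thesis's allocator; names are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ char gHeap[1 << 20];                 // toy backing storage
__device__ unsigned long long gHeapOffset = 0;  // bump-allocator cursor

// Merge the allocation requests of all active lanes in a warp into a
// single atomicAdd on the global cursor.
__device__ void* warpMergedAlloc(unsigned bytes) {
    unsigned mask = __activemask();
    int lane = threadIdx.x & 31;
    int leader = __ffs(mask) - 1;               // lowest active lane
    unsigned prefix = 0, total = 0;             // exclusive prefix sum
    for (int l = 0; l < 32; ++l) {
        if (!((mask >> l) & 1u)) continue;
        unsigned sz = __shfl_sync(mask, bytes, l);
        if (l < lane) prefix += sz;
        total += sz;
    }
    unsigned long long base = 0;
    if (lane == leader)
        base = atomicAdd(&gHeapOffset, (unsigned long long)total);
    base = __shfl_sync(mask, base, leader);     // broadcast to the warp
    return gHeap + base + prefix;
}

__global__ void demo(void** out) {
    // Every thread requests a different size; each warp issues only
    // one atomic to the global cursor instead of 32.
    out[threadIdx.x] = warpMergedAlloc(4u * (threadIdx.x % 4u + 1u));
}

int main() {
    void** out;
    cudaMallocManaged(&out, 64 * sizeof(void*));
    demo<<<1, 64>>>(out);
    cudaDeviceSynchronize();
    printf("first two offsets differ by %ld bytes\n",
           (long)((char*)out[1] - (char*)out[0]));  // 4
    return 0;
}
```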

    Real-Time deep image rendering and order independent transparency

    In computer graphics some operations can be performed in either object space or image space. Image-space computation can be advantageous, especially with the high parallelism of GPUs, improving speed, accuracy and ease of implementation. For many image-space techniques the information contained in regular 2D images is limiting. Recent graphics hardware features, namely atomic operations and dynamic memory location writes, now make it possible to capture and store all per-pixel fragment data from the rasterizer in a single pass in what we call a deep image. A deep image provides a state where all fragments are available and gives a more complete image-based geometry representation, providing new possibilities in image-based rendering techniques. This thesis investigates deep images and their growing use in real-time image-space applications. A focus is new techniques for improving the performance of fundamental operations, including construction, storage, fast fragment sorting and sampling. A core and driving application is order-independent transparency (OIT). A number of deep-image sorting improvements are presented, through which an order-of-magnitude performance increase is achieved, significantly advancing the ability to perform transparency rendering in real time. In the broader context of image-based rendering we look at deep images as a discretized 3D geometry representation and discuss sampling techniques for raycasting and antialiasing with an implicit fragment-connectivity approach. Using these ideas a more computationally complex application is investigated: image-based depth of field (DoF). Deep images are used to provide partial occlusion, and in particular a form of deep-image mipmapping allows a fast approximate defocus blur of up to full screen size.
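    Once a deep image holds every fragment per pixel, order-independent transparency reduces to sorting each pixel's short fragment list by depth before compositing. The CUDA sketch below is a baseline per-pixel insertion sort (the thesis develops considerably faster variants); names are hypothetical:

```cuda
// Per-pixel insertion sort of captured fragments into back-to-front
// order. Lists are short and often nearly sorted across frames, which
// is why insertion sort is a common baseline here.
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_FRAGS 16

__global__ void sortPixelFragments(float* depth, unsigned* color,
                                   const int* count, int nPixels) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPixels) return;
    float* d = depth + p * MAX_FRAGS;
    unsigned* c = color + p * MAX_FRAGS;
    int n = min(count[p], MAX_FRAGS);
    for (int i = 1; i < n; ++i) {
        float dk = d[i];
        unsigned ck = c[i];
        int j = i - 1;
        while (j >= 0 && d[j] < dk) {   // descending depth: back to front
            d[j + 1] = d[j];
            c[j + 1] = c[j];
            --j;
        }
        d[j + 1] = dk;
        c[j + 1] = ck;
    }
}

int main() {
    float* d; unsigned* c; int* n;
    cudaMallocManaged(&d, MAX_FRAGS * sizeof(float));
    cudaMallocManaged(&c, MAX_FRAGS * sizeof(unsigned));
    cudaMallocManaged(&n, sizeof(int));
    d[0] = 0.3f; d[1] = 0.9f; d[2] = 0.6f;
    c[0] = 0xAA; c[1] = 0xBB; c[2] = 0xCC;
    *n = 3;
    sortPixelFragments<<<1, 1>>>(d, c, n, 1);
    cudaDeviceSynchronize();
    printf("%.1f %.1f %.1f\n", d[0], d[1], d[2]);  // 0.9 0.6 0.3
    return 0;
}
```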

    Sparse Volumetric Deformation

    Volume rendering is becoming increasingly popular as applications require realistic solid shape representations with seamless texture mapping and accurate filtering. However, rendering sparse volumetric data is difficult because of the limited memory and processing capabilities of current hardware. To address these limitations, the volumetric information can be stored at progressive resolutions in the hierarchical branches of a tree structure, and sampled according to the region of interest. This means that only a partial region of the full dataset is processed, and therefore massive volumetric scenes can be rendered efficiently. The problem with this approach is that it currently only supports static scenes, because it is difficult to accurately deform massive amounts of volume elements and reconstruct the scene hierarchy in real time. Another problem is that deformation operations distort the shape where more than one volume element tries to occupy the same location, and similarly gaps occur where deformation stretches the elements further than one discrete location apart. It is also challenging to efficiently support sophisticated deformations at hierarchical resolutions, such as character skinning or physically based animation. These types of deformation are expensive and require a control structure (for example a cage or skeleton) that maps to a set of features to accelerate the deformation process. The problems with this technique are that the varying volume hierarchy reflects different feature sizes, manipulating the features at the original resolution is too expensive, and the control structure must therefore also hierarchically capture features according to the varying volumetric resolution. This thesis investigates the deformation and rendering of massive amounts of dynamic volumetric content. The proposed approach efficiently deforms hierarchical volume elements without introducing artifacts and supports both ray-casting and rasterization renderers. This enables light transport to be modeled both accurately and efficiently, with applications in the fields of real-time rendering and computer animation. Sophisticated volumetric deformation, including character animation, is also supported in real time. This is achieved by automatically generating a control skeleton which is mapped to the varying feature resolution of the volume hierarchy. The output deformations are demonstrated in massive dynamic volumetric scenes.
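    Sampling such a hierarchy "according to the region of interest" amounts to descending the tree only while a node's projected size still exceeds roughly one pixel. The host-side sketch below is a simplification with a stubbed child-selection rule, not the thesis's traversal; names are hypothetical:

```cuda
// Pick the level of detail from viewing distance: distant regions are
// sampled from coarser nodes, nearby regions from finer ones. Child
// selection is stubbed to "first non-empty child" for brevity.
#include <cstdio>

struct Node {
    float size;          // edge length of the region this node covers
    Node* children[8];   // null where the volume is empty (sparse)
};

const Node* selectLod(const Node* n, float distance, float pixelAngle) {
    while (true) {
        if (n->size / distance <= pixelAngle) return n;  // coarse enough
        const Node* next = nullptr;
        for (int i = 0; i < 8 && !next; ++i) next = n->children[i];
        if (!next) return n;                             // reached a leaf
        n = next;
    }
}

int main() {
    Node leaf = {1.f, {}};
    Node root = {8.f, {&leaf}};
    const Node* coarse = selectLod(&root, 1000.f, 0.01f);  // far away
    const Node* fine = selectLod(&root, 10.f, 0.01f);      // close up
    printf("far size %.0f, near size %.0f\n", coarse->size, fine->size);
    return 0;
}
```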