Scalable Real-Time Rendering for Extremely Complex 3D Environments Using Multiple GPUs
In 3D visualization, real-time rendering of high-quality meshes in complex 3D environments remains one of the major challenges in computer graphics. New data acquisition techniques such as 3D modeling and scanning have drastically increased the complexity of available models and the demand for higher display resolutions in recent years. Most existing acceleration techniques that render on a single GPU suffer from the limited GPU memory budget, time-consuming sequential execution, and finite display resolution. Recently, commodity workstations with multiple GPUs and multiple displays have become available. As a result, more GPU memory is available across a distributed cluster of GPUs, more computational power is provided through the combination of multiple GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor (resulting in a large tiled display configuration). However, a multi-GPU workstation may not always deliver the desired rendering performance due to imbalanced rendering workloads among GPUs and the overhead of inter-GPU communication.
In this dissertation, I contribute a multi-GPU multi-display parallel rendering approach for complex 3D environments. The approach supports high-performance, high-quality rendering of static and dynamic 3D environments. A novel parallel load balancing algorithm, based on a screen partitioning strategy, dynamically balances the number of vertices and triangles rendered by each GPU. The overhead of inter-GPU communication is minimized by transferring only a small number of image pixels rather than chunks of 3D primitives, using a novel frame exchanging algorithm. State-of-the-art parallel mesh simplification and GPU out-of-core techniques are integrated into the multi-GPU multi-display system to accelerate the rendering process.
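A screen partitioning strategy of the kind described above can be sketched as a prefix-sum cut of per-column primitive counts. The function name and the per-column workload model are illustrative assumptions, not the dissertation's actual algorithm:

```python
# Hypothetical sketch of dynamic screen partitioning for multi-GPU load
# balancing: split the screen into vertical strips so each GPU renders
# roughly the same number of primitives.
from itertools import accumulate

def partition_screen(column_costs, num_gpus):
    """Return strip boundaries [(start, end), ...], one per GPU, balancing
    the summed per-column primitive counts via prefix sums."""
    prefix = list(accumulate(column_costs))   # cumulative workload per column
    total = prefix[-1]
    bounds, start = [], 0
    for gpu in range(1, num_gpus):
        target = total * gpu / num_gpus       # ideal cumulative cut point
        # first column whose cumulative cost reaches the target; the strip
        # ends just after it so that column's cost is included
        end = next(i for i, c in enumerate(prefix) if c >= target) + 1
        bounds.append((start, end))
        start = end
    bounds.append((start, len(column_costs)))
    return bounds
```

Recomputing the cuts each frame from the previous frame's per-strip counts is one simple way to keep the partition adaptive as the view changes.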
Optimization techniques for a 2D Engine
This document describes the development of a small 2D engine written in
C++. The focus of this project is to implement various optimization
techniques to achieve a performant application.
The result is a small 2D engine that can run on any Windows
machine and can handle more than 10,000 entities interacting with each
other in real time.
The source code for this project is public under the MIT License and
can be found in the following GitHub repository:
https://github.com/DavidTello1/2D-Rendere
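One standard optimization for handling thousands of interacting entities, a uniform-grid broad phase, can be sketched as follows. This is an illustrative example, not code from the project's repository, and the cell size is an assumed parameter:

```python
# Illustrative uniform-grid broad phase: instead of testing all O(n^2)
# entity pairs for interaction, bucket entities by grid cell and only
# consider pairs that share a cell.
from collections import defaultdict
from itertools import combinations

CELL = 32.0  # cell size in world units (assumed)

def candidate_pairs(positions):
    """positions: {entity_id: (x, y)} -> set of id pairs sharing a cell."""
    grid = defaultdict(list)
    for eid, (x, y) in positions.items():
        grid[(int(x // CELL), int(y // CELL))].append(eid)
    pairs = set()
    for bucket in grid.values():
        pairs.update(combinations(sorted(bucket), 2))
    return pairs
```

A production version would also check the neighboring cells of each entity, but the core idea is the same: spatial bucketing turns a quadratic pair test into one that is near-linear for sparse scenes.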
3DRepo4Unity: Dynamic Loading of Version Controlled 3D Assets into the Unity Game Engine
In recent years, Unity has become a popular platform for the development of a broad range of visualization and VR applications. This is due to its ease of use, cross-platform compatibility and accessibility to independent developers. Despite such applications being cross-platform, their assets are generally bundled with executables, or streamed at runtime in a highly optimised, proprietary format. In this paper, we present a novel system for dynamically populating a Unity environment at runtime using open Web3D standards. Our system generates dynamic resources at runtime from a remote 3D Repo repository. This enables us to build a viewer which can easily visualize X3D-based revisions from a version controlled database in the cloud without any compile-time knowledge of the assets. We motivate the work and introduce the high-level architecture of our solution. We describe our new dynamic transcoding library with an emphasis on scalability and 3D rendering. We then perform a comparative evaluation between 3drepo.io, a state-of-the-art X3DOM-based renderer, and the new 3DRepo4Unity library on web browser platforms. Finally, we present a number of different applications that demonstrate the practicality of our chosen approach. By building on previous Web3D functionality and standards, our hope is to stimulate further discussion around and research into web formats that would enable incremental loading on other platforms.
Space-optimized texture atlases
Texture atlas parameterization provides an effective way to map a variety of colour and data attributes from 2D texture domains onto polygonal surface meshes. Most of the existing literature focuses on how
to build seamless texture atlases for continuous photometric detail, but little effort has been devoted
to devising efficient techniques for encoding self-repeating, discontinuous signals such as building facades.
We present a perception-based scheme for generating space-optimized texture atlases specifically
designed for intentionally non-bijective parameterizations. Our scheme combines within-chart tiling
support with intelligent packing and perceptual measures for assigning texture space according to
the amount of information content of the image and its saliency. We demonstrate our optimization
scheme in the context of real-time navigation through a gigatexel urban model of a European city.
Our scheme achieves significant compression ratios and speed-up factors with visually indistinguishable
results.
We developed a technique that generates space-optimized texture atlases for the particular encoding
of discontinuous signals projected onto geometry. The scene is partitioned using a texture atlas tree
that contains a texture atlas for each node. The leaf nodes of the tree contain scene geometry. The
level of detail is controlled by traversing the tree and selecting the appropriate texture atlas for a given
viewer position and orientation. In a preprocessing step, the textures associated with each texture atlas node
of the tree are packed. Textures are resized according to a given user-defined texel size and the size
of the geometry they are projected onto. We also use perceptual measures to assign texture space
according to image detail.
We also explore different techniques for supporting texture wrapping of discontinuous signals, which
involved the development of efficient techniques for compressing texture coordinates on the GPU. Our
approach supports texture filtering and DXTC compression without noticeable artifacts.
We have implemented a prototype version of our space-optimized texture atlases technique and
used it to render the 3D city model of Barcelona at interactive frame rates. The
whole model comprised more than three million triangles and contained more than twenty thousand different textures representing the building facades, with an average original resolution of 512 pixels per texture. Our scheme achieves up to 100:1 compression ratios and speed-up factors
of 20 with visually indistinguishable results.
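The perceptual space-assignment idea described above can be sketched as a proportional allocation: each texture receives atlas area in proportion to a saliency weight, subject to a total texel budget. The weighting scheme here is an illustrative assumption, not the paper's exact perceptual measure:

```python
# Hypothetical sketch: distribute a fixed atlas texel budget across
# textures in proportion to a per-texture saliency weight.
import math

def assign_resolutions(saliency, budget_texels):
    """saliency: {tex_id: weight} -> {tex_id: side length in texels},
    with total allocated area roughly equal to budget_texels."""
    total = sum(saliency.values())
    sides = {}
    for tex, w in saliency.items():
        area = budget_texels * w / total           # proportional area share
        sides[tex] = max(1, int(math.sqrt(area)))  # assume square sub-textures
    return sides
```

In a full pipeline the resulting sizes would additionally be snapped to power-of-two dimensions before packing, so that mipmapping and DXTC block compression remain straightforward.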
Data compression and transmission aspects of panoramic videos
Panoramic videos are effective means for representing static or dynamic scenes along predefined paths. They allow users to change their viewpoints interactively at points in time or space defined by the paths. High-resolution panoramic videos, while desirable, consume a significant amount of storage and bandwidth for transmission. They also make real-time decoding computationally very intensive. This paper proposes efficient data compression and transmission techniques for panoramic videos. A high-performance MPEG-2-like compression algorithm, which takes into account the random access requirements and the redundancies of panoramic videos, is proposed. The transmission aspects of panoramic videos over cable networks, local area networks (LANs), and the Internet are also discussed. In particular, an efficient advanced delivery sharing scheme (ADSS) for reducing repeated transmission and retrieval of frequently requested video segments is introduced. This protocol was verified by constructing an experimental VOD system consisting of a video server and eight Pentium 4 computers. Using the synthetic panoramic video Village at a rate of 197 kb/s and 7 f/s, nearly two-thirds of the memory access and transmission bandwidth of the video server were saved under normal network traffic.
Coprocessor integration for real-time event processing in particle physics detectors
High-energy physics experiments today have higher energies, more accurate sensors, and more flexible means of data collection than ever before. Their rapid progress requires ever more computational power; massively parallel hardware, such as graphics cards, holds the promise of providing this power at a much lower cost than traditional CPUs. Yet using this hardware requires new algorithms and new approaches to organizing data that can be difficult to integrate with existing software.
In this work, I explore the problem of using parallel algorithms within existing CPU-oriented frameworks and propose a compromise between the different trade-offs. The solution is a service that communicates with multiple event-processing pipelines, gathers data into batches, and submits them to hardware-accelerated parallel algorithms.
I integrate this service with Gaudi, a framework underlying the software environments of two of the four major experiments at the Large Hadron Collider. I examine the overhead the service adds to parallel algorithms. I perform a case study of using the service to run a parallel track reconstruction algorithm for the LHCb experiment's prospective VELO Pixel subdetector and look at the performance characteristics of using different data batch sizes. Finally, I put the findings into perspective within the context of the LHCb trigger's requirements.
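The batching idea described above can be sketched as a small service that collects events from many pipelines and hands fixed-size batches to a parallel algorithm. Class and method names are illustrative assumptions; the real service is integrated into Gaudi:

```python
# Minimal sketch of an event-batching service: pipelines submit events
# one at a time, and a (here simulated) hardware-accelerated algorithm
# receives them in fixed-size batches.
class BatchingService:
    def __init__(self, batch_size, run_parallel):
        self.batch_size = batch_size
        self.run_parallel = run_parallel   # e.g. a GPU kernel launcher
        self.pending = []                  # events waiting for a full batch
        self.results = {}                  # event_id -> algorithm output

    def submit(self, event_id, data):
        """Called by a pipeline; events wait until a full batch forms."""
        self.pending.append((event_id, data))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send the accumulated batch to the parallel algorithm."""
        if not self.pending:
            return
        ids, payload = zip(*self.pending)
        for eid, out in zip(ids, self.run_parallel(list(payload))):
            self.results[eid] = out
        self.pending = []
```

The batch size trades latency against throughput: larger batches keep the accelerator busier but make each pipeline wait longer, which is exactly the characteristic the case study measures.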
GPU-optimized approaches to molecular docking-based virtual screening in drug discovery: A comparative analysis
Finding a novel drug is a very long and complex procedure. Using computer simulations, it is possible to accelerate the preliminary phases by performing a virtual screening that filters a large set of drug candidates down to a manageable number. This paper presents and compares two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures. The work focuses on the analysis of parallel computation patterns and their mapping onto the target architecture. The first method adopts a traditional approach that spreads the computation for a single molecule across the entire GPU. The second uses a novel batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel. Experimental results showed different behavior depending on the size of the database to be screened: one method reaches its performance plateau sooner, while the other has a longer initial transient period but achieves a higher throughput (up to 5x), making it more suitable for extreme-scale virtual screening campaigns.
High performance bioinformatics and computational biology on general-purpose graphics processing units
Bioinformatics and Computational Biology (BCB) is a relatively new
multidisciplinary field which brings together many aspects of
biology, computer science, statistics, and engineering. Bioinformatics extracts
useful information from biological data and makes it more intuitive and
understandable by applying principles of information science, while
computational biology harnesses computational approaches and technologies
to answer biological questions conveniently. Recent years have seen an
explosion in the size of biological data at a rate that outpaces the
growth in computational power of mainstream computer technologies,
namely general-purpose processors (GPPs). The aim of this thesis is to explore
the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high-performance
and efficient implementation of BCB applications, in order to meet
the demands of growing biological data at an affordable cost.
The thesis presents detailed design and implementations of GPU solutions for
a number of BCB algorithms in two widely used BCB applications, namely
biological sequence alignment and phylogenetic analysis. Biological sequence
alignment can be used to determine the potential information about a newly
discovered biological sequence from other well-known sequences through
similarity comparison. On the other hand, phylogenetic analysis is concerned
with the investigation of the evolution of and relationships among organisms,
and has many uses in the fields of systems biology and comparative genomics.
In molecular-based phylogenetic analysis, the relationship between species is
estimated by inferring the common history of their genes, and
phylogenetic trees are then constructed to illustrate evolutionary relationships
among genes and organisms. However, both biological sequence alignment
and phylogenetic analysis are computationally expensive, as their computing and memory requirements grow polynomially or worse with
the size of sequence databases.
The thesis first presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A
novel technique is put forward to remove the restriction on the length of the
query sequence found in previous GPU-based implementations of the SW algorithm.
Based on this implementation, the difference between the two main task
parallelization approaches (inter-task and intra-task parallelization) is
presented. The resulting GPU implementation matches the speed of existing
GPU implementations while providing more flexibility, i.e. flexible sequence
lengths in real-world applications. It also outperforms an equivalent GPP-based
implementation by 15x-20x. After this, the thesis presents the first
reported multi-threaded design and GPU implementation of the Gapped
BLAST with Two-Hit method algorithm, which is widely used for aligning
biological sequences heuristically. This achieved speed-ups of up to 3x
compared to the most optimised GPP implementations.
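For reference, the Smith-Waterman local alignment recurrence that the thesis parallelizes can be written as a plain CPU version (linear gap penalty; the scoring values here are illustrative assumptions):

```python
# Serial Smith-Waterman: H[i][j] is the best score of a local alignment
# ending at a[i-1], b[j-1]; scores are clamped at zero so alignments can
# restart anywhere.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best
```

The GPU designs exploit the fact that all cells on the same anti-diagonal of H depend only on earlier anti-diagonals and can therefore be computed in parallel, which is the basis of the intra-task parallelization mentioned above.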
The thesis then presents a multi-threaded design and GPU implementation of
a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and
multiple sequence alignment (MSA). This achieves an 8x-20x speed-up compared
to an equivalent GPP implementation based on the widely used ClustalW
software. The NJ method, however, only gives one possible tree, which strongly
depends on the evolutionary model used. A more advanced method uses
maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte
Carlo (MCMC)-based Bayesian inference. The latter was the subject of another
multi-threaded design and GPU implementation presented in this thesis,
which achieved a 4x-8x speed-up compared to an equivalent GPP
implementation based on the widely used MrBayes software.
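The computational kernel of one Neighbor-Joining step, selecting the pair of taxa that minimizes the Q-criterion, is the kind of all-pairs computation that maps naturally onto a GPU. A serial sketch, for illustration only:

```python
# One NJ step: pick the pair (i, j) minimizing
# Q(i, j) = (n - 2) * D[i][j] - r_i - r_j,
# where r_i is the total distance from taxon i to all others.
def nj_pick_pair(D):
    """D: symmetric distance matrix (list of lists). Return the (i, j)
    pair to be joined next."""
    n = len(D)
    r = [sum(row) for row in D]            # total distance per taxon
    best, pair = float("inf"), None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if q < best:
                best, pair = q, (i, j)
    return pair
```

Each Q(i, j) entry is independent of the others, so on a GPU every thread can evaluate one pair and a parallel reduction can find the minimum, which is the pattern a multi-threaded NJ design exploits.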
Finally, the thesis presents a general evaluation of the designs and
implementations achieved in this work as a step towards the evaluation of
GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field-Programmable Gate Array (FPGA) technology.