
    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    This paper describes an application of a second-generation implementation of the Sepia architecture (Sepia-2) to interactive volumetric visualization of large rectilinear scalar fields. By employing pipelined associative blending operators in a sort-last configuration, a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow mapping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators.
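    The abstract above relies on associative blending so that partial images can be combined stage by stage along a pipeline. As a minimal illustration (not the Sepia-2 implementation), the sketch below applies the Porter-Duff "over" operator to premultiplied RGBA images in back-to-front order, which is the core operation in sort-last compositing; the function names and the use of NumPy are assumptions for the example.

```python
import numpy as np

def over(front, back):
    """Porter-Duff 'over' on premultiplied RGBA arrays with values in [0, 1]."""
    alpha_front = front[..., 3:4]
    return front + (1.0 - alpha_front) * back   # associative, so stages can be pipelined

def composite_back_to_front(partial_images):
    """Blend per-node partial images, farthest first, as in sort-last compositing."""
    result = partial_images[0]
    for image in partial_images[1:]:             # each later image is closer to the viewer
        result = over(image, result)
    return result
```

    Because "over" on premultiplied colors is associative, the partial images can be grouped and blended in stages without changing the final result, which is consistent with the pipelined, multi-stage arrangement described above.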

    Parallel volume rendering for large scientific data

    Data sets of immense size are regularly generated by large-scale computing resources. Even among more traditional methods for acquisition of volume data, such as MRI and CT scanners, data that is too large to be effectively visualized on standard workstations is now commonplace. One solution to this problem is to employ a 'visualization cluster,' a small to medium scale cluster dedicated to performing visualization and analysis of massive data sets generated on larger scale supercomputers. These clusters are designed to fulfill a different need than traditional supercomputers, and therefore their design mandates different hardware choices, such as increased memory and, more recently, graphics processing units (GPUs). While there has been much previous work on distributed memory visualization as well as GPU visualization, there is a relative dearth of algorithms that effectively use GPUs at a large scale in a distributed memory environment. In this work, we study a common visualization technique in a GPU-accelerated, distributed memory setting, and present performance characteristics when scaling to extremely large data sets.
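    The abstract does not spell out the compositing step, but distributed-memory volume rendering typically ends with each node producing a partial image of its own data brick, which must then be blended in visibility order. The sketch below shows the simplest possible version of that step (a direct gather to one rank); the use of mpi4py and all function names are assumptions, and real systems use more scalable compositing schemes.

```python
from mpi4py import MPI   # assumed message-passing layer; any equivalent would do

def over(front, back):
    """Porter-Duff 'over' on premultiplied RGBA NumPy arrays."""
    return front + (1.0 - front[..., 3:4]) * back

def sort_last_composite(local_image, brick_depth, comm=MPI.COMM_WORLD):
    """Each rank renders its own brick on its GPU; rank 0 gathers the partial
    images and blends them back-to-front according to each brick's depth."""
    gathered = comm.gather((brick_depth, local_image), root=0)
    if comm.rank != 0:
        return None
    result = None
    for _, image in sorted(gathered, key=lambda pair: -pair[0]):   # farthest first
        result = image if result is None else over(image, result)
    return result
```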

    Interactive Visualization on High-Resolution Tiled Display Walls with Network Accessible Compute- and Display-Resources

    Papers 2-7 and appendices B and C of this thesis are not available in Munin:
    2. Hagen, T-M.S., Johnsen, E.S., Stødle, D., Bjorndalen, J.M. and Anshus, O.: 'Liberating the Desktop', First International Conference on Advances in Computer-Human Interaction (2008), pp. 89-94. Available at http://dx.doi.org/10.1109/ACHI.2008.20
    3. Tor-Magne Stien Hagen, Oleg Jakobsen, Phuong Hoai Ha, and Otto J. Anshus: 'Comparing the Performance of Multiple Single-Cores versus a Single Multi-Core' (manuscript)
    4. Tor-Magne Stien Hagen, Phuong Hoai Ha, and Otto J. Anshus: 'Experimental Fault-Tolerant Synchronization for Reliable Computation on Graphics Processors' (manuscript)
    5. Tor-Magne Stien Hagen, Daniel Stødle and Otto J. Anshus: 'On-Demand High-Performance Visualization of Spatial Data on High-Resolution Tiled Display Walls', Proceedings of the International Conference on Imaging Theory and Applications and International Conference on Information Visualization Theory and Applications (2010), pp. 112-119. Available at http://dx.doi.org/10.5220/0002849601120119
    6. Bård Fjukstad, Tor-Magne Stien Hagen, Daniel Stødle, Phuong Hoai Ha, John Markus Bjørndalen and Otto Anshus: 'Interactive Weather Simulation and Visualization on a Display Wall with Many-Core Compute Nodes', Para 2010 – State of the Art in Scientific and Parallel Computing. Available at http://vefir.hi.is/para10/extab/para10-paper-60
    7. Tor-Magne Stien Hagen, Daniel Stødle, John Markus Bjørndalen, and Otto Anshus: 'A Step towards Making Local and Remote Desktop Applications Interoperable with High-Resolution Tiled Display Walls', Lecture Notes in Computer Science (2011), Volume 6723/2011, pp. 194-207. Available at http://dx.doi.org/10.1007/978-3-642-21387-8_15
    The vast volume of scientific data produced today requires tools that enable scientists to explore large amounts of data to extract meaningful information. One such tool is interactive visualization. The amount of data that can be simultaneously visualized on a computer display is proportional to the display’s resolution. While computer systems in general have seen a remarkable increase in performance over the last decades, display resolution has not evolved at the same rate. Increased resolution can be provided by tiling several displays in a grid. A system comprising multiple displays tiled in such a grid is referred to as a display wall. Display walls provide orders of magnitude more resolution than typical desktop displays, and can provide insight into problems not possible to visualize on desktop displays. However, their distributed and parallel architecture creates several challenges for designing systems that can support interactive visualization. One challenge is compatibility with existing software designed for personal desktop computers. Another set of challenges includes identifying characteristics of visualization systems that can: (i) maintain synchronous state and display output when executed over multiple display nodes; (ii) scale to multiple display nodes without being limited by shared interconnect bottlenecks; (iii) utilize additional computational resources such as desktop computers, clusters and supercomputers for workload distribution; and (iv) use data from local and remote compute- and data-resources with interactive performance. This dissertation presents Network Accessible Compute (NAC) resources and Network Accessible Display (NAD) resources for interactive visualization of data on displays ranging from laptops to high-resolution tiled display walls.
    A NAD is a display with functionality that enables usage over a network connection. A NAC is a computational resource that can produce content for network accessible displays. A system consisting of NACs and NADs is either push-based (NACs provide NADs with content) or pull-based (NADs request content from NACs). To attack the compatibility challenge, a push-based system was developed. The system enables several simultaneous users to mirror multiple regions from the desktops of their computers (NACs) onto nearby NADs (among others, a 22-megapixel display wall) without requiring separate DVI/VGA cables, permanent installation of third-party software, or open firewall ports. The system has lower performance than a DVI/VGA cable approach, but increases flexibility, such as the possibility of sharing network accessible displays among multiple computers. At a resolution of 800 by 600 pixels, the system can mirror dynamic content between a NAC and a NAD at 38.6 frames per second (FPS). At 1600x1200 pixels, the refresh rate is 12.85 FPS. The bottleneck of the system is frame buffer capturing and encoding/decoding of pixels. These two functional parts are executed in sequence, limiting the usage of additional CPU cores. By pipelining and executing these parts on separate CPU cores, higher frame rates can be expected, by up to a factor of two in the best case. To attack all presented challenges, a pull-based system, WallScope, was developed. WallScope enables interactive visualization of local and remote data sets on high-resolution tiled display walls. The WallScope architecture comprises a compute-side and a display-side. The compute-side comprises a set of static and dynamic NACs. Static NACs are considered permanent to the system once added. This type of NAC typically has strict underlying security and access policies. Examples of such NACs are clusters, grids and supercomputers. Dynamic NACs are compute resources that can register on-the-fly to become compute nodes in the system. Examples of this type of NAC are laptops and desktop computers. The display-side comprises a set of NADs and a data set containing data customized for the particular application domain of the NADs. NADs are based on a sort-first rendering approach where a visualization client is executed on each display node. The state of these visualization clients is provided by a separate state server, enabling central control of load and refresh rate. Based on the state received from the state server, the visualization clients request content from the data set. The data set is live in that it translates these requests into compute messages and forwards them to available NACs. Results of the computations are returned to the NADs for the final rendering. The live data set is close to the NADs, both in terms of bandwidth and latency, to enable interactive visualization. WallScope can visualize the Earth, gigapixel images, and other data available through the live data set. When visualizing the Earth on a 28-node display wall by combining the Blue Marble data set with the Landsat data set using a set of static NACs, the bottleneck of WallScope is the computation involved in combining the data sets. However, the time used to combine data sets on the NACs decreases by a factor of 23 when going from 1 to 26 compute nodes. The display-side can decode 414.2 megapixels of images per second (19 frames per second) when visualizing the Earth.
    The decoding process is multi-threaded, and higher frame rates are expected on multi-core CPUs. WallScope can rasterize a 350-page PDF document into 550 megapixels of image tiles and display these image tiles on a 28-node display wall in 74.66 seconds (PNG) and 20.66 seconds (JPG) using a single quad-core desktop computer as a dynamic NAC. This time is reduced to 4.20 seconds (PNG) and 2.40 seconds (JPG) using 28 quad-core NACs. This shows that the application output from personal desktop computers can be decoupled from the resolution of the local desktop and displayed on high-resolution tiled display walls. It also shows that performance can be increased by adding computational resources, giving a resulting speedup of 17.77 (PNG) and 8.59 (JPG) using 28 compute nodes. Three principles are formulated based on the concepts and systems researched and developed: (i) establishing the end-to-end principle through customization states that the setup and interaction between a display-side and a compute-side in a visualization context can be performed by customizing one or both sides; (ii) Personal Computer (PC) – Personal Compute Resource (PCR) duality states that a user’s computer is both a PC and a PCR, implying that desktop applications can be utilized locally using attached interaction devices and display(s), or remotely by other visualization systems for domain-specific production of data based on a user’s personal desktop install; and (iii) domain-specific best-effort synchronization states that for distributed visualization systems running on tiled display walls, state handling can be performed using a best-effort synchronization approach, where visualization clients will eventually get the correct state after a given period of time. Compared to state-of-the-art systems presented in the literature, the contributions of this dissertation enable utilization of a broader range of compute resources from a display wall, while at the same time providing better control over where to provide functionality and where to distribute workload between compute nodes and display nodes in a visualization context.
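    As a rough illustration of the pull-based flow described above (state server to visualization client to live data set to NAC), the sketch below models the request path in plain Python. All class and method names are hypothetical and not taken from WallScope; the point is only that display nodes pull tiles, and requests that cannot be served from a cache near the display side are turned into compute messages for available NACs.

```python
class LiveDataSet:
    """Sits close to the display side; translates tile requests into compute
    messages for available NACs and caches the results (illustrative only)."""

    def __init__(self, nacs):
        self.nacs = nacs      # registered compute resources (static or dynamic)
        self.cache = {}       # (region, level) -> previously computed tile

    def get_tile(self, region, level):
        key = (region, level)
        if key not in self.cache:
            nac = self.nacs[hash(key) % len(self.nacs)]    # trivial work distribution
            self.cache[key] = nac.compute(region, level)   # compute message + reply
        return self.cache[key]


class VisualizationClient:
    """Runs on one display node; sort-first, so it only draws its own wall tile."""

    def __init__(self, dataset, my_region):
        self.dataset = dataset
        self.my_region = my_region

    def render_frame(self, view_state):
        # view_state (e.g. zoom level) comes from a central state server and is
        # delivered best-effort; clients eventually converge on the correct state.
        level = view_state["level"]
        return self.dataset.get_tile(self.my_region, level)   # handed to the renderer
```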

    A Survey of Software Frameworks for Cluster-Based Large High-Resolution Displays


    Scalable Real-Time Rendering for Extremely Complex 3D Environments Using Multiple GPUs

    In 3D visualization, real-time rendering of high-quality meshes in complex 3D environments is still one of the major challenges in computer graphics. New data acquisition techniques like 3D modeling and scanning have drastically increased the requirement for more complex models and the demand for higher display resolutions in recent years. Most of the existing acceleration techniques using a single GPU for rendering suffer from the limited GPU memory budget, time-consuming sequential execution, and finite display resolution. Recently, people have started building commodity workstations with multiple GPUs and multiple displays. As a result, more GPU memory is available across a distributed cluster of GPUs, more computational power is provided through the combination of multiple GPUs, and a higher display resolution can be achieved by connecting each GPU to a display monitor (resulting in a tiled large display configuration). However, using a multi-GPU workstation may not always give the desired rendering performance due to imbalanced rendering workloads among GPUs and overheads caused by inter-GPU communication. In this dissertation, I contribute a multi-GPU multi-display parallel rendering approach for complex 3D environments. The approach has the capability to support high-performance, high-quality rendering of static and dynamic 3D environments. A novel parallel load balancing algorithm is developed based on a screen partitioning strategy to dynamically balance the number of vertices and triangles rendered by each GPU. The overhead of inter-GPU communication is minimized by transferring only a small amount of image pixels rather than chunks of 3D primitives with a novel frame exchanging algorithm. State-of-the-art parallel mesh simplification and GPU out-of-core techniques are integrated into the multi-GPU multi-display system to accelerate the rendering process.
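    The abstract does not give the partitioning algorithm itself, so the sketch below only illustrates the general idea of screen-partition load balancing under frame coherence: using the per-column primitive counts observed in the previous frame, slice boundaries are chosen so that each GPU receives roughly the same number of primitives in the next frame. All names are illustrative; this is not the dissertation's algorithm.

```python
def balance_partitions(primitives_per_column, num_gpus):
    """Return slice boundaries (column indices) giving each GPU roughly equal work,
    based on the per-column primitive counts of the previous frame."""
    total = sum(primitives_per_column)
    target = total / num_gpus
    boundaries, acc = [], 0.0
    for col, count in enumerate(primitives_per_column):
        acc += count
        if acc >= target * (len(boundaries) + 1) and len(boundaries) < num_gpus - 1:
            boundaries.append(col + 1)   # the next GPU's slice starts at this column
    return boundaries

# Example: a screen 8 columns wide, heavily loaded on the left, split across 2 GPUs.
print(balance_partitions([90, 80, 60, 20, 10, 10, 10, 10], 2))  # -> [2]
```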

    Implementation of a sort-last volume rendering using 3D textures

    This thesis extends the Aura graphics library (GPL-licensed, developed at the Vrije Universiteit Amsterdam) by adding the components needed to perform distributed volume rendering following the sort-last paradigm and using 3D textures. The program was tested on a cluster of 9 PCs.

    A Steering Environment for Online Parallel Visualization of Legacy Parallel Simulations

    In the context of scientific computing, computational steering is the coupling of numerical simulations with 3D visualization systems through the network. This allows scientists to monitor the intermediate results of their computations online, in a more interactive way than batch mode, and to modify the simulation parameters on the fly. While most existing computational steering environments support parallel simulations, they are often limited to sequential visualization systems. This may lead to a significant bottleneck and increased rendering time. To achieve the performance required for online visualization, we have designed the EPSN framework, a computational steering environment that makes it possible to interconnect legacy parallel simulations with parallel visualization systems. For this, we have introduced a redistribution algorithm for unstructured data that is well adapted to the context of M × N computational steering. We then focus on the design of our parallel viewer and present experimental results obtained with a particle-based simulation in astrophysics.
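    The abstract does not describe the redistribution algorithm itself, so the following is only a minimal sketch of the M × N redistribution problem it refers to: M simulation processes each own a contiguous range of unstructured-data elements, and N visualization processes each need a contiguous range of the global data. The function below computes, for one simulation rank, which slices of its local elements go to which visualization rank; all names are hypothetical.

```python
def redistribution_plan(m_counts, n_ranks, sim_rank):
    """Return {viz_rank: (local_start, local_end)} for one simulation rank,
    assuming a simple contiguous block distribution on both sides."""
    total = sum(m_counts)
    global_start = sum(m_counts[:sim_rank])          # first global element owned locally
    global_end = global_start + m_counts[sim_rank]   # one past the last one
    per_viz = total // n_ranks
    plan = {}
    for viz in range(n_ranks):
        v_start = viz * per_viz
        v_end = total if viz == n_ranks - 1 else (viz + 1) * per_viz
        lo, hi = max(global_start, v_start), min(global_end, v_end)
        if lo < hi:
            plan[viz] = (lo - global_start, hi - global_start)
    return plan

# Example: 3 simulation ranks own 10, 14 and 6 elements; 2 viewer ranks.
print(redistribution_plan([10, 14, 6], 2, sim_rank=1))  # -> {0: (0, 5), 1: (5, 14)}
```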

    Exploiting frame coherence in real-time rendering for energy-efficient GPUs

    The compute capabilities of mobile GPUs have evolved greatly over the last generations, allowing real-time rendering of realistic scenes. However, the desire to process complex environments clashes with the battery-operated nature of smartphones, whose users expect long operating times per charge and a temperature low enough to hold them comfortably. Consequently, improving the energy efficiency of mobile GPUs is paramount to fulfilling both performance and low-power goals. The work performed by the processors within the GPU and their accesses to off-chip memory are the main sources of energy consumption in graphics workloads. Yet much of this energy is spent on redundant computations, as the frame rate required to produce animations results in a sequence of extremely similar images. The goal of this thesis is to improve the energy efficiency of mobile GPUs by designing micro-architectural mechanisms that leverage frame coherence in order to reduce the redundant computations and memory accesses inherent in graphics applications. First, we focus on reducing redundant color computations. Mobile GPUs typically employ an architecture called Tile-Based Rendering, in which the screen is divided into tiles that are independently rendered in on-chip buffers. It is common that more than 80% of the tiles produce exactly the same output between consecutive frames. We propose Rendering Elimination (RE), a mechanism that accurately detects such occurrences by computing and storing signatures of the inputs of all the tiles in a frame. If the signatures of a tile across consecutive frames are the same, the colors computed in the preceding frame are reused, saving all computations and memory accesses associated with rendering the tile. We show that RE vastly outperforms related schemes found in the literature, achieving a reduction in energy consumption of 37% and in execution time of 33% with minimal overheads. Next, we focus on reducing redundant computations of fragments that will eventually not be visible. In real-time rendering, objects are processed in the order they are submitted to the GPU, which often causes the results of previously computed objects to be overwritten by new objects that turn out to occlude them. Consequently, whether or not a particular object will be occluded is not known until the entire scene has been processed. Based on the fact that visibility tends to remain constant across consecutive frames, we propose Early Visibility Resolution (EVR), a mechanism that predicts visibility based on information obtained in the preceding frame. EVR first computes and stores the depth of the farthest visible point after rendering each tile. Whenever a tile is rendered in the following frame, primitives that are farther from the observer than the stored depth are predicted to be occluded, and are processed after the ones predicted to be visible. Additionally, this visibility prediction scheme is used to improve Rendering Elimination’s detection of unchanged tiles by not adding primitives predicted to be occluded to the signature. With minor hardware costs, EVR is shown to provide a reduction in energy consumption of 43% and in execution time of 39%. Finally, we focus on reducing computations in tiles with low spatial frequencies. GPUs produce pixel colors by sampling triangles once per pixel and performing computations on each sampling location.
    However, most screen regions do not include sufficient detail to require high sampling rates, leading to a significant amount of energy wasted computing the same color for neighboring pixels. Given that spatial frequencies are maintained across frames, we propose Dynamic Sampling Rate, a mechanism that analyzes the spatial frequencies of tiles and determines the best sampling rate for them, which is applied in the following frame. Results show that Dynamic Sampling Rate significantly reduces processor activity, yielding energy savings of 40% and execution time reductions of 35%.
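    The abstract only says that Rendering Elimination computes and stores signatures of each tile's inputs and reuses the previous frame's colors when they match. The toy sketch below expresses that check in software; the hashing scheme, data layout, and names are assumptions for illustration and do not reflect the hardware design.

```python
import hashlib

class RenderingEliminationSketch:
    """Toy illustration of the Rendering Elimination idea: hash the inputs of a
    screen tile and, if the hash matches the one stored from the previous frame,
    reuse the previously computed tile colors instead of re-rendering."""

    def __init__(self):
        self.signatures = {}   # tile_id -> signature from the previous frame
        self.framebuffer = {}  # tile_id -> last rendered colors for that tile

    def render_tile(self, tile_id, tile_inputs, render_fn):
        sig = hashlib.sha1(repr(tile_inputs).encode()).hexdigest()  # assumed signature
        if self.signatures.get(tile_id) == sig:
            return self.framebuffer[tile_id]        # unchanged tile: skip all shading work
        colors = render_fn(tile_inputs)             # changed tile: render normally
        self.signatures[tile_id] = sig
        self.framebuffer[tile_id] = colors
        return colors
```

    Since more than 80% of tiles are reported to be unchanged between consecutive frames, most calls would take the reuse path, which is where the reported energy and execution-time savings come from.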