
    Parallel Rendering on Hybrid Multi-GPU Clusters

    Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware increasingly embraces a NUMA architecture, where multiple processor sockets each have their own locally attached memory and where auxiliary devices such as GPUs and network interfaces are directly attached to one of the processors. Such so-called fat NUMA processing and graphics nodes are increasingly used to build cost-effective hybrid shared/distributed memory visualization clusters. In this paper we present a thorough analysis of the asynchronous parallelization of the rendering stages and we derive and implement important optimizations to achieve highly interactive framerates on such hybrid multi-GPU clusters. We use both a benchmark program and a real-world scientific application used to visualize, navigate and interact with simulations of cortical neuron circuit models.
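
The 16 ms figure in the abstract above follows from 1000 ms / 60 Hz ≈ 16.7 ms. The sketch below works through that arithmetic and contrasts a serial frame with an idealized, fully overlapped (asynchronous) pipeline. The stage names and timings are hypothetical values chosen for illustration, not measurements from the paper, and the pipelined model assumes perfect overlap, so throughput is bounded by the slowest stage while latency remains the sum.

```python
from typing import Dict


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / refresh_hz


def serial_frame_time_ms(stage_ms: Dict[str, float]) -> float:
    """Frame time when every stage runs one after another."""
    return sum(stage_ms.values())


def pipelined_frame_time_ms(stage_ms: Dict[str, float]) -> float:
    """Idealized frame interval when all stages overlap perfectly.

    Assumption: each stage works on a different frame at the same time,
    so a new frame completes every max(stage) milliseconds.
    """
    return max(stage_ms.values())


# Hypothetical per-stage costs in milliseconds (illustrative only).
STAGES_MS = {"draw": 9.0, "readback": 3.0, "transmit": 4.0, "composite": 3.0}

if __name__ == "__main__":
    budget = frame_budget_ms(60.0)
    serial = serial_frame_time_ms(STAGES_MS)
    pipelined = pipelined_frame_time_ms(STAGES_MS)
    print(f"budget at 60 Hz: {budget:.2f} ms")
    print(f"serial:    {serial:.1f} ms, fits: {serial <= budget}")
    print(f"pipelined: {pipelined:.1f} ms, fits: {pipelined <= budget}")
```

With these assumed numbers the serial frame misses the budget while the overlapped one fits, which is the kind of gain asynchronous execution of the rendering stages aims for.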

    A Process for Digitizing and Simulating Biologically Realistic Oligocellular Networks Demonstrated for the Neuro-Glio-Vascular Ensemble

    One will not understand the brain without an integrated exploration of structure and function, these attributes being two sides of the same coin: together they form the currency of biological computation. Accordingly, biologically realistic models require the re-creation of the architecture of the cellular components in which biochemical reactions are contained. We describe here a process of reconstructing a functional oligocellular assembly that is responsible for energy supply management in the brain and creating a computational model of the associated biochemical and biophysical processes. The reactions that underwrite thought are both constrained by and take advantage of brain morphologies pertaining to neurons, astrocytes and the blood vessels that deliver oxygen, glucose and other nutrients. Each component of this neuro-glio-vasculature ensemble (NGV) carries out delegated tasks, as the dynamics of this system provide for each cell type its own energy requirements while including mechanisms that allow cooperative energy transfers. Our process for recreating the ultrastructure of cellular components and modeling the reactions that describe energy flow uses an amalgam of state-of-the-art techniques, including digital reconstructions of electron micrographs, advanced data analysis tools, computational simulations and in silico visualization software. While we demonstrate this process with the NGV, it is equally well adapted to any cellular system for integrating multimodal cellular data in a coherent framework.

    Direct send compositing for parallel sort-last rendering

    In contrast to sort-first, sort-last parallel rendering has the distinct advantage that the task division for parallel geometry processing and rasterization is simple, and can easily be incorporated into most visualization systems. However, efficient final compositing of the partial rendering results, by depth-compositing for polygonal data or alpha-blending for volume data, is the key to achieving scalability in sort-last parallel rendering. In this paper, we demonstrate the efficiency as well as flexibility of the direct send sort-last compositing algorithm, and compare it to existing approaches, both in a theoretical analysis and in an experimental setting.
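
As a rough illustration of the depth-compositing step named in the abstract above, the following single-process sketch simulates direct send as the scheme is usually described: the frame is split into one tile per node, every node "sends" the matching slice of its partial image to the tile's owner, and the owner keeps the closest fragment per pixel. The data layout (a flat list of `(depth, color_id)` pixels) and all function names are assumptions made for this example, not the paper's implementation, and no real network transfer takes place.

```python
from typing import List, Tuple

Pixel = Tuple[float, int]   # (depth, color id); smaller depth is closer
Image = List[Pixel]         # flattened framebuffer of one node


def depth_composite(a: Image, b: Image) -> Image:
    """Per-pixel depth test: keep whichever fragment is closer."""
    return [pa if pa[0] <= pb[0] else pb for pa, pb in zip(a, b)]


def tile_ranges(num_pixels: int, num_nodes: int) -> List[Tuple[int, int]]:
    """Split the frame into one contiguous half-open pixel range per node."""
    base, extra = divmod(num_pixels, num_nodes)
    ranges, start = [], 0
    for i in range(num_nodes):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def direct_send(partials: List[Image]) -> Image:
    """Simulate direct send: each node composites only the tile it owns."""
    ranges = tile_ranges(len(partials[0]), len(partials))
    final: Image = []
    for owner, (start, end) in enumerate(ranges):
        tile = partials[owner][start:end]
        for sender, image in enumerate(partials):
            if sender != owner:
                # Stands in for receiving this slice over the network.
                tile = depth_composite(tile, image[start:end])
        final.extend(tile)  # stands in for gathering the finished tiles
    return final
```

In this model each node receives one slice from every other node, so the per-node compositing work covers roughly one N-th of the frame regardless of the node count.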

    Fast compositing for cluster-parallel rendering

    The image compositing stages in cluster-parallel rendering for gathering and combining partial rendering results into a final display frame are fundamentally limited by node-to-node image throughput. Therefore, efficient image coding, compression and transmission must be considered to minimize that bottleneck. This paper studies the different performance-limiting factors such as image representation, region-of-interest detection and fast image compression. Additionally, we show improved compositing performance using lossy YUV subsampling and we propose a novel fast region-of-interest detection algorithm that can improve sort-last parallel rendering in particular.
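
To make the two data-reduction ideas in the abstract above concrete, here is a minimal sketch under stated assumptions. `find_roi` is a naive bounding-box scan for non-background pixels, which is not the paper's novel algorithm, and `subsample_chroma` is a generic 4:2:0-style reduction that keeps luma at full resolution and averages chroma over 2x2 blocks. The RGB-to-YUV conversion uses the standard BT.601 analog coefficients; all function names are hypothetical.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
YUV = Tuple[float, float, float]


def find_roi(image: List[List[RGB]], background: RGB) -> Optional[Tuple[int, int, int, int]]:
    """Half-open bounding box (x0, y0, x1, y1) of non-background pixels, or None."""
    rows = [y for y, row in enumerate(image) if any(p != background for p in row)]
    if not rows:
        return None  # nothing was rendered, so nothing needs to be sent
    cols = [x for x in range(len(image[0]))
            if any(row[x] != background for row in image)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def rgb_to_yuv(r: float, g: float, b: float) -> YUV:
    """BT.601 analog YUV: luma plus two scaled color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return (y, 0.492 * (b - y), 0.877 * (r - y))


def subsample_chroma(yuv: List[List[YUV]]):
    """Keep full-resolution Y; average U and V over each 2x2 block.

    Assumes even width and height. Returns (luma, chroma), where chroma
    has half the resolution in both directions.
    """
    height, width = len(yuv), len(yuv[0])
    luma = [[p[0] for p in row] for row in yuv]
    chroma = []
    for y in range(0, height, 2):
        chroma_row = []
        for x in range(0, width, 2):
            block = [yuv[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)]
            chroma_row.append((sum(p[1] for p in block) / 4.0,
                               sum(p[2] for p in block) / 4.0))
        chroma.append(chroma_row)
    return luma, chroma
```

For a W x H region this stores W*H luma values plus W*H/2 chroma values instead of 3*W*H, halving the data before any further compression, at the cost of lossy color resolution.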

    Equalizer: A scalable parallel rendering framework

    Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems, from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.

    Practical parallel rendering of detailed neuron simulations

    Parallel rendering of large polygonal models with transparency is challenging due to the need for alpha-correct blending and compositing, which is costly for very large models with high depth complexity and spatial overlap. In this paper we compare the performance of raster-based rendering methods on mesh models of neurons using two applications, one of which is specifically tailored to the neuroscience application domain, the other a general-purpose visualization tool with domain-specific additions. The first implements both sort-first and sort-last and uses a scene-graph-style traversal to cull objects, and dual depth peeling for order-independent transparency, whilst the other uses a simpler brute-force data-parallel approach with sort-last composition. The advantages and trade-offs of these approaches are discussed. We present the optimized algorithms needed to achieve interactive frame rates for a non-trivial, real-world parallel rendering scenario. We show that a generic data visualization application can provide competitive performance when optimizing its rendering pipeline, with some loss of capability over an optimized domain-specific application.
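
The need for alpha-correct blending mentioned in the abstract above comes from the fact that the Porter-Duff "over" operator is not commutative: blending the same transparent fragments in a different depth order gives a different color. The sketch below shows this with premultiplied RGBA values and sorts fragments by depth before blending. It is a CPU illustration of why ordering matters, not an implementation of dual depth peeling, and the fragment representation is an assumption made for the example.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]   # premultiplied color and alpha
Fragment = Tuple[float, RGBA]              # (depth, color); smaller depth is closer


def over(front: RGBA, back: RGBA) -> RGBA:
    """Porter-Duff 'over' for premultiplied colors: front + (1 - alpha_front) * back."""
    front_alpha = front[3]
    return tuple(f + (1.0 - front_alpha) * b for f, b in zip(front, back))


def blend_sorted(fragments: List[Fragment]) -> RGBA:
    """Composite the fragments of one pixel front to back after sorting by depth."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)
    for _, color in sorted(fragments, key=lambda frag: frag[0]):
        result = over(result, color)   # accumulated nearer layers stay in front
    return result


# Two half-transparent fragments: red and blue, premultiplied by alpha 0.5.
RED = (0.5, 0.0, 0.0, 0.5)
BLUE = (0.0, 0.0, 0.5, 0.5)
```

For example, `blend_sorted([(1.0, RED), (2.0, BLUE)])` yields `(0.5, 0.0, 0.25, 0.75)`, while swapping the two depths yields `(0.25, 0.0, 0.5, 0.75)`. Because the sort happens inside the function, the order in which fragments arrive does not affect the result.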

    Distributed Post-processing and Rendering for Large-Scale Scientific Simulations

    With the ever-increasing capacity of high-performance computing (HPC) systems, computational simulation models become finer and more accurate. The size and complexity of the data produced, however, pose tremendous challenges for visualization and analysis tasks. Explorative approaches, in particular, require the development of interactive human-computer interfaces using distributed and parallel post-processing architectures. Such infrastructures can also be used for interaction with running simulations, with applications ranging from online monitoring to computational steering. Additionally, remote and parallel rendering can be integrated into the overall setup. This chapter gives an overview of current solutions and ongoing research activities in this domain.