45 research outputs found

    Multiple dataset visualization (MDV) framework for scalar volume data

    Get PDF
    Many applications require comparative analysis of multiple datasets representing different samples, conditions, time instants, or views in order to develop a better understanding of the scientific problem/system under consideration. One effective approach for such analysis is visualization of the data. In this PhD thesis, we propose an innovative multiple dataset visualization (MDV) approach in which two or more datasets of a given type are rendered concurrently in the same visualization. MDV is an important concept for the cases where it is not possible to make an inference based on one dataset, and comparisons between many datasets are required to reveal cross-correlations among them. The proposed MDV framework, which deals with some fundamental issues that arise when several datasets are visualized together, follows a multithreaded architecture consisting of three core components, data preparation/loading, visualization and rendering. The visualization module - the major focus of this study, currently deals with isosurface extraction and texture-based rendering techniques. For isosurface extraction, our all-in-memory approach keeps datasets under consideration and the corresponding geometric data in the memory. Alternatively, the only-polygons- or points-in-memory only keeps the geometric data in memory. To address the issues related to storage and computation, we develop adaptive data coherency and multiresolution schemes. The inter-dataset coherency scheme exploits the similarities among datasets to approximate the portions of isosurfaces of datasets using the isosurface of one or more reference datasets whereas the intra/inter-dataset multiresolution scheme processes the selected portions of each data volume at varying levels of resolution. The graphics hardware-accelerated approaches adopted for MDV include volume clipping, isosurface extraction and volume rendering, which use 3D textures and advanced per fragment operations. With appropriate user-defined threshold criteria, we find that various MDV techniques maintain a linear time-N relationship, improve the geometry generation and rendering time, and increase the maximum N that can be handled (N: number of datasets). Finally, we justify the effectiveness and usefulness of the proposed MDV by visualizing 3D scalar data (representing electron density distributions in magnesium oxide and magnesium silicate) from parallel quantum mechanical simulation

    Doctor of Philosophy

    Get PDF
    dissertationDataflow pipeline models are widely used in visualization systems. Despite recent advancements in parallel architecture, most systems still support only a single CPU or a small collection of CPUs such as a SMP workstation. Even for systems that are specifically tuned towards parallel visualization, their execution models only provide support for data-parallelism while ignoring taskparallelism and pipeline-parallelism. With the recent popularization of machines equipped with multicore CPUs and multi-GPU units, these visualization systems are undoubtedly falling further behind in reaching maximum efficiency. On the other hand, there exist several libraries that can schedule program executions on multiple CPUs and/or multiple GPUs. However, due to differences in executing a task graph and a pipeline along with their APIs being considerably low-level, it still remains a challenge to integrate these run-time libraries into current visualization systems. Thus, there is a need for a redesigned dataflow architecture to fully support and exploit the power of highly parallel machines in large-scale visualization. The new design must be able to schedule executions on heterogeneous platforms while at the same time supporting arbitrarily large datasets through the use of streaming data structures. The primary goal of this dissertation work is to develop a parallel dataflow architecture for streaming large-scale visualizations. The framework includes supports for platforms ranging from multicore processors to clusters consisting of thousands CPUs and GPUs. We achieve this in our system by introducing the notion of Virtual Processing Elements and Task-Oriented Modules along with a highly customizable scheduler that controls the assignment of tasks to elements dynamically. This creates an intuitive way to maintain multiple CPU/GPU kernels yet still provide coherency and synchronization across module executions. We have implemented these techniques into HyperFlow which is made of an API with all basic dataflow constructs described in the dissertation, and a distributed run-time library that can be used to deploy those pipelines on multicore, multi-GPU and cluster-based platforms

    VtkSMP: Task-based Parallel Operators for VTK Filters

    Get PDF
    International audienceNUMA nodes are potentially powerful but taking benefit of their capabilities is challenging due to their architec- ture (multiple computing cores, advanced memory hierarchy). They are nonetheless one of the key components to enable processing the ever growing amount of data produced by scientific simulations. In this paper we study the parallelization of patterns commonly used in VTK algorithms and propose a new multi- threaded plugin for VTK that eases the development of parallel multi-core VTK filters. We specifically focus on task-based approaches and show that with a limited code refactoring effort we can take advantage of NUMA node capabilities. We experiment our patterns on a transform filter, base isosurface extraction filter and a min/max tree accelerated isosurface extraction. We support 3 programming environments, OpenMP, Intel TBB and X-KAAPI, and propose different algorithmic refinements according to the capabilities of the target environment. Results show that we can speed execution up to 30 times on a 48-core machine

    Doctor of Philosophy

    Get PDF
    dissertationRay tracing presents an efficient rendering algorithm for scientific visualization using common visualization tools and scales with increasingly large geometry counts while allowing for accurate physically-based visualization and analysis, which enables enhanced rendering and new visualization techniques. Interactivity is of great importance for data exploration and analysis in order to gain insight into large-scale data. Increasingly large data sizes are pushing the limits of brute-force rasterization algorithms present in the most widely-used visualization software. Interactive ray tracing presents an alternative rendering solution which scales well on multicore shared memory machines and multinode distributed systems while scaling with increasing geometry counts through logarithmic acceleration structure traversals. Ray tracing within existing tools also provides enhanced rendering options over current implementations, giving users additional insight from better depth cues while also enabling publication-quality rendering and new models of visualization such as replicating photographic visualization techniques

    A Fast Fluid Simulator Using Smoothed-Particle Hydrodynamics

    Get PDF
    abstract: This document presents a new implementation of the Smoothed Particles Hydrodynamics algorithm using DirectX 11 and DirectCompute. The main goal of this document is to present to the reader an alternative solution to the largely studied and researched problem of fluid simulation. Most other solutions have been implemented using the NVIDIA CUDA framework; however, the proposed solution in this document uses the Microsoft general-purpose computing on graphics processing units API. The implementation allows for the simulation of a large number of particles in a real-time scenario. The solution presented here uses the Smoothed Particles Hydrodynamics algorithm to calculate the forces within the fluid; this algorithm provides a Lagrangian approach for discretizes the Navier-Stockes equations into a set of particles. Our solution uses the DirectCompute compute shaders to evaluate each particle using the multithreading and multi-core capabilities of the GPU increasing the overall performance. The solution then describes a method for extracting the fluid surface using the Marching Cubes method and the programmable interfaces exposed by the DirectX pipeline. Particularly, this document presents a method for using the Geometry Shader Stage to generate the triangle mesh as defined by the Marching Cubes method. The implementation results show the ability to simulate over 64K particles at a rate of 900 and 400 frames per second, not including the surface reconstruction steps and including the Marching Cubes steps respectively.Dissertation/ThesisM.S. Computer Science 201

    New Algorithmic Techniques for Large Scale Volumetric Data Visualization on Parallel Architectures

    Get PDF
    Volume visualization is widely used as an effective approach for the visual exploration, computational analysis, and manipulation of volumetric datasets. Due to the dramatic advances in imaging instruments and computing technologies, such datasets are now appearing at a very fast rate with increasingly larger sizes in many engineering, science and medical applications. Isosurface and direct volume rendering(DVR) are two of the most widely used techniques to render such datasets. This dissertation introduces novel techniques for rendering isosurfaces and volumes, and extends these techniques to multiprocessor architectures. We first focus on cluster-based techniques for isosurface extraction and rendering using polygonal approximation. We present a new simple indexing scheme and data layout approach, which enable scalable and efficient isosurface generation. This algorithm is the first known parallel algorithm to achieve provable load balancing on multiprocessor systems. We also develop an algorithm to generate isosurfaces using ray-casting on multi-core processors. Our method is based on a hybrid strategy that begins with an object order traversal of the data followed by ray-casting on ordered sets of an adaptive number of subcubes, one set for each small group of pixels on the image. We develop a multithreaded implementation, which uses new dynamic load balancing techniques that start with an image partitioning for the initial stage and then perform dynamic allocation of groups of ray-casting tasks among the different threads. The strategy ensures almost equal loads among the cores while maintaining spatial data locality. This scheme is extended to perform direct volume rendering and is shown to achieve similar improvements in terms of overall performance, load balancing, and scalability. We conduct a large number of tests for all our algorithms on the University of Maryland Visualization Cluster and on the 8-core Clovertown platform using a wide variety of datasets such as Richtmyer-Meshkov Instability dataset (7.5GB for each time step) and Visible Human dataset (~1GB). We obtain results that consistently validate the efficiency and the scalability of our algorithms. In particular, the overall performance of our hybrid ray-casting scheme achieves an interactive rendering rate on high resolution (1024x1024) screens for all the datasets tested

    Locality Enhancement and Dynamic Optimizations on Multi-Core and GPU

    Get PDF
    Enhancing the match between software executions and hardware features is key to computing efficiency. The match is a continuously evolving and challenging problem. This dissertation focuses on the development of programming system support for exploiting two key features of modern hardware development: the massive parallelism of emerging computational accelerators such as Graphic Processing Units (GPU), and the non-uniformity of cache sharing in modern multicore processors. They are respectively driven by the important role of accelerators in today\u27s general-purpose computing and the ultimate importance of memory performance. This dissertation particularly concentrates on optimizing control flows and memory references, at both compilation and execution time, to tap into the full potential of pure software solutions in taking advantage of the two key hardware features.;Conditional branches cause divergences in program control flows, which may result in serious performance degradation on massively data-parallel GPU architectures with Single Instruction Multiple Data (SIMD) parallelism. On such an architecture, control divergence may force computing units to stay idle for a substantial time, throttling system throughput by orders of magnitude. This dissertation provides an extensive exploration of the solution to this problem and presents program level transformations based upon two fundamental techniques --- thread relocation and data relocation. These two optimizations provide fundamental support for swapping jobs among threads so that the control flow paths of threads converge within every SIMD thread group.;In memory performance, this dissertation concentrates on two aspects: the influence of nonuniform sharing on multithreading applications, and the optimization of irregular memory references on GPUs. In shared cache multicore chips, interactions among threads are complicated due to the interplay of cache contention and synergistic prefetching. This dissertation presents the first systematic study on the influence of non-uniform shared cache on contemporary parallel programs, reveals the mismatch between the software development and underlying cache sharing hierarchies, and further demonstrates it by proposing and applying cache-sharing-aware data transformations that bring significant performance improvement. For the second aspect, the efficiency of GPU accelerators is sensitive to irregular memory references, which refer to the memory references whose access patterns remain unknown until execution time (e.g., A[P[i]]). The root causes of the irregular memory reference problem are similar to that of the control flow problem, while in a more general and complex form. I developed a framework, named G-Streamline, as a unified software solution to dynamic irregularities in GPU computing. It treats both types of irregularities at the same time in a holistic fashion, maximizing the whole-program performance by resolving conflicts among optimizations

    GPU acceleration for evolutionary topology optimization of continuum structures using isosurfaces

    Get PDF
    Evolutionary topology optimization of three-dimensional continuum structures is a computationally demanding task in terms of memory consumption and processing time. This work aims to alleviate these constraints proposing a well-suited strategy for Graphics Processing Unit (GPU) computing. Such a proposal adopts a fine-grained GPU instance of matrix-free iterative solver for structural analysis and an efficient GPU implementation for isosurface extraction and volume fraction calculation. The performance of the solving stage is evaluated using two preconditioning techniques, including the comparison with the sparse-matrix CPU implementation. The proposal is evaluated using topology optimization problems for real-world applications.We gratefully acknowledge the support of NVIDIA Corporation with the donation of some of the GPUs used for this research. Such a work has also been supported by the research support programmes of Ministry of Economy and Competitiveness under the contract DPI2016-77538-R and \Fundación Séneca Agencia de Ciencia y Tecnología de la Región de Murcia" under the contract 19274/PI/14
    corecore