
    Parallel rendering

    Journal Article. Massively parallel computers have emerged as valuable tools for performing scientific and engineering computations, far outstripping the capabilities of independent workstations in both sheer floating point performance and memory capacity. As the resolution of simulation models increases, graphics algorithms that take advantage of the large memory and parallelism of these architectures are becoming increasingly important. This issue of IEEE Parallel & Distributed Technology highlights some recent work in parallel computer graphics, specifically parallel rendering.

    Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU

    The vision of a supercomputer on every desk can be realized with powerful, highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications only, are now used for highly compute-intensive general-purpose applications. GFLOP and TFLOP performance, once very expensive, has become cheap with GPGPUs. The current work focuses mainly on a highly parallel implementation of matrix exponentiation. Matrix exponentiation is widely used in many areas of the scientific community, ranging from highly critical flight and CAD simulations to financial and statistical applications. The proposed solution for matrix exponentiation uses OpenCL to exploit the hyper-parallelism offered by many-core GPGPUs. It employs many general GPU optimizations as well as architecture-specific optimizations. The experiments cover optimizations targeted specifically at scientific graphics cards (Tesla C2050). The heterogeneous, highly parallel matrix exponentiation method has been tested on matrices of different sizes and with different powers. The devised kernel has shown a 1000X speedup, and a 44-fold speedup compared with the naive GPU kernel. Comment: 15 pages, 12 figures, International Journal of Distributed and Parallel Systems (IJDPS), ISSN: 0976-9757 [Online]; 2229-3957 [Print]
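
The abstract above does not say which exponentiation scheme the kernel uses. As a point of reference only, the following is a minimal sequential sketch of raising a square matrix to an integer power by repeated squaring, a standard technique assumed here for illustration. The names `mat_mul` and `mat_pow` are hypothetical, and nothing below reproduces the paper's OpenCL kernel or its optimizations.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, p):
    """Raise square matrix m to the non-negative integer power p.

    Uses repeated squaring, so only O(log p) matrix products are needed.
    This is an illustrative assumption, not the method from the paper.
    """
    if p < 0:
        raise ValueError("power must be non-negative")
    n = len(m)
    # Start from the identity matrix.
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while p > 0:
        if p & 1:                      # current binary digit of p is 1
            result = mat_mul(result, base)
        base = mat_mul(base, base)     # square the base for the next digit
        p >>= 1
    return result
```

In a sketch like this, the inner matrix product is the part that a GPU implementation would typically parallelize, since each output entry can be computed independently.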

    A CLIPS/X-window interface

    The design and implementation of an interface between the C Language Integrated Production System (CLIPS) expert system development environment and the graphic user interface development tools of the X-Window system are described. The underlying basis of the CLIPS/X-Window interface is a client-server model in which multiple clients can attach to a single server that interprets and executes client action requests and returns the results of the operations. Implemented in an AIX (UNIX) operating system environment, the interface has been successfully applied in the development of graphics interfaces for production rule cooperating agents in a knowledge-based computer aided design (CAD) system. Initial findings suggest that the client-server model is particularly well suited to a distributed parallel processing operational mode in a networked workstation environment.

    Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems

    We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algorithm targets not only shared-memory multicore computers, but also computer clusters and specialized hardware attached to a host computer, such as graphics processing units or many-core coprocessors. Scalability on a large number of cores is ensured by the absence of synchronization, locks, and false sharing during the main parallel step. Comment: 15 pages, 5 figures
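
For readers unfamiliar with the distributed format mentioned above, here is a minimal sequential sketch in which a polynomial is stored as a mapping from exponent tuples to coefficients. The dictionary layout and the name `poly_mul` are assumptions made for illustration; the paper's parallel, lock-free scheme is not reproduced here.

```python
from collections import defaultdict


def poly_mul(p, q):
    """Multiply two sparse multivariate polynomials.

    Each polynomial is a dict mapping an exponent tuple, e.g. (2, 0) for x^2,
    to its coefficient. Both inputs must use the same number of variables.
    """
    acc = defaultdict(int)
    for exp_a, coef_a in p.items():
        for exp_b, coef_b in q.items():
            # Multiplying monomials adds their exponents variable by variable.
            exp = tuple(x + y for x, y in zip(exp_a, exp_b))
            acc[exp] += coef_a * coef_b
    # Drop terms that cancelled to zero to keep the result sparse.
    return {exp: coef for exp, coef in acc.items() if coef != 0}
```

For example, with variables (x, y), `poly_mul({(1, 0): 1, (0, 1): 1}, {(1, 0): 1, (0, 1): -1})` multiplies x + y by x - y; the two cross terms cancel and are removed from the result.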

    High speed design and analysis of cable-membrane structures on graphics cards

    This paper discusses a new parallelization approach for the dynamic relaxation method, programmed with the NVIDIA CUDA API and executed on the graphics card (GPU) of a computer. The main advantages of a GPU card are that it has a very large number of computing cores and its own memory, separate from that of the computer, and that it can reside inside a normal desktop computer. However, due to architectural simplifications of GPU systems, synchronization of cores is rather limited. This has a major effect on the parallelization, since accumulating the contributions of calculated values at the boundary nodes would require some form of synchronization. This limitation led to the new parallelization approach, in which the nodes of the finite element mesh are distributed among the cores of the GPU and the elements are “duplicated”. The paper discusses the implementation details of this new parallel approach and presents some performance measurements of the new parallel dynamic relaxation method on GPU systems.
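
One possible reading of the node-based distribution described above is a gather pattern: each node recomputes the forces of every element attached to it, so no two workers ever write to the same memory location. The sketch below illustrates that reading on a hypothetical one-dimensional chain of linear spring elements. The function name, the spring model, and the parameters are all assumptions, and the code is plain sequential Python rather than CUDA.

```python
def node_forces(positions, elements, stiffness, rest_length):
    """Compute the net force on each node of a 1D chain of spring elements.

    positions: list of node coordinates (floats)
    elements: list of (i, j) node index pairs, one per spring
    Every node gathers from its own incident elements, so an element shared
    by two nodes is evaluated twice (the "duplicated" work), and each loop
    iteration writes only to forces[n].
    """
    # Build node -> incident element lists once, up front.
    incident = [[] for _ in positions]
    for e, (i, j) in enumerate(elements):
        incident[i].append(e)
        incident[j].append(e)

    forces = [0.0] * len(positions)
    for n in range(len(positions)):      # iterations are independent
        total = 0.0
        for e in incident[n]:
            i, j = elements[e]
            other = j if i == n else i
            stretch = abs(positions[other] - positions[n]) - rest_length
            direction = 1.0 if positions[other] > positions[n] else -1.0
            total += stiffness * stretch * direction
        forces[n] = total
    return forces
```

Because the outer loop has no shared writes, each iteration could in principle be mapped to one GPU thread; the cost is that every element force is computed once per attached node.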

    A Study of Speed of the Boundary Element Method as applied to the Realtime Computational Simulation of Biological Organs

    In this work, the possibility of simulating biological organs in realtime using the Boundary Element Method (BEM) is investigated. Biological organs are assumed to follow linear elastostatic material behavior, and the constant boundary element is the element type used. First, a Graphics Processing Unit (GPU) is used to speed up the BEM computations to achieve realtime performance. Next, instead of the GPU, a computer cluster is used. Results indicate that BEM is fast enough to provide realtime graphics if biological organs are assumed to follow linear elastostatic material behavior. Although the present work does not conduct any simulation using nonlinear material models, results from the linear elastostatic material model imply that it would be difficult to obtain realtime performance if highly nonlinear material models that properly characterize biological organs are used. Although the use of BEM for the simulation of biological organs is not new, the results presented in the present study are not found elsewhere in the literature. Comment: preprint, draft, 2 tables, 47 references, 7 files. Codes that can solve three-dimensional linear elastostatic problems using constant boundary elements (of triangular shape) while ignoring body forces are provided as supplementary files; codes are distributed under the MIT License in three versions: i) MATLAB version, ii) Fortran 90 version (sequential code), iii) Fortran 90 version (parallel code)

    Massively Parallel Algorithm for Solving the Eikonal Equation on Multiple Accelerator Platforms

    The research presented in this thesis investigates parallel implementations of the Fast Sweeping Method (FSM) for Graphics Processing Unit (GPU)-based computational platforms and proposes a new parallel algorithm for distributed computing platforms with accelerators. Hardware accelerators such as GPUs and co-processors have emerged as general-purpose processors in today’s high performance computing (HPC) platforms, thereby increasing platforms’ performance capabilities. This trend has allowed greater parallelism and substantial acceleration of scientific simulation software. In order to leverage the power of new HPC platforms, scientific applications must be written in specific lower-level programming languages, which used to be platform specific. Newer programming models such as OpenACC simplify implementation and ensure portability, so that applications can run across GPUs from different vendors and on multi-core processors. The distance field is a representation of a surface geometry or shape required by many algorithms within the areas of computer graphics, visualization, computational fluid dynamics and more. It can be calculated by solving the eikonal equation using the FSM. The parallel FSMs explored in this thesis have not been implemented on GPU platforms and do not scale to a large problem size. This thesis addresses this problem by designing a parallel algorithm that utilizes a domain decomposition strategy for multi-accelerated distributed platforms. The proposed algorithm first applies coarse-grain parallelism, using MPI to distribute subdomains across multiple nodes, and then fine-grain parallelism to optimize performance by utilizing accelerators. The parallel implementations of FSM for GPU-based platforms showed speedups greater than 20× compared to the serial version for some problems, and the newly developed parallel algorithm removes the limitation that prevents current algorithms from solving large-memory problems, with comparable runtime efficiency.
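
To make the serial baseline concrete, the following is a minimal sketch of a fast sweeping solver for the eikonal equation with unit speed on a small 2D grid, which yields an approximate distance field from a set of source cells. It follows the commonly described Gauss-Seidel update with four alternating sweep orderings. The function name, grid layout, and parameters are assumptions for illustration, and none of the thesis's GPU, OpenACC, or MPI parallelization is shown.

```python
import math


def fast_sweep(sources, nx, ny, h=1.0, passes=4):
    """Approximate the distance to the nearest source cell on an nx-by-ny grid.

    sources: iterable of (i, j) grid indices where the distance is zero
    h: grid spacing; passes: number of times all four sweeps are repeated
    """
    inf = float("inf")
    fixed = set(sources)
    u = [[inf] * ny for _ in range(nx)]
    for i, j in fixed:
        u[i][j] = 0.0

    # Four sweep orderings, one per diagonal direction of information flow.
    orderings = [
        (range(nx), range(ny)),
        (range(nx - 1, -1, -1), range(ny)),
        (range(nx - 1, -1, -1), range(ny - 1, -1, -1)),
        (range(nx), range(ny - 1, -1, -1)),
    ]
    for _ in range(passes):
        for xs, ys in orderings:
            for i in xs:
                for j in ys:
                    if (i, j) in fixed:
                        continue
                    # Smallest neighbor value along each axis.
                    a = min(u[i - 1][j] if i > 0 else inf,
                            u[i + 1][j] if i < nx - 1 else inf)
                    b = min(u[i][j - 1] if j > 0 else inf,
                            u[i][j + 1] if j < ny - 1 else inf)
                    if a == inf and b == inf:
                        continue  # no information has reached this cell yet
                    if abs(a - b) >= h:
                        cand = min(a, b) + h          # one-sided update
                    else:
                        cand = 0.5 * (a + b + math.sqrt(2 * h * h - (a - b) ** 2))
                    u[i][j] = min(u[i][j], cand)
    return u
```

The dependence of each cell on already-updated neighbors within a sweep is what makes the method hard to parallelize directly, which is the difficulty the abstract refers to.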

    Parallel graphics and visualization

    Computer graphics and visualization are very active fields of Computer Science, continuously producing new and exciting results. However, the demand for increasingly faster feedback, together with the huge volume of data usually associated with these applications, results in growing computational requirements. Efficient use of multiple computational and visualization resources expedites data processing for image generation, thus enabling such requirements to be met. This special issue of Parallel Computing presents a selection of six of the 21 papers published at the 2006 Eurographics Symposium on Parallel Graphics and Visualization, which was held in May 2006 in Braga, Portugal. The Eurographics Symposium on Parallel Graphics and Visualization focuses on theoretical and applied research issues critical to parallel and distributed computing and its application to all aspects of computer graphics, virtual reality, and scientific and engineering visualization. Parallel graphics and visualization has evolved dramatically in the last few years. While previous works focused on SIMD architectures and standard PC clusters, more recent research has moved to large displays and visualization-oriented cluster architectures, which include graphics processing units at each node. This trend can be observed in the papers selected for this special issue: two papers present results on realistic rendering on PC clusters, two papers focus on parallel volume rendering using graphics processing units, and two papers address large displays and visualization clusters. The paper by Chalmers et al. combines parallel processing on a cluster with visual perception to achieve high-fidelity, physically based selective rendering at close to interactive rates. Thomaszewski et al. also use a PC cluster to perform physically based simulations of cloth, modelling both the material properties and the interaction with the surrounding scene. Bernardon et al. exploit CPU and GPU parallelism to render volumes of unstructured grids with time-varying data. Another volume rendering technique is presented by Müller et al., using a sort-last approach to perform volume ray casting on the fragment shaders of a GPU cluster. Cotting et al. present a software genlock approach for Windows, compatible with off-the-shelf graphics hardware, which can be employed to build cost-effective VR installations such as large tiled displays. Lorenz and Brunnett add new functionality to Chromium, in which a new point-to-multipoint connection based on UDP allows large scenes to be rendered synchronously on an arbitrary number of tiled displays at nearly constant performance. We hope that this special issue provides an interesting overview of parallel graphics and visualization. Further interest in the topic can be satisfied by following the Symposia on Parallel Graphics and Visualization, the 2007 one taking place in Lugano, Switzerland.