Parallel rendering

Author: Crockett Tom
Hansen Charles D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1994

Journal article. Massively parallel computers have emerged as valuable tools for performing scientific and engineering computations, far outstripping the capabilities of independent workstations in both sheer floating point performance and memory capacity. As the resolution of simulation models increases, graphics algorithms that take advantage of the large memory and parallelism of these architectures are becoming increasingly important. This issue of IEEE Parallel & Distributed Technology highlights some recent work in parallel computer graphics, specifically parallel rendering.

The University of Utah: J. Willard Marriott Digital Library

Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU

Author: Balasubramanian Srinivas
Raghavendra Prakash S
Raja Chittampally Vasanth
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 13/04/2012

The vision of a supercomputer at every desk can be realized by powerful and highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications only, are now used for highly computation-intensive general-purpose applications. GFLOP and TFLOP performance, once very expensive, has become very cheap with GPGPUs. The current work focuses mainly on a highly parallel implementation of matrix exponentiation, which is widely used in many areas of the scientific community, ranging from highly critical flight and CAD simulations to financial and statistical applications. The proposed solution uses OpenCL to exploit the parallelism offered by many-core GPGPUs. It employs many general GPU optimizations as well as architecture-specific optimizations. The experiments cover optimizations targeted specifically at scientific graphics cards (Tesla C2050). The heterogeneous highly parallel matrix exponentiation method has been tested on matrices of different sizes and with different powers. The devised kernel has shown a 1000X speedup, and a 44-fold speedup compared with the naive GPU kernel.

Comment: 15 pages, 12 figures, International Journal of Distributed and Parallel Systems (IJDPS), ISSN: 0976-9757 [Online]; 2229-3957 [Print]
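The abstract does not state which exponentiation scheme the kernel uses. As a point of reference only, the following minimal Python sketch assumes the operation is an integer matrix power and uses repeated squaring, a common choice. The names `mat_mul` and `mat_pow` are hypothetical, and nothing here reproduces the authors' OpenCL kernel or its optimizations.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, p):
    """Raise square matrix m to a non-negative integer power p.

    Assumption: repeated squaring, which needs O(log p) multiplications.
    The paper's actual method is not given in the abstract.
    """
    n = len(m)
    # Start from the identity matrix.
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while p > 0:
        if p & 1:                      # this bit of p is set: fold base in
            result = mat_mul(result, base)
        p >>= 1
        if p:                          # square only if more bits remain
            base = mat_mul(base, base)
    return result
```

Each call to `mat_mul` is the part a GPU implementation would parallelize, since every output entry can be computed independently of the others.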

arXiv.org e-Print Archive

Crossref

A CLIPS/X-window interface

Author: Pohl Kym Jason

The design and implementation of an interface between the C Language Integrated Production System (CLIPS) expert system development environment and the graphic user interface development tools of the X-Window system are described. The underlying basis of the CLIPS/X-Window interface is a client-server model in which multiple clients can attach to a single server that interprets and executes client action requests and returns the results. Implemented in an AIX (UNIX) operating system environment, the interface has been successfully applied in the development of graphics interfaces for production rule cooperating agents in a knowledge-based computer aided design (CAD) system. Initial findings suggest that the client-server model is particularly well suited to a distributed parallel processing operational mode in a networked workstation environment.

NASA Technical Reports Server

Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems

Author: C. Augonnet
E. Horowitz
F. Biscani
J. Reinders
M. Frigo
M. Gastineau
M. Monagan
P.S. Wang
R. Fateman
R.D. Blumofe
S.C. Johnson
Publication date: 01/01/2013

We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algorithm targets not only shared-memory multicore computers, but also computer clusters and specialized hardware attached to a host computer, such as graphics processing units or many-core coprocessors. Scalability on a large number of cores is ensured by the lack of synchronization, locks, and false sharing during the main parallel step.

Comment: 15 pages, 5 figures
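To make "distributed format" concrete, here is a minimal serial sketch in Python. It assumes a polynomial is stored as a dictionary that maps an exponent tuple to a nonzero coefficient, one entry per term. The name `poly_mul` is hypothetical, and the sketch shows only the term-by-term product, not the paper's parallel scheme.

```python
from collections import defaultdict


def poly_mul(p, q):
    """Multiply two sparse multivariate polynomials.

    p and q map exponent tuples, e.g. (2, 0) for x**2, to coefficients.
    Assumption: both polynomials use the same number of variables.
    """
    out = defaultdict(int)
    for exp_a, coef_a in p.items():
        for exp_b, coef_b in q.items():
            # Multiplying monomials adds their exponents variable by variable.
            exp = tuple(a + b for a, b in zip(exp_a, exp_b))
            out[exp] += coef_a * coef_b
    # Drop terms that cancelled so the result stays sparse.
    return {exp: coef for exp, coef in out.items() if coef != 0}
```

For example, with variables (x, y), multiplying x + y by x - y gives x**2 - y**2: the two cross terms cancel and are removed from the result.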

arXiv.org e-Print Archive

Crossref

HAL-INSU

HAL-OBSPM

High speed design and analysis of cable-membrane structures on graphics cards

Author: Iványi Péter
Publication venue: CIMNE
Publication date: 01/01/2017

This paper discusses a new approach to parallelizing the dynamic relaxation method, programmed with the NVIDIA CUDA API and executed on the graphics card (GPU) of a computer. The main advantages of a GPU card are that it has a very large number of computing cores and its own memory, separate from that of the host computer, and that it can reside inside a normal desktop computer. However, due to architectural simplifications of GPU systems, synchronization of cores is rather limited. This has a major effect on the parallelization, since accumulating the contributions of calculated values at the boundary nodes would require some form of synchronization. This limitation led to the new parallelization approach, in which the nodes of the finite element mesh are distributed between the cores of the GPU and the elements are "duplicated". The paper discusses the implementation details of this new parallel approach and presents some performance measurements of the new parallel dynamic relaxation method on GPU systems.
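The idea of distributing nodes and duplicating elements can be illustrated with a small serial sketch in Python. It assumes a hypothetical 1D chain of linear springs rather than the paper's cable-membrane elements. Each node gathers the forces of the elements attached to it, so every element force is computed once per end node, and no two nodes ever write to the same output entry. The function name `node_forces` and all parameters are illustrative.

```python
def node_forces(positions, elements, stiffness, rest_length):
    """Net spring force on each node of a 1D chain (node-wise gather).

    positions: node coordinates; elements: (i, j) node index pairs.
    Assumption: connected nodes never share the same coordinate.
    """
    # Build, for every node, the list of nodes it is connected to.
    incident = [[] for _ in positions]
    for i, j in elements:
        incident[i].append(j)
        incident[j].append(i)

    forces = []
    for n, x in enumerate(positions):   # on a GPU: one thread per node
        f = 0.0
        for m in incident[n]:
            d = positions[m] - x
            direction = 1.0 if d > 0 else -1.0
            # Element force is recomputed here, i.e. "duplicated" per node.
            f += stiffness * (abs(d) - rest_length) * direction
        forces.append(f)                # each node writes only its own slot
    return forces
```

The cost of this layout is redundant arithmetic: each element is evaluated twice, in exchange for avoiding synchronized accumulation at shared nodes.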

UPCommons. Portal del coneixement obert de la UPC

A Study of Speed of the Boundary Element Method as applied to the Realtime Computational Simulation of Biological Organs

Author: P Kirana Kumara
Publication date: 14/01/2014

In this work, the possibility of simulating biological organs in realtime using the Boundary Element Method (BEM) is investigated. Biological organs are assumed to follow linear elastostatic material behavior, and the constant boundary element is the element type used. First, a Graphics Processing Unit (GPU) is used to speed up the BEM computations to achieve realtime performance. Next, instead of the GPU, a computer cluster is used. Results indicate that BEM is fast enough to provide for realtime graphics if biological organs are assumed to follow linear elastostatic material behavior. Although the present work does not conduct any simulation using nonlinear material models, results from using the linear elastostatic material model imply that it would be difficult to obtain realtime performance if highly nonlinear material models that properly characterize biological organs are used. Although the use of BEM for the simulation of biological organs is not new, the results presented in the present study are not found elsewhere in the literature.

Comment: preprint, draft, 2 tables, 47 references, 7 files. Codes that can solve three-dimensional linear elastostatic problems using constant boundary elements (of triangular shape) while ignoring body forces are provided as supplementary files; codes are distributed under the MIT License in three versions: i) MATLAB version, ii) Fortran 90 version (sequential code), iii) Fortran 90 version (parallel code)

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

New Jersey History (NJH - E-Journal)

Massively Parallel Algorithm for Solving the Eikonal Equation on Multiple Accelerator Platforms

Author: Shrestha Anup
Publication venue: 'IUScholarWorks'
Publication date: 01/12/2016

The research presented in this thesis investigates parallel implementations of the Fast Sweeping Method (FSM) for Graphics Processing Unit (GPU)-based computational platforms and proposes a new parallel algorithm for distributed computing platforms with accelerators. Hardware accelerators such as GPUs and co-processors have emerged as general-purpose processors in today's high performance computing (HPC) platforms, thereby increasing platforms' performance capabilities. This trend has allowed greater parallelism and substantial acceleration of scientific simulation software. In order to leverage the power of new HPC platforms, scientific applications must be written in specific lower-level programming languages, which used to be platform specific. Newer programming models such as OpenACC simplify implementation and assure portability of applications across GPUs from different vendors and multi-core processors. The distance field is a representation of a surface geometry or shape required by many algorithms within the areas of computer graphics, visualization, computational fluid dynamics and more. It can be calculated by solving the eikonal equation using the FSM. The parallel FSMs explored in this thesis have not been implemented on GPU platforms and do not scale to a large problem size. This thesis addresses this problem by designing a parallel algorithm that utilizes a domain decomposition strategy for multi-accelerated distributed platforms. The proposed algorithm first applies coarse-grain parallelism using MPI to distribute subdomains across multiple nodes and then fine-grain parallelism to optimize performance by utilizing accelerators. The parallel implementations of FSM for GPU-based platforms showed speedups greater than 20× compared to the serial version for some problems, and the newly developed parallel algorithm removes the limitation of current algorithms on solving large-memory problems, with comparable runtime efficiency.
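For readers unfamiliar with the method, the following is a minimal serial sketch of fast sweeping in Python. It assumes a square 2D grid, unit speed (so the eikonal equation reduces to |grad u| = 1 and u is a distance field), and the standard first-order upwind update. The function name `fast_sweep` and its parameters are hypothetical, and the sketch does not reflect the thesis's GPU or MPI implementations.

```python
import math


def fast_sweep(n, sources, h=1.0, iterations=2):
    """Approximate distance to the source cells on an n-by-n grid.

    Assumptions: unit speed, grid spacing h, serial execution.
    """
    inf = float("inf")
    u = [[inf] * n for _ in range(n)]
    for i, j in sources:
        u[i][j] = 0.0

    fwd, bwd = range(n), range(n - 1, -1, -1)
    for _ in range(iterations):
        # Four Gauss-Seidel sweeps, one per diagonal direction.
        for rows, cols in ((fwd, fwd), (bwd, fwd), (bwd, bwd), (fwd, bwd)):
            for i in rows:
                for j in cols:
                    # Smallest neighbor value along each axis (upwind values).
                    a = min(u[i - 1][j] if i > 0 else inf,
                            u[i + 1][j] if i < n - 1 else inf)
                    b = min(u[i][j - 1] if j > 0 else inf,
                            u[i][j + 1] if j < n - 1 else inf)
                    if a == inf and b == inf:
                        continue  # no information has reached this cell yet
                    if abs(a - b) >= h:
                        cand = min(a, b) + h          # one-sided update
                    else:
                        cand = (a + b + math.sqrt(2 * h * h - (a - b) ** 2)) / 2
                    u[i][j] = min(u[i][j], cand)      # values only decrease
    return u
```

With a single source in one corner, the values along the grid axes equal the exact distance, while diagonal cells are overestimated by the first-order scheme. Each sweep depends on values updated earlier in the same sweep, which is what makes the method nontrivial to parallelize.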

Boise State University - ScholarWorks

Parallel graphics and visualization

Author: Heirich Alan
Raffin Bruno
Santos Luís Paulo
Publication venue: 'Elsevier BV'
Publication date: 01/06/2007

Computer graphics and visualization are very active fields of Computer Science, continuously producing new and exciting results. However, the demand for increasingly faster feedback, together with the huge volume of data usually associated with these applications, results in growing computational requirements. An efficient utilization of a multiplicity of computational and visualization resources expedites data processing for image generation, thus enabling such requirements to be met. This special issue of Parallel Computing presents a selection of six of the 21 papers published at the 2006 Eurographics Symposium on Parallel Graphics and Visualization, which was held in May 2006 in Braga, Portugal. The Eurographics Symposium on Parallel Graphics and Visualization focuses on theoretical and applied research issues critical to parallel and distributed computing and its application to all aspects of computer graphics, virtual reality, and scientific and engineering visualization. Parallel graphics and visualization has evolved dramatically in the last few years. While previous works focused on SIMD architectures and standard PC clusters, more recent research has moved to large displays and visualization-oriented cluster architectures, which include graphics processing units at each node. This trend can be observed in the papers selected for this special issue: two papers present results on realistic rendering on PC clusters, two papers focus on parallel volume rendering using graphics processing units, and two papers address large displays and visualization clusters. The paper by Chalmers et al. combines parallel processing on a cluster with visual perception to achieve high fidelity physically based selective rendering at close to interactive rates. Thomaszewski et al. also use a PC cluster to perform physically based simulations of cloth, modelling both the material properties and the interaction with the surrounding scene. Bernardon et al. exploit CPU and GPU parallelism to render volumes of unstructured grids with time-varying data. Another volume rendering technique is presented by Müller et al., using a sort-last approach to perform volume ray casting on the fragment shaders of a GPU cluster. Cotting et al. present a software genlock approach for Windows, compatible with off-the-shelf graphics hardware, which can be employed to build cost-effective VR installations such as large tiled displays. Lorenz and Brunnett add a new functionality to Chromium, where a new point-to-multipoint connection based on UDP allows rendering of large scenes synchronously on an arbitrary number of tiled displays at nearly constant performance. We hope that this special issue provides an interesting overview of parallel graphics and visualization. Further interest in the topic can be satisfied by following the Symposia on Parallel Graphics and Visualization, the 2007 one taking place in Lugano, Switzerland.

Universidade do Minho: RepositoriUM