Search CORE

380 research outputs found

Feed-forward volume rendering algorithm for moderately parallel MIMD machines

Author: Yagel Roni

Algorithms for direct volume rendering on parallel and vector processors are investigated. Volumes are transformed efficiently on parallel processors by dividing the data into slices and beams of voxels. Equal-sized sets of slices along one axis are distributed to processors. Parallelism is achieved at two levels. Because each slice can be transformed independently of the others, processors transform their assigned slices with no communication, thus providing the maximum possible parallelism at the first level. Within each slice, consecutive beams are incrementally transformed using coherency in the transformation computation. Coherency across slices can also be exploited to further enhance performance. This coherency yields the second level of parallelism through the use of vector processing or pipelining. Other ongoing efforts include investigations into image reconstruction techniques, load balancing strategies, and improving performance.

NASA Technical Reports Server

FFT for the APE Parallel Computer

Author: Davies C. T. H.
Federico Toschi
Katz G.
Klaus Schilling
Lippert Th.
Raffaele Tripiccione
Sven Trentmann
Thomas Lippert
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/1997

We present a parallel FFT algorithm for SIMD systems following the 'Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular, for systems with next-neighbour connectivity, our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, with QCD applications in mind.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Juelich Shared Electronic Resources

CERN Document Server

NASA high performance computing and communications program

Author: Holcomb Lee
Hunter Paul
Smith Paul

The National Aeronautics and Space Administration's HPCC program is part of a new Presidential initiative aimed at producing a 1000-fold increase in supercomputing speed and a 100-fold improvement in available communications capability by 1997. As more advanced technologies are developed under the HPCC program, they will be used to solve NASA's 'Grand Challenge' problems, which include improving the design and simulation of advanced aerospace vehicles, allowing people at remote locations to communicate more effectively and share information, increasing scientists' abilities to model the Earth's climate and forecast global environmental trends, and improving the development of advanced spacecraft. NASA's HPCC program is organized into three projects which are unique to the agency's mission: the Computational Aerosciences (CAS) project, the Earth and Space Sciences (ESS) project, and the Remote Exploration and Experimentation (REE) project. An additional project, the Basic Research and Human Resources (BRHR) project, exists to promote long-term research in computer science and engineering and to increase the pool of trained personnel in a variety of scientific disciplines. This document presents an overview of the objectives and organization of these projects as well as summaries of individual research and development programs within each project.

NASA Technical Reports Server

An Application Perspective on High-Performance Computing and Communications

Author: Fox Geoffrey C.
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1996

We review possible and probable industrial applications of HPCC, focusing on the software and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions of five areas: computational chemistry; Monte Carlo methods from physics to economics; manufacturing and computational fluid dynamics; command and control, or crisis management; and multimedia services to client computers and set-top boxes. The hardware varies from tightly coupled parallel supercomputers to heterogeneous distributed systems. The software models span HPF and data parallelism to distributed information systems and object/data flow parallelism on the Web. We find that in each case it is reasonably clear that HPCC works in principle, and postulate that this knowledge can be used in a new generation of software infrastructure based on the WebWindows approach, which is discussed in an accompanying paper.

Syracuse University Research Facility and Collaborative Environment

Parzsweep: A Novel Parallel Algorithm for Volume Rendering of Regular Datasets

Author: Ramswamy Lakshmy
Publication venue: Scholars Junction
Publication date: 21/04/2003

The sweep paradigm for volume rendering has previously been successfully applied with irregular grids. This thesis describes a parallel volume rendering algorithm called PARZSweep for regular grids that utilizes the sweep paradigm. The sweep paradigm is a concept where a plane sweeps the data volume parallel to the viewing direction. As the sweeping proceeds in increasing order of z, the faces incident on the vertices are projected onto the viewing volume to contribute to the image. The sweeping ensures that all faces are projected in the correct order, and the image thus obtained is very accurate in its details. PARZSweep is an extension of a serial algorithm for regular grids called RZSweep. The hypothesis of this research is that a parallel version of RZSweep can be designed and implemented which will utilize multiple processors to reduce rendering times. PARZSweep follows an approach called image-based task scheduling or tiling. This approach divides the image space into tiles and allocates each tile to a processor for individual rendering. The subimages are composited to form a complete final image. PARZSweep uses a shared memory architecture in order to take advantage of inherent cache coherency for faster communication between processors. Experiments were conducted comparing RZSweep and PARZSweep with respect to prerendering times, rendering times and image quality. RZSweep and PARZSweep have approximately the same prerendering costs and produce exactly the same images, and PARZSweep substantially reduced rendering times. PARZSweep was evaluated for scalability with respect to the number of tiles and number of processors. Scalability results were disappointing due to uneven data distribution.

Mississippi State University Libraries ETD database

Scholars Junction - Mississippi State University Institutional Repository

Data distributed, parallel algorithm for ray-traced volume rendering

Author: Hansen Charles D.
Ma Kwan-Liu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1993

This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5 and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolumes concurrently. No communication between processing units is needed during this local ray-tracing process. A subimage is generated by each processing unit, and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.

The University of Utah: J. Willard Marriott Digital Library

Towards a Scalable Architecture for Real-Time Volume Rendering

Author: Kaufman Arie
Pfister Hanspeter
Wessels Frank
Publication venue: Eurographics Association
Publication date: 30/06/2010

In this paper we present our research efforts towards a scalable volume rendering architecture for the real-time visualization of dynamically changing high-resolution datasets. Using a linearly skewed memory interleaving we were able to develop a parallel dataflow model that leads to local, fixed-bandwidth interconnections between processing elements. This parallel dataflow model differs from previous work in that it requires no global communication of data except at the pixel level. Using this dataflow model we are developing Cube-4, an architecture that is scalable to very high performances and allows for modular and extensible hardware implementations.

Harvard University - DASH

Evaluation of real-time LBP computing in multiple architectures (Journal of Real-Time Image Processing manuscript)

Author: A. Nieto
López Alej
Miguel Bordallo

Local Binary Pattern (LBP) is a texture operator that is used in several different computer vision applications requiring, in many cases, real-time operation on multiple computing platforms. The emergence of new video standards has increased the typical resolutions and frame rates, which need considerable computational performance. Since LBP is essentially a pixel operator that scales with image size, typical straightforward implementations are usually insufficient to meet these requirements. To identify the solutions that maximize the performance of real-time LBP extraction, we compare a series of different implementations in terms of computational performance and energy efficiency while analyzing the different optimizations that can be made to reach real-time performance on multiple platforms and their different available computing resources. Our contribution addresses the extensive survey of LBP implementations on different platforms that can be found in the literature. To provide a more complete evaluation, we have implemented the LBP algorithms on several platforms such as Graphics Processing Units, mobile processors and a hybrid programming model image coprocessor. We have extended the evaluation of some of the solutions that can be found in previous work. In addition, we publish the source code of our implementations.

CiteSeerX