380 research outputs found
Feed-forward volume rendering algorithm for moderately parallel MIMD machines
Algorithms for direct volume rendering on parallel and vector processors are investigated. Volumes are transformed efficiently on parallel processors by dividing the data into slices and beams of voxels. Equal-sized sets of slices along one axis are distributed to processors. Parallelism is achieved at two levels. Because each slice can be transformed independently of the others, processors transform their assigned slices with no communication, thus providing the maximum possible parallelism at the first level. Within each slice, consecutive beams are incrementally transformed using coherency in the transformation computation. Coherency across slices can also be exploited to further enhance performance. This coherency yields the second level of parallelism through the use of vector processing or pipelining. Other ongoing efforts include investigations into image reconstruction techniques, load balancing strategies, and performance improvements.
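The slice-and-beam scheme described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 3x4 affine matrix layout, and the use of a thread pool as a stand-in for MIMD processors are all hypothetical and not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def transform_slice(z, size, matrix):
    """Transform all voxel positions of slice z with a 3x4 affine matrix.

    Along a beam (fixed y and z, increasing x) the transformed position is
    updated by adding the first matrix column instead of re-multiplying.
    """
    step = [row[0] for row in matrix]            # change per unit step in x
    beams = []
    for y in range(size):
        # Full matrix product only for the first voxel (x = 0) of the beam.
        p = [row[1] * y + row[2] * z + row[3] for row in matrix]
        beam = []
        for _ in range(size):
            beam.append(tuple(p))
            p = [p[i] + step[i] for i in range(3)]   # incremental update
        beams.append(beam)
    return beams

def transform_volume(size, matrix, workers=4):
    """Give each worker an equal-sized, contiguous set of slices."""
    per = max(1, (size + workers - 1) // workers)
    chunks = [range(s, min(s + per, size)) for s in range(0, size, per)]

    def run(zs):
        # No communication is needed: each slice is transformed independently.
        return {z: transform_slice(z, size, matrix) for z in zs}

    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(run, chunks):
            result.update(part)
    return result
```

Python threads will not give a real speedup here; the pool only mirrors the first level of parallelism (independent slices), while the additive inner loop mirrors the coherency that a vector or pipelined unit could exploit.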
FFT for the APE Parallel Computer
We present a parallel FFT algorithm for SIMD systems following the 'Transpose
Algorithm' approach. The method is based on the assignment of the data field
onto a 1-dimensional ring of systolic cells. The systolic array can be
universally mapped onto any parallel system. In particular for systems with
next-neighbour connectivity our method has the potential to improve the
efficiency of matrix transposition by use of hyper-systolic communication. We
have realized a scalable parallel FFT on the APE100/Quadrics massively parallel
computer, where our implementation is part of a 2-dimensional hydrodynamics
code for turbulence studies. A possible generalization to 4-dimensional FFT is
presented, with QCD applications in mind.
Comment: 17 pages, 13 figures, figures include
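As a rough illustration of the transpose approach named in this abstract, a 2-dimensional FFT can be built from 1-dimensional FFTs along rows, a transpose, and a second pass of row FFTs. The sketch below is serial and uses hypothetical function names; it says nothing about the systolic or hyper-systolic communication scheme of the paper, and it assumes power-of-two dimensions.

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def transpose(m):
    """Matrix transpose; on a parallel machine this is the communication step."""
    return [list(col) for col in zip(*m)]

def fft2_transpose(grid):
    """2-D FFT as row FFTs, transpose, row FFTs, transpose back."""
    rows = [fft(r) for r in grid]                 # local 1-D FFTs along rows
    cols = [fft(r) for r in transpose(rows)]      # former columns are now rows
    return transpose(cols)
```

If each row lives on one processing element, both FFT passes are purely local, so the cost of the transpose dominates the communication, which is why the abstract focuses on making matrix transposition efficient.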
NASA high performance computing and communications program
The National Aeronautics and Space Administration's HPCC program is part of a new Presidential initiative aimed at producing a 1000-fold increase in supercomputing speed and a 100-fold improvement in available communications capability by 1997. As more advanced technologies are developed under the HPCC program, they will be used to solve NASA's 'Grand Challenge' problems, which include improving the design and simulation of advanced aerospace vehicles, allowing people at remote locations to communicate more effectively and share information, increasing scientists' abilities to model the Earth's climate and forecast global environmental trends, and improving the development of advanced spacecraft. NASA's HPCC program is organized into three projects which are unique to the agency's mission: the Computational Aerosciences (CAS) project, the Earth and Space Sciences (ESS) project, and the Remote Exploration and Experimentation (REE) project. An additional project, the Basic Research and Human Resources (BRHR) project, exists to promote long-term research in computer science and engineering and to increase the pool of trained personnel in a variety of scientific disciplines. This document presents an overview of the objectives and organization of these projects as well as summaries of individual research and development programs within each project.
An Application Perspective on High-Performance Computing and Communications
We review possible and probable industrial applications of HPCC, focusing on the software and hardware issues. Thirty-three separate categories are illustrated by detailed descriptions of five areas: computational chemistry; Monte Carlo methods from physics to economics; manufacturing and computational fluid dynamics; command and control or crisis management; and multimedia services to client computers and set-top boxes. The hardware varies from tightly-coupled parallel supercomputers to heterogeneous distributed systems. The software models span HPF and data parallelism, to distributed information systems and object/data flow parallelism on the Web. We find that in each case, it is reasonably clear that HPCC works in principle, and postulate that this knowledge can be used in a new generation of software infrastructure based on the WebWindows approach, which is discussed in an accompanying paper.
PARZSweep: A Novel Parallel Algorithm for Volume Rendering of Regular Datasets
The sweep paradigm for volume rendering has previously been successfully applied with irregular grids. This thesis describes a parallel volume rendering algorithm called PARZSweep for regular grids that utilizes the sweep paradigm. The sweep paradigm is a concept where a plane sweeps the data volume parallel to the viewing direction. As the sweeping proceeds in increasing order of z, the faces incident on the vertices are projected onto the viewing volume to contribute to the image. The sweeping ensures that all faces are projected in the correct order, and the image thus obtained is very accurate in its details. PARZSweep is an extension of a serial algorithm for regular grids called RZSweep. The hypothesis of this research is that a parallel version of RZSweep can be designed and implemented which will utilize multiple processors to reduce rendering times. PARZSweep follows an approach called image-based task scheduling or tiling. This approach divides the image space into tiles and allocates each tile to a processor for individual rendering. The subimages are composited to form a complete final image. PARZSweep uses a shared memory architecture in order to take advantage of inherent cache coherency for faster communication between processors. Experiments were conducted comparing RZSweep and PARZSweep with respect to prerendering times, rendering times and image quality. RZSweep and PARZSweep have approximately the same prerendering costs and produce exactly the same images, and PARZSweep substantially reduced rendering times. PARZSweep was evaluated for scalability with respect to the number of tiles and number of processors. Scalability results were disappointing due to uneven data distribution.
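The image-based tiling approach mentioned in this abstract can be sketched as below. This is a minimal illustration, not the PARZSweep implementation: the `shade` callback is a hypothetical stand-in for the per-pixel sweep rendering, and the thread pool merely imitates workers on a shared-memory machine.

```python
from concurrent.futures import ThreadPoolExecutor

def make_tiles(width, height, tile):
    """Split the image plane into rectangular tiles (x0, y0, x1, y1)."""
    tiles = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            tiles.append((x0, y0, min(x0 + tile, width), min(y0 + tile, height)))
    return tiles

def render_tile(bounds, shade):
    """Render one tile independently; shade(x, y) is a placeholder renderer."""
    x0, y0, x1, y1 = bounds
    pixels = [[shade(x, y) for x in range(x0, x1)] for y in range(y0, y1)]
    return bounds, pixels

def render_tiled(width, height, tile, shade, workers=4):
    """Render all tiles in a worker pool and assemble the final image."""
    image = [[None] * width for _ in range(height)]
    tiles = make_tiles(width, height, tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (x0, y0, x1, y1), pixels in pool.map(lambda b: render_tile(b, shade), tiles):
            for dy, row in enumerate(pixels):
                image[y0 + dy][x0:x1] = row       # tiles never overlap
    return image
```

Because the tiles have equal area but not necessarily equal rendering cost, a scheme like this can leave some workers idle, which is consistent with the uneven data distribution the abstract reports.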
Data distributed, parallel algorithm for ray-traced volume rendering
This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5 and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolumes concurrently. No communication between processing units is needed during this local ray-tracing process. A subimage is generated by each processing unit, and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
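The final step in this abstract, compositing subimages in a known visibility order, can be illustrated with the standard "over" operator. The sketch below is a plain sequential version under assumptions of my own (premultiplied single-channel colour, one `(colour, alpha)` pair per pixel, hypothetical function names); it is not the parallel compositing method of the paper.

```python
def over(front, back):
    """Composite one premultiplied (colour, alpha) pixel over another."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)

def composite(subimages):
    """Combine subimages given in front-to-back order.

    Each subimage is a list of (colour, alpha) pixels of equal length,
    as would be produced by ray tracing one subvolume locally.
    """
    result = list(subimages[0])
    for layer in subimages[1:]:
        result = [over(f, b) for f, b in zip(result, layer)]
    return result
```

The "over" operator is associative, so subimages may be merged pairwise in any grouping as long as the front-to-back order is preserved; that property is what makes a parallel compositing stage possible.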
Towards a Scalable Architecture for Real-Time Volume Rendering
In this paper we present our research efforts towards a
scalable volume rendering architecture for the real-time
visualization of dynamically changing high-resolution
datasets. Using a linearly skewed memory interleaving we were able to develop a parallel dataflow model
that leads to local, fixed-bandwidth interconnections between processing elements. This parallel dataflow model
differs from previous work in that it requires no global
communication of data except at the pixel level. Using this dataflow model we are developing Cube-4, an
architecture that is scalable to very high performances
and allows for modular and extensible hardware implementations.
Engineering and Applied Science
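The abstract does not give the skewing formula. One common form of linear skewing, assumed here purely for illustration, stores voxel (x, y, z) in memory module (x + y + z) mod n. Under that assumption, the sketch below checks that any axis-aligned beam of n voxels touches every module exactly once, so a whole beam could be fetched in parallel without conflicts. All names are hypothetical.

```python
def memory_module(x, y, z, n):
    """Assumed linear skewing: module index for voxel (x, y, z) with n modules."""
    return (x + y + z) % n

def beam(axis, a, b, n):
    """Coordinates of the n voxels along one axis-aligned beam.

    axis selects the varying coordinate; a and b fix the other two.
    """
    if axis == "x":
        return [(i, a, b) for i in range(n)]
    if axis == "y":
        return [(a, i, b) for i in range(n)]
    if axis == "z":
        return [(a, b, i) for i in range(n)]
    raise ValueError("axis must be 'x', 'y' or 'z'")

def is_conflict_free(n):
    """True if every axis-aligned beam hits each of the n modules once."""
    for axis in "xyz":
        for a in range(n):
            for b in range(n):
                modules = sorted(memory_module(x, y, z, n)
                                 for x, y, z in beam(axis, a, b, n))
                if modules != list(range(n)):
                    return False
    return True
```

With an unskewed layout such as module = x mod n, beams along y or z would all land in a single module, which is the conflict that skewing is meant to avoid.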
Evaluation of real-time LBP computing in multiple architectures (Journal of Real-Time Image Processing manuscript)
Local Binary Pattern (LBP) is a texture operator used in several computer vision applications that, in many cases, require real-time operation on multiple computing platforms. The emergence of new video standards has increased typical resolutions and frame rates, which demand considerable computational performance. Since LBP is essentially a pixel operator that scales with image size, straightforward implementations are usually insufficient to meet these requirements. To identify the solutions that maximize the performance of real-time LBP extraction, we compare a series of different implementations in terms of computational performance and energy efficiency, while analyzing the optimizations that can be made to reach real-time performance on multiple platforms and their different available computing resources. Our contribution addresses the extensive survey of LBP implementations on different platforms that can be found in the literature. To provide a more complete evaluation, we have implemented the LBP algorithms on several platforms, such as Graphics Processing Units, mobile processors and a hybrid programming model image coprocessor. We have extended the evaluation of some of the solutions that can be found in previous work. In addition, we publish the source code of our implementations.
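For readers unfamiliar with the operator, a straightforward version of the basic 3x3 LBP can be sketched as follows. This is the kind of unoptimized per-pixel implementation the abstract calls insufficient for real-time use, not the authors' published code; the neighbour ordering and the `>=` comparison are conventions chosen here for illustration.

```python
# Neighbour offsets (dy, dx), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(img):
    """Basic 3x3 LBP codes for the interior pixels of a greyscale image.

    img is a list of equal-length rows of numbers. Each output code is an
    8-bit integer whose bit i is set when neighbour i is >= the centre.
    """
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            out[y - 1][x - 1] = code
    return out
```

Every output pixel depends only on its own 3x3 neighbourhood, so the work grows linearly with image size and maps naturally onto data-parallel hardware such as GPUs.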