Search CORE

A practical WSI experimental programme

Author: Hedge SJ
Jalowiecki IP
Lea RM
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1991

At Brunel University, research has been underway for several years to assess the architectural, electrical and physical benefits and constraints of WASP, the wafer-scale Associative String Processor (ASP), which is intended to implement a massively parallel processor entirely within the constraints of WSI. WASP 1 and WASP 2 were the technology demonstrators of the UK-funded Alvey programme (starting 1984), which researched fundamental design methodologies for WSI. Both are examples of the ASP architecture developed by Brunel University. Further demonstrators are currently funded by a 3½-year US ONR IS&T programme (starting 1987), covering further technology demonstration, applications research, and fundamental packaging and manufacturing design issues.

Brunel University Research Archive

Design and Implementation of a Massively Parallel Version of DIRECT

Author: He Jian
Sosonkina Masha
Verstak Alex
Watson Layne T.
Publication date: 01/01/2006

This paper describes several massively parallel implementations of the global search algorithm DIRECT. Two parallel schemes take different approaches to the design challenges imposed by DIRECT's memory requirements and data dependencies. Three design aspects (topology, data structures, and task allocation) are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200-processor cluster.

Computer Science Technical Reports @Virginia Tech
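
For the parallel DIRECT entry above, a minimal serial sketch of one-dimensional DIRECT (DIviding RECTangles) may help show where the parallelism comes from: in each iteration, the new center points created by trisecting the potentially optimal intervals can all be evaluated independently. This is our own simplified illustration, assuming a single dimension and keeping only one interval per size in the selection step; the function names `direct_1d` and `potentially_optimal` are hypothetical, and this is not the authors' implementation.

```python
def potentially_optimal(boxes, fmin, eps):
    """Indices of potentially optimal intervals; boxes holds (center, half_width, f)."""
    best = {}  # lowest-f interval index for each distinct half-width
    for idx, (_, h, fc) in enumerate(boxes):
        if h not in best or fc < boxes[best[h]][2]:
            best[h] = idx
    chosen = []
    for h, j in best.items():
        fj = boxes[j][2]
        # Admissible range for the Lipschitz-like rate-of-change constant K.
        lower, upper = float("-inf"), float("inf")
        for h2, i in best.items():
            fi = boxes[i][2]
            if h2 < h:
                lower = max(lower, (fj - fi) / (h - h2))
            elif h2 > h:
                upper = min(upper, (fi - fj) / (h2 - h))
        if lower > upper or upper <= 0:
            continue
        # Require a nontrivial improvement over the current best value.
        if upper < float("inf") and fj - upper * h > fmin - eps * abs(fmin):
            continue
        chosen.append(j)
    return chosen


def direct_1d(f, lo, hi, iters=30, eps=1e-4):
    """Minimize f on [lo, hi]; intervals live in normalized coordinates [0, 1]."""
    def evaluate(c):
        return f(lo + c * (hi - lo))

    boxes = [(0.5, 0.5, evaluate(0.5))]
    for _ in range(iters):
        fmin = min(b[2] for b in boxes)
        selected = set(potentially_optimal(boxes, fmin, eps))
        new_boxes = []
        for idx, (c, h, fc) in enumerate(boxes):
            if idx not in selected:
                new_boxes.append((c, h, fc))
                continue
            # Trisect: the middle third keeps the old center and value; the two
            # new centers are independent evaluations a parallel version could farm out.
            h3 = h / 3
            left, right = c - 2 * h3, c + 2 * h3
            new_boxes += [(left, h3, evaluate(left)), (c, h3, fc), (right, h3, evaluate(right))]
        boxes = new_boxes
    c, _, fbest = min(boxes, key=lambda b: b[2])
    return lo + c * (hi - lo), fbest


x_best, f_best = direct_1d(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
print(x_best, f_best)
```

In this sketch the `evaluate` calls inside the trisection loop are the natural unit of work to distribute, while the selection step needs the global set of intervals; that global step is one plausible reading of the data dependency the abstract refers to.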

Performance Modeling and Analysis of a Massively Parallel DIRECT— Part 2

Author: He Jian
Sosonkina M.
Verstak Alex
Watson L. T.
Publication date: 01/01/2007

Modeling and analysis techniques are used to investigate the performance of a massively parallel version of DIRECT, a global search algorithm widely used in multidisciplinary design optimization applications. Several high-dimensional benchmark functions and real-world problems are used to test the design effectiveness under various problem structures. In this second part of a two-part work, theoretical and experimental results are compared for two parallel clusters with different system scales and network connectivity. The first part studied performance sensitivity to important parameters for problem configurations and parallel schemes, using performance metrics such as memory usage, load balancing, and parallel efficiency. Here, linear regression models are used to characterize two major overhead sources, interprocessor communication and processor idleness, and are also applied to the isoefficiency functions in the scalability analysis. For a variety of high-dimensional problems and large-scale systems, the massively parallel design has achieved reasonable performance. The results of the performance study provide guidance for efficient problem and scheme configuration. More importantly, the design considerations and analysis techniques generalize to the transformation of other global search algorithms into effective large-scale parallel optimization tools.

Computer Science Technical Reports @Virginia Tech
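
The isoefficiency analysis in the Part 2 entry above can be sketched with textbook definitions: total overhead T_o = p·T_p − W, efficiency E = W / (W + T_o), so holding E fixed requires W = E/(1 − E) · T_o. The sketch assumes, purely for illustration, that the measured overhead is linear in the processor count p and fits it by ordinary least squares; the paper fits separate models for communication and idleness, whose exact form the abstract does not give. The timing data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def total_overhead(work, p, t_par):
    """T_o = p*T_p - W: processor time not spent on useful work."""
    return p * t_par - work


def isoefficiency_work(e, a, b, p):
    """Work W that keeps efficiency E on p processors, assuming T_o = a + b*p."""
    k = e / (1.0 - e)
    return k * (a + b * p)


# Synthetic measurements: W = 1000 time units of serial work,
# generated with an overhead of exactly 20 + 0.5*p.
W = 1000.0
procs = [16, 32, 64, 128]
t_par = [(W + 20.0 + 0.5 * p) / p for p in procs]
overheads = [total_overhead(W, p, t) for p, t in zip(procs, t_par)]
a, b = fit_line(procs, overheads)
w_needed = isoefficiency_work(0.8, a, b, 256)  # work needed for E = 0.8 on 256 processors
print(a, b, w_needed)
```

With a linear overhead model the required work grows linearly in p; a superlinear fitted overhead would signal poorer scalability.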

Stepwise transformation of algorithms into array processor architectures by the DECOMP

Author: Vehlies U.
Publication venue: New York, NY: Hindawi Publishing Corporation
Publication date: 01/01/1995

A formal approach for transforming computation-intensive digital signal processing algorithms into suitable array processor architectures is presented. It covers the complete design flow, from algorithmic specifications in a high-level programming language to architecture descriptions in a hardware description language. The transformation itself is divided into manageable design steps and implemented in the CAD tool DECOMP, which allows different architectures to be explored in a short time. With the presented approach, data-independent algorithms can be mapped onto array processor architectures. To allow this, a known mapping methodology for array processor design is extended to handle inhomogeneous dependence graphs with nonregular data dependences. The implementation of the formal approach in DECOMP is an important step towards design automation for massively parallel systems.

Directory of Open Access Journals

Institutionelles Repositorium der Leibniz Universität Hannover
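
The DECOMP abstract does not name the mapping methodology it extends. A common textbook form for uniform dependence graphs is the linear space-time mapping: each index point p is assigned a processing element through an allocation vector and a time step through a schedule vector. The sketch below assumes that classical form; all vectors and the matrix-vector example are our own illustration, and it does not cover the inhomogeneous, nonregular graphs the paper targets.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def check_mapping(deps, schedule, projection, allocation):
    """Classical linear space-time mapping conditions for a uniform dependence graph."""
    causal = all(dot(schedule, d) > 0 for d in deps)  # every dependence moves forward in time
    projects = dot(allocation, projection) == 0       # points along the projection share a PE
    sequential = dot(schedule, projection) != 0       # ...and those points run at different times
    return causal and projects and sequential


def map_graph(points, schedule, allocation):
    """Assign each index point to (processing element, time step)."""
    return {p: (dot(allocation, p), dot(schedule, p)) for p in points}


# Matrix-vector product y_i = sum_j a_ij * x_j on an N x N index space (i, j):
# the partial sum of y_i travels along j, and x_j is passed along i.
N = 4
deps = [(0, 1), (1, 0)]
schedule, projection, allocation = (1, 1), (0, 1), (1, 0)
points = list(product(range(N), repeat=2))
mapping = map_graph(points, schedule, allocation)
print(check_mapping(deps, schedule, projection, allocation), mapping[(2, 3)])
```

Here index point (i, j) runs on PE i at time i + j, giving a linear array of N processing elements in which x values are pipelined from one PE to the next.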

Performance Modeling and Analysis of a Massively Parallel DIRECT— Part 1

Author: He Jian
Sosonkina M.
Verstak Alex
Watson L. T.
Publication date: 01/01/2007

Modeling and analysis techniques are used to investigate the performance of a massively parallel version of DIRECT, a global search algorithm widely used in multidisciplinary design optimization applications. Several high-dimensional benchmark functions and real-world problems are used to test the design effectiveness under various problem structures. Theoretical and experimental results are compared for two parallel clusters with different system scales and network connectivity. The present work studies the performance sensitivity to important parameters for problem configurations, parallel schemes, and system settings. The performance metrics include memory usage, load balancing, parallel efficiency, and scalability. An analytical bounding model is constructed to measure load-balancing performance under different schemes. Additionally, linear regression models are used to characterize two major overhead sources, interprocessor communication and processor idleness, and are also applied to the isoefficiency functions in the scalability analysis. For a variety of high-dimensional problems and large-scale systems, the massively parallel design has achieved reasonable performance. The results of the performance study provide guidance for efficient problem and scheme configuration. More importantly, the generalized design considerations and analysis techniques help transform many global search algorithms into effective large-scale parallel optimization tools.

Computer Science Technical Reports @Virginia Tech
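
The Part 1 metrics have standard definitions, sketched here with made-up timings: speedup S = T_1/T_p, efficiency E = S/p, and, as one simple load-balancing measure, the ratio of the busiest processor's useful work to the mean. The imbalance ratio and the function name `parallel_metrics` are our own choices; the paper's analytical bounding model is not reproduced here.

```python
def parallel_metrics(t_serial, busy_times, wall_time):
    """Speedup and efficiency plus a simple load-imbalance ratio for one run.

    t_serial:   run time of the serial code
    busy_times: per-processor time spent on useful work in the parallel run
    wall_time:  elapsed time of the parallel run
    """
    p = len(busy_times)
    speedup = t_serial / wall_time
    efficiency = speedup / p
    mean_busy = sum(busy_times) / p
    imbalance = max(busy_times) / mean_busy  # 1.0 means perfectly balanced
    # Share of processor time not spent busy (idle or communicating).
    idle_fraction = 1.0 - sum(busy_times) / (p * wall_time)
    return {"speedup": speedup, "efficiency": efficiency,
            "imbalance": imbalance, "idle_fraction": idle_fraction}


# Hypothetical run: 100 s of serial work split unevenly over 4 processors.
m = parallel_metrics(100.0, [24.0, 25.0, 26.0, 25.0], 26.0)
print(m)
```

Because the slowest processor sets the wall time, even a 4% imbalance here caps efficiency below 97%.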

Runtime volume visualization for parallel CFD

Author: Ma Kwan-Liu

This paper discusses aspects of the design of a data-distributed, massively parallel volume rendering library for runtime visualization of parallel computational fluid dynamics simulations in a message-passing environment. Unlike the traditional scheme, in which visualization is a postprocessing step, rendering is done in place on each node processor. Computational scientists who run large-scale simulations on a massively parallel computer can thus monitor their simulations interactively. The current library provides an interface to handle volume data on rectilinear grids; the same design principles can be generalized to other types of grids. For demonstration, we run a parallel Navier-Stokes solver that uses this rendering library on the Intel Paragon XP/S. The interactive visual response achieved is found to be very useful. Performance studies show that the parallel rendering process scales with both the size of the simulation and the size of the parallel computer.

NASA Technical Reports Server
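
The volume rendering abstract does not say how the per-node images are combined. A common approach for data-distributed volume rendering is sort-last compositing: each node ray-casts its own subvolume into a partial image, and the partial images are merged in depth order with the Porter-Duff over operator. The sketch below assumes that scheme, with each ray split into contiguous depth slabs standing in for node subvolumes; `render_distributed` and the sample data are hypothetical.

```python
import math
from functools import reduce


def composite_front_to_back(samples):
    """Accumulate (color, opacity) samples ordered front to back along one ray.

    Returns the premultiplied color and the accumulated opacity.
    """
    color, alpha = 0.0, 0.0
    for c, a in samples:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
    return color, alpha


def over(front, back):
    """Porter-Duff 'over' for premultiplied (color, alpha) pairs."""
    cf, af = front
    cb, ab = back
    return cf + (1.0 - af) * cb, af + (1.0 - af) * ab


def render_distributed(rays, nodes):
    """Split each ray into depth slabs, one per hypothetical node, then composite."""
    image = []
    for ray in rays:
        size = math.ceil(len(ray) / nodes)
        slabs = [ray[k:k + size] for k in range(0, len(ray), size)]
        partials = [composite_front_to_back(s) for s in slabs]  # rendered in place on each node
        image.append(reduce(over, partials))                    # merged front to back
    return image


# Two toy rays (one pixel each) of (color, opacity) samples.
rays = [[((0.2 * k) % 1.0, 0.1 + 0.05 * k) for k in range(12)], [(1.0, 0.3)] * 7]
serial = [composite_front_to_back(r) for r in rays]
distributed = render_distributed(rays, nodes=3)
print(serial, distributed)
```

Because over is associative, the distributed result matches compositing each ray in one pass (up to rounding), so each node can render without seeing the other nodes' data.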

Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program. II: Wavelength Parallelization

Author: Baron E.
Peter H. Hauschildt
Schwarz G.
Publication venue: 'University of Chicago Press'
Publication date: 24/09/1997

We describe an important addition to the parallel implementation of our generalized NLTE stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data- and task-parallel algorithms we developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, by distributing the radial zones, individual spectral lines, or characteristic rays among different processors, and additionally employed task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends on the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 to 300,000), and hence parallelization over wavelength can lead both to considerable speedup in calculation time and to the ability to use the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, in which the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard MPI library calls and is fully portable between serial and parallel computers. Comment: AAS-TeX, 15 pages, full text with figures available at ftp://calvin.physast.uga.edu/pub/preprints/Wavelength-Parallel.ps.gz ApJ, in press

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server
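
The pipelined wavelength design in the PHOENIX entry can be sketched with threads and queues standing in for MPI processes and messages (an assumption made to keep the example self-contained). Wavelength points are dealt round-robin to workers; each worker first does the part of its point that needs no neighbor data, then waits for the result of point i − 1 and forwards its own result as soon as it is known. `independent_work` and `dependent_step` are hypothetical placeholders, not PHOENIX routines. In CPython the threads do not run the arithmetic in parallel, so this shows the data flow rather than the speedup.

```python
import queue
import threading


def independent_work(i):
    """Hypothetical stand-in for per-wavelength work that needs no neighbor data."""
    return 0.001 * i


def dependent_step(prev, local):
    """Hypothetical stand-in for the step that needs the previous wavelength point."""
    return 0.5 * prev + local


def serial(n, x0):
    xs, prev = [], x0
    for i in range(n):
        prev = dependent_step(prev, independent_work(i))
        xs.append(prev)
    return xs


def pipelined(n, x0, workers):
    """Round-robin wavelength points over workers connected in a ring of queues."""
    results = [None] * n
    inboxes = [queue.Queue() for _ in range(workers)]

    def worker(rank):
        for i in range(rank, n, workers):
            local = independent_work(i)   # can overlap with upstream workers
            prev = inboxes[rank].get()    # blocks until point i - 1 is known
            results[i] = dependent_step(prev, local)
            inboxes[(rank + 1) % workers].put(results[i])  # forward immediately

    inboxes[0].put(x0)  # boundary value for the first wavelength point
    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(pipelined(8, 1.0, 3) == serial(8, 1.0))  # True
```

Each queue is fed only by the upstream worker, which handles its points in order, so every worker receives predecessor values in exactly the order it needs them and the pipelined result is identical to the serial recurrence.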

GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation

Author: Fukushige Toshiyuki
Koga Masaki
Makino Junichiro
Namura Ken
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/10/2003

In this paper, we describe the architecture and performance of the GRAPE-6 system, a massively parallel special-purpose computer for astrophysical N-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed in 1995 and achieved a theoretical peak speed of 1.08 Tflops. As with GRAPE-4, the primary application of GRAPE-6 is the simulation of collisional systems, though it can also be used for collisionless systems. The main differences between GRAPE-4 and GRAPE-6 are: (a) the GRAPE-6 processor chip integrates 6 force-calculation pipelines, compared to the single pipeline of GRAPE-4 (which needed 3 clock cycles to calculate one interaction); (b) the clock speed is increased from 32 to 90 MHz; and (c) the total number of processor chips is increased from 1728 to 2048. These improvements result in a peak speed of 64 Tflops. We also discuss the design of the successor of GRAPE-6. Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.

arXiv.org e-Print Archive

Crossref
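
The figures in the GRAPE-6 abstract can be cross-checked with simple arithmetic. Multiplying chips, interactions per clock cycle, and clock rate gives each machine's peak pairwise-interaction rate; dividing the quoted peak speed by that rate gives the number of floating-point operations implicitly counted per interaction. That per-interaction figure is derived here, not stated in the abstract, and the sketch assumes each GRAPE-6 pipeline completes one interaction per clock cycle (the abstract gives the 3-cycle figure only for GRAPE-4).

```python
# Peak pairwise-interaction rates implied by the figures in the abstract.
# GRAPE-4: chips x interactions per cycle (one per 3 cycles) x clock in Hz.
grape4_rate = 1728 * (1 / 3) * 32e6
# GRAPE-6: chips x pipelines per chip x clock in Hz,
# assuming one interaction per pipeline per cycle.
grape6_rate = 2048 * 6 * 90e6

# Quoted peak speed / interaction rate = flops counted per interaction (derived).
flops_per_interaction_4 = 1.08e12 / grape4_rate
flops_per_interaction_6 = 64e12 / grape6_rate

print(f"GRAPE-4: {grape4_rate:.4g} interactions/s, about {flops_per_interaction_4:.1f} flops each")
print(f"GRAPE-6: {grape6_rate:.4g} interactions/s, about {flops_per_interaction_6:.1f} flops each")
print(f"Interaction-rate ratio: {grape6_rate / grape4_rate:.1f}, peak-speed ratio: {64 / 1.08:.1f}")
```

Both machines come out near 58 operations per interaction, and the interaction rate grows by a factor of 60 while the quoted peak grows by about 59, so the three listed improvements account for the jump from 1.08 to 64 Tflops.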

The language Parallel Pascal and other aspects of the massively parallel processor

Author: Bruner J. D.
Reeves A. P.

A high-level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. The report includes a description of the language design, a description of the intermediate language Parallel P-Code, and details of the MPP implementation. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed that converts Parallel Pascal programs into the intermediate Parallel P-Code language; the code generator that completes the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed, including a description of fault-tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed, and a survey of other high-level languages is given.

NASA Technical Reports Server
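
The Parallel Pascal abstract does not describe how its Parallel Pascal to Pascal translator works. One plausible core step, shown here only as a hypothetical illustration, is expanding a whole-array assignment into nested `for` loops over the array bounds. `translate_array_assignment` is our own name, and it handles only element-wise expressions (no shifts, reductions, or conditional assignments).

```python
import re


def translate_array_assignment(target, expr, arrays, dims):
    """Expand a whole-array assignment `target := expr` into nested Pascal loops.

    arrays: identifiers in `expr` that denote whole arrays (others are scalars).
    dims:   (low, high) bounds for each array dimension.
    """
    idx = [f"i{k}" for k in range(len(dims))]
    subscript = "[" + ", ".join(idx) + "]"
    names = sorted(set(arrays) | {target}, key=len, reverse=True)
    # Attach the subscript to every whole-array identifier (whole words only).
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    body = pattern.sub(lambda m: m.group(1) + subscript, expr)
    lines = []
    for depth, (var, (lo, hi)) in enumerate(zip(idx, dims)):
        lines.append("  " * depth + f"for {var} := {lo} to {hi} do")
    lines.append("  " * len(dims) + f"{target}{subscript} := {body};")
    return "\n".join(lines)


code = translate_array_assignment("a", "b + c * 2", ["b", "c"], [(1, 4), (1, 4)])
print(code)
```

A real translator would also need to handle array shifts and rotations, reductions, and conditional assignment, which do not reduce to a simple element-by-element loop.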

A Specialized Processor for Track Reconstruction at the LHC Crossing Rate

Author: Abba A.
Bedeschi F.
Caponio F.
Citterio M.
Cusimano A.
Geraci A.
Marino P.
Morello M. J.
Neri N.
Piucci A.
Punzi G.
Ristori L.
Spinella F.
Stracka S.
Tonelli D.
Publication venue: 'IOP Publishing'
Publication date: 01/01/2014

We present the results of an R&D study of a specialized processor capable of precisely reconstructing events with hundreds of charged-particle tracks in pixel detectors at 40 MHz, and thus suitable for processing LHC events at the full crossing frequency. For this purpose we design and test a massively parallel pattern-recognition algorithm inspired by studies of how the brain processes visual images. We find that high-quality tracking in large detectors is possible with sub-μs latencies when this algorithm is implemented in modern, high-speed, high-bandwidth FPGA devices. This opens the possibility of making track reconstruction happen transparently as part of the detector readout. Comment: Presented by G. Punzi at the conference on "Instrumentation for Colliding Beam Physics" (INSTR14), 24 Feb to 1 Mar 2014, Novosibirsk, Russia. Submitted to JINST proceedings

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio istituzionale della Ricerca - Scuola Normale Superiore

CERN Document Server
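
The track-reconstruction abstract does not name its algorithm. Brain-inspired tracking of this kind is often described as an "artificial retina": a grid of cells, each tuned to one set of track parameters, accumulates a Gaussian-weighted response from every hit, and tracks appear as local maxima of the response. The sketch below assumes that scheme in a toy two-dimensional setting (straight tracks y = m·x + q crossing five layers); the layer positions, grid, σ, threshold, and function names are all hypothetical.

```python
import math


def retina_response(hits, m_grid, q_grid, sigma):
    """Response of every (slope, intercept) cell to a list of (x, y) hits.

    Each cell is a candidate track y = m*x + q; every hit adds a Gaussian
    weight of its distance from the position that track predicts.
    """
    return [[sum(math.exp(-((y - (m * x + q)) ** 2) / (2 * sigma ** 2)) for x, y in hits)
             for q in q_grid]
            for m in m_grid]


def local_maxima(resp, threshold):
    """Cells above threshold not exceeded by any of their 8 neighbors, strongest first."""
    rows, cols = len(resp), len(resp[0])
    peaks = []
    for a in range(rows):
        for b in range(cols):
            r = resp[a][b]
            if r < threshold:
                continue
            neighbors = [resp[a + da][b + db]
                         for da in (-1, 0, 1) for db in (-1, 0, 1)
                         if (da or db) and 0 <= a + da < rows and 0 <= b + db < cols]
            if all(r >= n for n in neighbors):
                peaks.append((r, a, b))
    return sorted(peaks, reverse=True)


grid = [-1 + 0.1 * k for k in range(21)]  # candidate slopes and intercepts
layers = [1, 2, 3, 4, 5]                  # hypothetical detector layer positions
hits = [(x, 0.3 * x - 0.2) for x in layers] + [(3, 0.9)]  # one track plus one noise hit
resp = retina_response(hits, grid, grid, sigma=0.1)
r, a, b = local_maxima(resp, threshold=2.5)[0]
print(grid[a], grid[b])  # close to the generated track (0.3, -0.2)
```

Every cell's response depends only on the hits, not on other cells, which is what makes this family of algorithms a natural fit for massively parallel FPGA logic.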