5,868 research outputs found
ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment
Applications in science and engineering often require huge computational
resources for solving problems within a reasonable time frame. Parallel
supercomputers provide the computational infrastructure for solving such
problems. A traditional application scheduler running on a parallel cluster
only supports static scheduling where the number of processors allocated to an
application remains fixed throughout the lifetime of execution of the job. Due
to the unpredictability in job arrival times and varying resource requirements,
static scheduling can result in idle system resources thereby decreasing the
overall system throughput. In this paper we present a prototype framework
called ReSHAPE, which supports dynamic resizing of parallel MPI applications
executed on distributed memory platforms. The framework includes a scheduler
that supports resizing of applications, an API to enable applications to
interact with the scheduler, and a library that makes resizing viable.
Applications executed using the ReSHAPE scheduler framework can expand to take
advantage of additional free processors or can shrink to accommodate a high
priority application, without getting suspended. In our research, we have
mainly focused on structured applications that have two-dimensional data arrays
distributed across a two-dimensional processor grid. The resize library
includes algorithms for processor selection and processor mapping. Experimental
results show that the ReSHAPE framework can improve individual job turn-around
time and overall system throughput.Comment: 15 pages, 10 figures, 5 tables Submitted to International Conference
on Parallel Processing (ICPP'07
A study of the communication cost of the FFT on torus multicomputers
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version
Hydra: A Parallel Adaptive Grid Code
We describe the first parallel implementation of an adaptive
particle-particle, particle-mesh code with smoothed particle hydrodynamics.
Parallelisation of the serial code, ``Hydra'', is achieved by using CRAFT, a
Cray proprietary language which allows rapid implementation of a serial code on
a parallel machine by allowing global addressing of distributed memory.
The collisionless variant of the code has already completed several 16.8
million particle cosmological simulations on a 128 processor Cray T3D whilst
the full hydrodynamic code has completed several 4.2 million particle combined
gas and dark matter runs. The efficiency of the code now allows parameter-space
explorations to be performed routinely using particles of each species.
A complete run including gas cooling, from high redshift to the present epoch
requires approximately 10 hours on 64 processors.
In this paper we present implementation details and results of the
performance and scalability of the CRAFT version of Hydra under varying degrees
of particle clustering.Comment: 23 pages, LaTex plus encapsulated figure
Low Power Implementation of Non Power-of-Two FFTs on Coarse-Grain Reconfigurable Architectures
The DRM standard for digital radio broadcast in the AM band requires integrated devices for radio receivers at very low power. A System on Chip (SoC) call DiMITRI was developed based on a dual ARM9 RISC core architecture. Analyses showed that most computation power is used in the Coded Orthogonal Frequency Division Multiplexing (COFDM) demodulation to compute Fast Fourier Transforms (FFT) and inverse transforms (IFFT) on complex samples. These FFTs have to be computed on non power-of-two numbers of samples, which is very uncommon in the signal processing world. The results obtained with this chip, lead to the objective to decrease the power dissipated by the COFDM demodulation part using a coarse-grain reconfigurable structure as a coprocessor. This paper introduces two different coarse-grain architectures: PACT XPP technology and the Montium, developed by the University of Twente, and presents the implementation of a\ud
Fast Fourier Transform on 1920 complex samples. The implementation result on the Montium shows a saving of a factor 35 in terms of processing time, and 14 in terms of power consumption compared to the RISC implementation, and a\ud
smaller area. Then, as a conclusion, the paper presents the next steps of the development and some development issues
- …