67,854 research outputs found
Software-based fault-tolerant routing algorithm in multidimensional networks
Massively parallel computing systems are being built with hundreds or thousands of components such as nodes, links, memories, and connectors. The failure of a component in such systems will not only reduce the computational power but also alter the network's topology. The software-based fault-tolerant routing algorithm is a popular routing to achieve fault-tolerance capability in networks. This algorithm is initially proposed only for two dimensional networks (Suh et al., 2000). Since, higher dimensional networks have been widely employed in many contemporary massively parallel systems; this paper proposes an approach to extend this routing scheme to these indispensable higher dimensional networks. Deadlock and livelock freedom and the performance of presented algorithm, have been investigated for networks with different dimensionality and various fault regions. Furthermore, performance results have been presented through simulation experiments
Massively Parallel Ray Tracing Algorithm Using GPU
Ray tracing is a technique for generating an image by tracing the path of
light through pixels in an image plane and simulating the effects of
high-quality global illumination at a heavy computational cost. Because of the
high computation complexity, it can't reach the requirement of real-time
rendering. The emergence of many-core architectures, makes it possible to
reduce significantly the running time of ray tracing algorithm by employing the
powerful ability of floating point computation. In this paper, a new GPU
implementation and optimization of the ray tracing to accelerate the rendering
process is presented
The factorization of large composite numbers on the MPP
The continued fraction method for factoring large integers (CFRAC) was an ideal algorithm to be implemented on a massively parallel computer such as the Massively Parallel Processor (MPP). After much effort, the first 60 digit number was factored on the MPP using about 6 1/2 hours of array time. Although this result added about 10 digits to the size number that could be factored using CFRAC on a serial machine, it was already badly beaten by the implementation of Davis and Holdridge on the CRAY-1 using the quadratic sieve, an algorithm which is clearly superior to CFRAC for large numbers. An algorithm is illustrated which is ideally suited to the single instruction multiple data (SIMD) massively parallel architecture and some of the modifications which were needed in order to make the parallel implementation effective and efficient are described
FFT for the APE Parallel Computer
We present a parallel FFT algorithm for SIMD systems following the `Transpose
Algorithm' approach. The method is based on the assignment of the data field
onto a 1-dimensional ring of systolic cells. The systolic array can be
universally mapped onto any parallel system. In particular for systems with
next-neighbour connectivity our method has the potential to improve the
efficiency of matrix transposition by use of hyper-systolic communication. We
have realized a scalable parallel FFT on the APE100/Quadrics massively parallel
computer, where our implementation is part of a 2-dimensional hydrodynamics
code for turbulence studies. A possible generalization to 4-dimensional FFT is
presented, having in mind QCD applications.Comment: 17 pages, 13 figures, figures include
A sweep algorithm for massively parallel simulation of circuit-switched networks
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude
A GPU-based hyperbolic SVD algorithm
A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm,
using a massively parallel graphics processing unit (GPU), is developed. The
algorithm also serves as the final stage of solving a symmetric indefinite
eigenvalue problem. Numerical testing demonstrates the gains in speed and
accuracy over sequential and MPI-parallelized variants of similar Jacobi-type
HSVD algorithms. Finally, possibilities of hybrid CPU--GPU parallelism are
discussed.Comment: Accepted for publication in BIT Numerical Mathematic
GreeM : Massively Parallel TreePM Code for Large Cosmological N-body Simulations
In this paper, we describe the implementation and performance of GreeM, a
massively parallel TreePM code for large-scale cosmological N-body simulations.
GreeM uses a recursive multi-section algorithm for domain decomposition. The
size of the domains are adjusted so that the total calculation time of the
force becomes the same for all processes. The loss of performance due to
non-optimal load balancing is around 4%, even for more than 10^3 CPU cores.
GreeM runs efficiently on PC clusters and massively-parallel computers such as
a Cray XT4. The measured calculation speed on Cray XT4 is 5 \times 10^4
particles per second per CPU core, for the case of an opening angle of
\theta=0.5, if the number of particles per CPU core is larger than 10^6.Comment: 13 pages, 11 figures, accepted by PAS
- …
