Parallel Computing on a PC Cluster
The tremendous advance in computer technology in the past decade has made it
possible to achieve the performance of a supercomputer on a very small budget.
We have built a multi-CPU cluster of Pentium PCs capable of parallel
computations using the Message Passing Interface (MPI). We will discuss the
configuration, performance, and application of the cluster to our work in
physics.
Comment: 3 pages, uses LaTeX and aipproc.cl
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, so it is necessary to optimize application performance and to enhance scalability on massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied with new partitioning algorithms to identify the ones most suitable for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
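A scalability study of this kind usually reports speedup and parallel efficiency as functions of the processor count. The following is a minimal sketch of those two quantities, not the analysis used for the TAU code. The function names and the timings are hypothetical and serve only to illustrate the arithmetic.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p for a fixed problem size (strong scaling)."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p; 1.0 means ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
# These are illustrative values, not measurements from any paper.
timings = {1: 1000.0, 2: 520.0, 4: 275.0, 8: 150.0}

t1 = timings[1]
for p, tp in sorted(timings.items()):
    s = speedup(t1, tp)
    e = parallel_efficiency(t1, tp, p)
    print(f"p={p:2d}  speedup={s:5.2f}  efficiency={e:5.2f}")
```

Comparing the efficiency curves produced by different mesh partitioners, at the same processor counts, is one simple way to judge which partitioning suits a given application.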
Graphic-Card Cluster for Astrophysics (GraCCA) -- Performance Tests
In this paper, we describe the architecture and performance of the GraCCA
system, a Graphic-Card Cluster for Astrophysics simulations. It consists of 16
nodes, with each node equipped with 2 modern graphic cards, the NVIDIA GeForce
8800 GTX. This computing cluster provides a theoretical performance of 16.2
TFLOPS. To demonstrate its performance in astrophysics computation, we have
implemented a parallel direct N-body simulation program with a shared time-step
algorithm on this system. Our system achieves a measured performance of 7.1
TFLOPS and a parallel efficiency of 90% for simulating a globular cluster of
1024K particles. Compared with the GRAPE-6A cluster at RIT (Rochester
Institute of Technology), the GraCCA system achieves more than twice the
measured speed and an even higher performance-per-dollar ratio. Moreover, our
system can handle up to 320M particles and can serve as a general-purpose
computing cluster for a wide range of astrophysics problems.
Comment: Accepted for publication in New Astronom
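To make "direct N-body simulation with a shared time-step" concrete, here is a minimal serial sketch in plain Python. It is not the GraCCA implementation: the abstract does not name the integrator, so a simple leapfrog (kick-drift-kick) scheme is assumed here, and the softening length `eps`, the unit choice G = 1, and all function names are illustrative assumptions. "Direct" means every pairwise force is summed explicitly, at O(N^2) cost per step; "shared time-step" means all particles advance with the same `dt`.

```python
import math


def accelerations(pos, mass, eps=1e-3, G=1.0):
    """Direct summation: acceleration on each particle from all others.

    eps is a softening length (an assumption of this sketch) that keeps
    the force finite when two particles come very close.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc


def step(pos, vel, mass, dt, eps=1e-3):
    """Advance ALL particles by the same dt (shared time-step), in place.

    Leapfrog kick-drift-kick, chosen for brevity; production codes
    typically use higher-order schemes.
    """
    acc = accelerations(pos, mass, eps)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
            pos[i][k] += dt * vel[i][k]        # drift
    acc = accelerations(pos, mass, eps)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += 0.5 * dt * acc[i][k]  # half kick


# Usage: two equal masses on a circular orbit around their center of mass.
mass = [0.5, 0.5]
pos = [[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
vel = [[0.0, -0.5, 0.0], [0.0, 0.5, 0.0]]
for _ in range(100):
    step(pos, vel, mass, dt=0.01)
separation = math.dist(pos[0], pos[1])
print(f"separation after 100 steps: {separation:.4f}")
```

The inner double loop is the part that parallelizes well: each particle's force sum is independent of the others, which is why this algorithm maps naturally onto graphics cards and special-purpose hardware.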