594 research outputs found
High performance Python for direct numerical simulations of turbulent flows
Direct Numerical Simulations (DNS) of the Navier Stokes equations is an
invaluable research tool in fluid dynamics. Still, there are few publicly
available research codes and, due to the heavy number crunching implied,
available codes are usually written in low-level languages such as C/C++ or
Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS
code that nearly matches the performance of C++ for thousands of processors and
billions of unknowns. We also describe a version optimized through Cython, that
is found to match the speed of C++. The solvers are written from scratch in
Python, both the mesh, the MPI domain decomposition, and the temporal
integrators. The solvers have been verified and benchmarked on the Shaheen
supercomputer at the KAUST supercomputing laboratory, and we are able to show
very good scaling up to several thousand cores.
A very important part of the implementation is the mesh decomposition (we
implement both slab and pencil decompositions) and 3D parallel Fast Fourier
Transforms (FFT). The mesh decomposition and FFT routines have been implemented
in Python using serial FFT routines (either NumPy, pyFFTW or any other serial
FFT module), NumPy array manipulations and with MPI communications handled by
MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT
in Python for a slab mesh decomposition using 4 lines of compact Python code,
for which the parallel performance on Shaheen is found to be slightly better
than similar routines provided through the FFTW library. For a pencil mesh
decomposition 7 lines of code is required to execute a transform
Algorithms and programming tools for image processing on the MPP, part 2
A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites
The language parallel Pascal and other aspects of the massively parallel processor
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given
A multiple-SIMD architecture for image and tracking analysis
The computational requirements for real-time image based applications are such as to warrant the use of a parallel architecture. Commonly used parallel architectures conform to the classifications of Single Instruction Multiple Data (SIMD), or Multiple Instruction Multiple Data (MIMD). Each class of architecture has its advantages and dis-advantages. For example, SIMD architectures can be used on data-parallel problems, such as the processing of an image. Whereas MIMD architectures are more flexible and better suited to general purpose computing. Both types of processing are typically required for the analysis of the contents of an image.
This thesis describes a novel massively parallel heterogeneous architecture, implemented as the Warwick Pyramid Machine. Both SIMD and MIMD processor types are combined within this architecture. Furthermore, the SIMD array is partitioned, into smaller SIMD sub-arrays, forming a Multiple-SIMD array. Thus, local data parallel, global data parallel, and control parallel processing are supported.
After describing the present options available in the design of massively parallel machines and the nature of the image analysis problem, the architecture of the Warwick Pyramid Machine is described in some detail. The performance of this architecture is then analysed, both in terms of peak available computational power and in terms of representative applications in image analysis and numerical computation. Two tracking applications are also analysed to show the performance of this architecture. In addition, they illustrate the possible partitioning of applications between the SIMD and MIMD processor arrays.
Load-balancing techniques are then described which have the potential to increase the utilisation of the Warwick Pyramid Machine at run-time. These include mapping techniques for image regions across the Multiple-SIMD arrays, and for the compression of sparse data. It is envisaged that these techniques may be found useful in other parallel systems
Mapping Signal Processing Algorithms on Parallel Arcidtectures
Electrical Engineerin
Algorithms and programming tools for image processing on the MPP
Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems
- …