364 research outputs found

    A study of the communication cost of the FFT on torus multicomputers

    Get PDF
    The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version

    A Parallel implementation of an mpeg-2 encoder using message-passing

    Get PDF
    The days of film are waning as digital cameras and digital video cameras are becoming commonplace. Uncompressed digital video can consume large amounts of space, making it cumbersome to store efficiently. A method of video compression was developed by the Motion Pictures Expert Group (MPEG), and is now an international standard with the International Organization for Standardization (ISO). This thesis deals with the MPEG-2 Video standard, ISO/IEC 13818-2 [2]. The goal of this thesis is to explore the applications of MPEG-2 encoding in a parallel processing paradigm. To achieve this, a sequential MPEG-2 software encoder was obtained from the MPEG Software Simulation Group (MSSG) [18] and modified to be run, in parallel, on a cluster of single-processor Linux workstations using the Message Passing Interface (MPI) [11, 10, 3]. A multi-threaded pipeline of the encoding process was created using Pthreads [6]. The resulting pipelined parallel encoder has been shown to produce compliant elementary MPEG-2 bitstreams for progressive video sequences. Results of simulation showed that the parallel encoder always performed better than the sequential version as the number of processors scaled. However, it did not exhibit the ideal linear speedup that all parallel programs aim to achieve. This is due to the program executing on a set of resources not ideal for the multi-threaded pipeline. The ensuing chapters will provide the motivation for this work, and an overview of MPEG in addition to parallel processing and programming. Also forthcoming will be how it was achieved and the results produced. Supplementary applications of this work will also be discussed

    Abstract State Machines 1988-1998: Commented ASM Bibliography

    Get PDF
    An annotated bibliography of papers which deal with or use Abstract State Machines (ASMs), as of January 1998.Comment: Also maintained as a BibTeX file at http://www.eecs.umich.edu/gasm

    Parallel Implementation of Systolic Array Design for Developing Medical Image Rotation

    Get PDF
    Many image-processing algorithms are particularly suited to parallel computing, as they process images that are difficult and time consuming to analyse. In particular, medical images of tissues tend to be very complex with great irregularity and variability in shapes. Furthermore, existing algorithms contain explicit parallelism, which can be efficiently exploited by processing arrays. A good example of an image processing operation is the geometric rotation of a rectangular bitmap. This paper presents a set of systolic array designs for implementing the geometric rotation algorithms of images on VLSI processing arrays. The examined algorithm performs a trigonometric transformation on each pixel in an image.  The design is implemented as a distributed computing system of networked computers using Parallel Virtual Machine (PVM) model. Each node (computer) in the network takes part in the task in hand – such as image processing – using message passing. Comments and conclusions about the implementation of the design as a distributed computing system are discussed. Keywords: parallel computing, distributed computing. PVM, image rotation, systolic array

    The Tera Multithreaded Architecture and Unstructured Meshes

    Get PDF
    The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures

    Fractal Image Compression on MIMD Architectures II: Classification Based Speed-up Methods

    Get PDF
    Since fractal image compression is computationally very expensive, speed-up techniques are required in addition to parallel processing in order to compress large images in reasonable time. In this paper we discuss parallel fractal image compression algorithms suited for MIMD architectures which employ block classification as speed-up method

    A Survey on Parallel Architecture and Parallel Programming Languages and Tools

    Get PDF
    In this paper, we have presented a brief review on the evolution of parallel computing to multi - core architecture. The survey briefs more than 45 languages, libraries and tools used till date to increase performance through parallel programming. We ha ve given emphasis more on the architecture of parallel system in the survey

    Characterizing Computation-Communication Overlap in Message-Passing Systems

    Full text link

    Introduction to CAP : A language extension for the specification of pipelined parallel applications

    Get PDF
    Programming parallel shared- and distributed-memory architectures remains a difficult task. This contribution proposes a methodology for the hierarchical specification of pipelined parallel applications running on shared- as well as distributed-memory architecture. The methodology targets coarse to medium grain parallelism. The CAP methodology (Computer-Aided Parallelization) assumes that parallel hardware works as a factory producing cars. The important part of the analogy is the support for pipelining. Another important feature of the CAP methodology is its hierarchical and compositional nature. The methodology is supported by the CAP language extension to C++. The CAP extension translates to sequential C++ programs for application validation using conventional debuggers, to shared-memory parallel programs based on threads, and to distributed-memory parallel programs communicating using the PVM message-passing library. This contribution presents the CAP methodology, the CAP language extension, as well as an application of the CAP methodology to medical imaging. It also presents the current status of the CAP project
    • …
    corecore