50 research outputs found

    Introduction to CAP : A language extension for the specification of pipelined parallel applications

    Programming parallel shared- and distributed-memory architectures remains a difficult task. This contribution proposes a methodology for the hierarchical specification of pipelined parallel applications running on shared- as well as distributed-memory architectures. The methodology targets coarse- to medium-grain parallelism. The CAP methodology (Computer-Aided Parallelization) assumes that parallel hardware works like a factory producing cars. The important part of the analogy is the support for pipelining. Another important feature of the CAP methodology is its hierarchical and compositional nature. The methodology is supported by the CAP language extension to C++. The CAP extension translates to sequential C++ programs for application validation using conventional debuggers, to shared-memory parallel programs based on threads, and to distributed-memory parallel programs communicating using the PVM message-passing library. This contribution presents the CAP methodology, the CAP language extension, as well as an application of the CAP methodology to medical imaging. It also presents the current status of the CAP project.

    Program Parallelization with CAP : A tutorial

    Tutorial introduction to the CAP (Computer-Aided Parallelization) Language

    Computer-aided synthesis of parallel image processing applications

    We present a tutorial description of the CAP computer-aided parallelization tool. CAP has been designed with the goal of letting the parallel application programmer have complete control of how his application is parallelized, and at the same time freeing him from the burden of explicitly managing a large number of threads and associated synchronization and communication primitives. The CAP tool, a precompiler generating C++ source code, enables application programmers to specify at a high level of abstraction the set of threads present in the application, the processing operations offered by these threads, and the parallel constructs specifying the flow of data and parameters between operations. A configuration map specifies the mapping between CAP threads and operating system processes, possibly located on different computers. The generated program may run on various parallel configurations without recompilation. We discuss the issues of flow control and load balancing and show the solutions offered by CAP. We also show how CAP can be used to generate relatively complex parallel programs incorporating neighborhood dependent operations. Finally, we briefly describe a real 3D image processing application: the Visible Human Slice Server, its implementation according to the previously defined concepts and its performance.

    Comparing multimedia storage architectures

    Multimedia interfaces increase the need for large image databases capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfil these requirements, we use a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through simulation the real-time behavior of two multiprocessor multi-disk architectures: GigaView and the Unix workstation cluster. GigaView incorporates point-to-point communication between processing units and the workstation cluster supports communication through a shared bus-and-memory architecture. For a standard multimedia server architecture consisting of 8 disks and 4 disk-node processors, we evaluate stream frame access times under various parameters such as load factors, frame size, stream throughput and synchronicity requirements. We compare the behavior of GigaView and the workstation cluster in terms of delay and delay jitter.

    Multimedia performance behavior of the GigaView parallel image server

    Multimedia interfaces increase the need for large image databases capable of storing and fetching streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, the GigaView parallel image server architecture relies on arrays of intelligent disk nodes, with each disk node being composed of one processor and one disk. This paper analyzes, through simulation, the real-time behavior of the GigaView in terms of delay and delay jitter. For a high-end GigaView architecture, consisting of 16 disks and T9000 transputers, we evaluate stream frame access times under various parameters, such as load factors, frame size, stream throughput, and synchronicity requirements.

    Dynamic load balancing of parallel cellular automata

    We are interested in running cellular automata in parallel. We present an algorithm which explores the dynamic remapping of cells in order to balance the load between processing nodes. The parallel application runs on a cluster of PCs connected by Fast-Ethernet. A general cellular automaton can be described as a set of cells where each cell is a state machine. To compute the next cell state, each cell needs some information from neighbouring cells. There are no limitations on the kind of information exchanged nor on the computation itself. Only the automaton topology defining the neighbours of each cell remains unchanged during the automaton's life. As a typical example of a cellular automaton we consider the image skeletonization problem. Skeletonization requires spatial filtering to be repetitively applied to the image. Each step erodes a thin part of the original image. After the last step, only the image skeleton remains. Skeletonization algorithms require vast amounts of computing power, especially when applied to large images. Therefore, skeletonization applications can potentially benefit from the use of parallel processing. Two different parallel algorithms are proposed, one with a static load distribution that splits the cells over several processing nodes and the other with a dynamic load balancing scheme capable of remapping cells during program execution. Performance measurements show that cell migration doesn't reduce the speedup if the program is already load balanced. It greatly improves the performance if the parallel application is not well balanced.

    Performances of the PS² parallel storage and processing system for tomographic image visualization

    We propose a new approach for developing parallel I/O- and compute-intensive applications. At a high level of abstraction, a macro data flow description describes how processing and disk access operations are combined. This high-level description (CAP) is precompiled into compilable and executable C++ source language. Parallel file system components specified by CAP are offered as reusable CAP operations. Low-level parallel file system components can, thanks to the CAP formalism, be combined with processing operations in order to yield efficient pipelined parallel I/O- and compute-intensive programs. The underlying parallel system is based on commodity components (PentiumPro processors, Fast Ethernet) and runs on top of WindowsNT. The CAP-based parallel program development approach is applied to the development of an I/O- and processing-intensive tomographic 3D image visualization application. Configurations range from a single PentiumPro 1-disk system to a four PentiumPro 27-disk system. We show that performances scale well when increasing the number of processors and disks. With the largest configuration, the system is able to extract in parallel and project into the display space between three and four 512×512 images per second. The images may have any orientation and are extracted from a 100 MByte 3D tomographic image striped over the available set of disks.

    Synthesizing parallel imaging applications using the CAP Computer-Aided Parallelization tool

    Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high level of abstraction the flow of data between pipelined parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables parallel storage access routines and sequential image processing operations to be combined efficiently. The paper shows how processing- and I/O-intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. The paper's contributions are: (1) to show how such implementations can be compactly specified in CAP; and (2) to demonstrate that CAP-specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP-specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

    Giga view parallel image server performance analysis

    Professionals in various fields such as medical imaging, biology and civil engineering require rapid access to huge amounts of uncompressed pixmap image data. Multimedia interfaces further increase the need for large image databases. In order to fulfill these requirements, the GigaView parallel image server architecture relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one disk. This contribution analyzes through simulation and experimentation the behavior of the GigaView under single and multiple requests, and compares it to the behavior of RAID servers. It evaluates image visualization window access times under various parameters such as load factors and the number of cooperating disk nodes. Under a single request, the GigaView image server can be modeled as a single high-throughput low-latency secondary storage device. Under multiple requests, the notions of utilization and maximum sustainable throughput accurately define the behavior of the GigaView.