84 research outputs found

    The Fortran parallel transformer and its programming environment

    JPT: A Java Parallelization Tool

    Evaluating the performance of software distributed shared memory as a target for parallelizing compilers

    In this paper we evaluate the use of software distributed shared memory (DSM) on a message passing machine as the target for a parallelizing compiler. We compare this approach to compiler-generated message passing, hand-coded software DSM, and hand-coded message passing. For this comparison, we use six applications: four that are regular and two that are irregular. Our results are gathered on an 8-node IBM SP/2 using the TreadMarks software DSM system. We use the APR shared-memory (SPF) compiler to generate the shared-memory programs and the APR XHPF compiler to generate message passing programs. The hand-coded message passing programs run with the IBM PVMe optimized message passing library. On the regular programs, both the compiler-generated and the hand-coded message passing outperform the SPF/TreadMarks combination: the compiler-generated message passing by 5.5% to 40%, and the hand-coded message passing by 7.5% to 49%. On the irregular programs, the SPF/TreadMarks combination outperforms the compiler-generated message passing by 38% and 89%, and only slightly underperforms the hand-coded message passing, differing by 4.4% and 16%. We also identify the factors that account for the performance differences, estimate their relative importance, and describe methods to improve the performance.
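
    As an illustration of the two compiler targets being compared, the hedged sketch below shows the message-passing formulation of a simple global reduction; under the shared-memory (SPF/TreadMarks) target the same computation would remain a plain loop over a shared array, with the DSM runtime providing consistency. This is a generic C/MPI example, not code produced by the APR compilers.

        /* Minimal sketch (not from the paper): a global sum written for a
         * message-passing target. A DSM-targeting compiler would instead keep
         * the loop over a single shared array and let the DSM layer manage
         * consistency across nodes. */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define N 1000000

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Each rank owns only its block of the (conceptually shared) array. */
            int chunk = N / size;
            double *block = malloc(chunk * sizeof *block);
            double local = 0.0;
            for (int i = 0; i < chunk; i++) {
                block[i] = 1.0;          /* stand-in for real initialization */
                local += block[i];
            }

            /* Explicit communication replaces the shared-memory reduction. */
            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("sum = %f\n", total);

            free(block);
            MPI_Finalize();
            return 0;
        }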

    Visual programming and debugging for parallel computing

    Tareador: a tool to unveil parallelization strategies at undergraduate level

    This paper presents a methodology and framework designed to assist students in the process of finding appropriate task decomposition strategies for their sequential program, as well as identifying bottlenecks in the later execution of the parallel program. One of the main components of this framework is Tareador, which provides a simple API to specify potential task decomposition strategies for a sequential program. Once the student proposes how to break the sequential code into tasks, Tareador 1) provides information about the dependences between tasks that should be honored when implementing that task decomposition using a parallel programming model; 2) estimates the potential parallelism that could be achieved in an ideal parallel architecture with infinite processors; and 3) simulates the parallel execution on an ideal architecture, estimating the potential speed-up that could be achieved on a given number of processors. The pedagogical style of the methodology is currently applied to teach parallelism in a third-year compulsory subject in the Bachelor Degree in Informatics Engineering at the Barcelona School of Informatics of the Universitat Politècnica de Catalunya (UPC) - BarcelonaTech.
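
    As a rough illustration of the kind of annotation described above, the sketch below marks one candidate task per loop iteration in a toy C program. The annotation calls are shown as comments with assumed names (tareador_start_task / tareador_end_task, tareador_ON / tareador_OFF); the actual Tareador API may differ, so treat the names as placeholders.

        /* Hypothetical sketch of proposing a task decomposition: one task per
         * block. The tareador_* calls are assumed names and are left as
         * comments so the example compiles without the tool installed. */
        #include <stdio.h>

        #define BLOCKS 4
        #define BLOCK_SIZE 64

        static void process_block(double *v, int n)
        {
            for (int i = 0; i < n; i++)
                v[i] = v[i] * 2.0 + 1.0;
        }

        int main(void)
        {
            static double data[BLOCKS][BLOCK_SIZE];

            /* tareador_ON();  -- start recording dependences (assumed call) */
            for (int b = 0; b < BLOCKS; b++) {
                /* tareador_start_task("process_block");  -- candidate task */
                process_block(data[b], BLOCK_SIZE);
                /* tareador_end_task("process_block"); */
            }
            /* tareador_OFF(); */

            printf("processed %d blocks\n", BLOCKS);
            return 0;
        }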

    OpenMP on Networks of Workstations

    We describe an implementation of a sizable subset of OpenMP on networks of workstations (NOWs). By extending the availability of OpenMP to NOWs, we overcome one of its primary drawbacks compared to MPI, namely its lack of portability to environments other than hardware shared-memory machines. In order to support OpenMP execution on NOWs, our compiler targets a software distributed shared memory (DSM) system which provides multi-threaded execution and memory consistency. This paper presents two contributions. First, we identify two aspects of the current OpenMP standard that make an implementation on NOWs hard, and suggest simple modifications to the standard that remedy the situation. These problems reflect differences in memory architecture between software and hardware shared memory and the high cost of synchronization on NOWs. Second, we present performance results of a prototype implementation of an OpenMP subset on a NOW, and compare them with hand-coded software DSM and MPI results for the same applications on the same platform. We use five applications (ASCI Sweep3d, NAS 3D-FFT, SPLASH-2 Water, QSORT, and TSP) exhibiting various styles of parallelization, including pipelined execution, data parallelism, coarse-grained parallelism, and task queues. The measurements show little difference between OpenMP and hand-coded software DSM, but both still lag behind MPI. Further work will concentrate on compiler optimization to reduce these differences.
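
    For context, the snippet below shows the data-parallel style of OpenMP loop that such a compiler would map onto a software DSM layer. It is a generic OpenMP/C illustration, not code from the prototype or from the benchmark applications listed above.

        /* Generic OpenMP data-parallel loop in C; on a network of workstations
         * the compiler described above would translate the shared array and the
         * reduction onto software DSM operations instead of hardware shared
         * memory. */
        #include <omp.h>
        #include <stdio.h>

        #define N 1000000

        int main(void)
        {
            static double x[N];
            double sum = 0.0;

            /* Iterations are divided among threads; the reduction clause avoids
             * synchronizing on every update of sum. */
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < N; i++) {
                x[i] = 0.5 * i;
                sum += x[i];
            }

            printf("threads=%d sum=%g\n", omp_get_max_threads(), sum);
            return 0;
        }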

    Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory

    Modern day proteomics generates ever more complex data, causing the requirements on the storage and processing of such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of the current techniques for distributed computing, along with examples of how these techniques are currently being employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.

    Parallel Computers and Complex Systems

    We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true: we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system, typically a model of the world, into another complex system, the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.