84 research outputs found
Evaluating the performance of software distributed shared memory as a target for parallelizing compilers
In this paper we evaluate the use of software distributed shared memory (DSM) on a message passing machine as the target for a parallelizing compiler. We compare this approach to compiler-generated message passing, hand-coded software DSM and hand-coded message passing. For this comparison, we use six applications: four that are regular and two that are irregular. Our results are gathered on an 8-node IBM SP/2 using the TreadMarks software DSM system. We use the APR shared-memory (SPF) compiler to generate the shared-memory programs and the APR XHPF compiler to generate message passing programs. The hand-coded message passing programs run with the IBM PVMe optimized message passing library. On the regular programs, both the compiler-generated and the hand-coded message passing outperform the SPF/TreadMarks combination: the compiler-generated message passing by 5.5% to 40%, and the hand-coded message passing by 7.5% to 49%. On the irregular programs, the SPF/TreadMarks combination outperforms the compiler-generated message passing by 38% and 89%, and only slightly underperforms the hand-coded message passing, differing by 4.4% and 16%. We also identify the factors that account for the performance differences, estimate their relative importance, and describe methods to improve the performance.
Tareador: a tool to unveil parallelization strategies at undergraduate level
This paper presents a methodology and framework designed to assist students in finding appropriate task decomposition strategies for their sequential programs, as well as identifying bottlenecks in the later execution of the parallel program. One of the main components of this framework is Tareador, which provides a simple API to specify potential task decomposition strategies for a sequential program. Once the student proposes how to break the sequential code into tasks, Tareador 1) provides information about the dependences between tasks that should be honored when implementing that task decomposition using a parallel programming model; 2) estimates the potential parallelism that could be achieved in an ideal parallel architecture with infinite processors; and 3) simulates the parallel execution on an ideal architecture, estimating the potential speed-up that could be achieved on a number of processors. The pedagogical style of the methodology is currently applied to teach parallelism in a third-year compulsory subject in the Bachelor Degree in Informatics Engineering at the Barcelona School of Informatics of the Universitat Politècnica de Catalunya (UPC) - BarcelonaTech.
OpenMP on Networks of Workstations
We describe an implementation of a sizable subset of OpenMP on networks of workstations (NOWs). By extending the availability of OpenMP to NOWs, we overcome one of its primary drawbacks compared to MPI, namely lack of portability to environments other than hardware shared memory machines. In order to support OpenMP execution on NOWs, our compiler targets a software distributed shared memory system (DSM) which provides multi-threaded execution and memory consistency. This paper presents two contributions. First, we identify two aspects of the current OpenMP standard that make an implementation on NOWs hard, and suggest simple modifications to the standard that remedy the situation. These problems reflect differences in memory architecture between software and hardware shared memory and the high cost of synchronization on NOWs. Second, we present performance results of a prototype implementation of an OpenMP subset on a NOW, and compare them with hand-coded software DSM and MPI results for the same applications on the same platform. We use five applications (ASCI Sweep3d, NAS 3D-FFT, SPLASH-2 Water, QSORT, and TSP) exhibiting various styles of parallelization, including pipelined execution, data parallelism, coarse-grained parallelism, and task queues. The measurements show little difference between OpenMP and hand-coded software DSM, but both are still lagging behind MPI. Further work will concentrate on compiler optimization to reduce these differences.
Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory
Modern-day proteomics generates ever more complex data, causing the requirements on the storage and processing of such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of the current techniques for distributed computing, along with examples of how the techniques are currently being employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.
Parallel Computers and Complex Systems
We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.