1,920 research outputs found
ParaFPGA : parallel computing with flexible hardware
ParaFPGA 2009 is a Mini-Symposium on parallel computing with field programmable gate arrays (FPGAs), held in conjunction with the ParCo conference on parallel computing. FPGAs allow to map an algorithm directly onto the hardware, optimize the architecture for parallel execution, and dynamically reconfigure the system in between different phases of the computation. Compared to e.g. Cell processors, GPGPU's (general-purpose GPU's) and other high-performance devices, FPGAs are considered as flexible hardware in the sense that the building blocks of one or more single or multiple FPGAs can be interconnected freely to build a highly parallel system. In this Mini-Symposium the following topics are addressed: clustering FPGAs, evolvable hardware using FPGAs and fast dynamic reconfiguration
Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors
Sparse matrix-vector multiplication (SpMV) is a central building block for
scientific software and graph applications. Recently, heterogeneous processors
composed of different types of cores attracted much attention because of their
flexible core configuration and high energy efficiency. In this paper, we
propose a compressed sparse row (CSR) format based SpMV algorithm utilizing
both types of cores in a CPU-GPU heterogeneous processor. We first
speculatively execute segmented sum operations on the GPU part of a
heterogeneous processor and generate a possibly incorrect results. Then the CPU
part of the same chip is triggered to re-arrange the predicted partial sums for
a correct resulting vector. On three heterogeneous processors from Intel, AMD
and nVidia, using 20 sparse matrices as a benchmark suite, the experimental
results show that our method obtains significant performance improvement over
the best existing CSR-based SpMV algorithms. The source code of this work is
downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSRComment: 22 pages, 8 figures, Published at Parallel Computing (PARCO
Using eSkel to Implement the Multiple Baseline Stereo Application
We give an overview of the Edinburgh Skeleton Library eSkel, a structured parallel programming library which offers a range of skeletal parallel programming constructs to the C/MPI programmer. Then we illustrate the efficacy of such a high level approach through an application of multiple baseline stereo. We describe the application and show different ways to introduce parallelism using algorithmic skeletons. Some performance results will be reported
Cactus: Issues for Sustainable Simulation Software
The Cactus Framework is an open-source, modular, portable programming
environment for the collaborative development and deployment of scientific
applications using high-performance computing. Its roots reach back to 1996 at
the National Center for Supercomputer Applications and the Albert Einstein
Institute in Germany, where its development jumpstarted. Since then, the Cactus
framework has witnessed major changes in hardware infrastructure as well as its
own community. This paper describes its endurance through these past changes
and, drawing upon lessons from its past, also discusses futureComment: submitted to the Workshop on Sustainable Software for Science:
Practice and Experiences 201
Parallel computing 2011, ParCo 2011: book of abstracts
This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgiu
An Efficient Thread Mapping Strategy for Multiprogramming on Manycore Processors
The emergence of multicore and manycore processors is set to change the
parallel computing world. Applications are shifting towards increased
parallelism in order to utilise these architectures efficiently. This leads to
a situation where every application creates its desirable number of threads,
based on its parallel nature and the system resources allowance. Task
scheduling in such a multithreaded multiprogramming environment is a
significant challenge. In task scheduling, not only the order of the execution,
but also the mapping of threads to the execution resources is of a great
importance. In this paper we state and discuss some fundamental rules based on
results obtained from selected applications of the BOTS benchmarks on the
64-core TILEPro64 processor. We demonstrate how previously efficient mapping
policies such as those of the SMP Linux scheduler become inefficient when the
number of threads and cores grows. We propose a novel, low-overhead technique,
a heuristic based on the amount of time spent by each CPU doing some useful
work, to fairly distribute the workloads amongst the cores in a
multiprogramming environment. Our novel approach could be implemented as a
pragma similar to those in the new task-based OpenMP versions, or can be
incorporated as a distributed thread mapping mechanism in future manycore
programming frameworks. We show that our thread mapping scheme can outperform
the native GNU/Linux thread scheduler in both single-programming and
multiprogramming environments.Comment: ParCo Conference, Munich, Germany, 201
A Tool for Programming Embarrassingly Task Parallel Applications on CoW and NoW
Embarrassingly parallel problems can be split in parts that are characterized
by a really low (or sometime absent) exchange of information during their
computation in parallel. As a consequence they can be effectively computed in
parallel exploiting commodity hardware, hence without particularly
sophisticated interconnection networks. Basically, this means Clusters,
Networks of Workstations and Desktops as well as Computational Clouds. Despite
the simplicity of this computational model, it can be exploited to compute a
quite large range of problems. This paper describes JJPF, a tool for developing
task parallel applications based on Java and Jini that showed to be an
effective and efficient solution in environment like Clusters and Networks of
Workstations and Desktops.Comment: 7 page
- …