
    ParaFPGA : parallel computing with flexible hardware

    ParaFPGA 2009 is a Mini-Symposium on parallel computing with field-programmable gate arrays (FPGAs), held in conjunction with the ParCo conference on parallel computing. FPGAs make it possible to map an algorithm directly onto the hardware, optimize the architecture for parallel execution, and dynamically reconfigure the system between phases of the computation. Compared with, e.g., Cell processors, GPGPUs (general-purpose GPUs), and other high-performance devices, FPGAs are considered flexible hardware in the sense that the building blocks of one or more FPGAs can be interconnected freely to build a highly parallel system. This Mini-Symposium addresses the following topics: clustering FPGAs, evolvable hardware using FPGAs, and fast dynamic reconfiguration.

    Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

    Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums into a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR
    Comment: 22 pages, 8 figures, published at Parallel Computing (PARCO)
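
For readers unfamiliar with the CSR layout named in this abstract, the following is a minimal sequential sketch of SpMV written as "one product per nonzero, then a segmented sum per row". It is not the paper's speculative CPU-GPU algorithm; the function name `csr_spmv` and the example matrix are assumptions made for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    # Step 1: one product per stored nonzero (the data-parallel part).
    products = [v * x[c] for v, c in zip(values, col_idx)]
    # Step 2: segmented sum. Row i owns products[row_ptr[i]:row_ptr[i + 1]];
    # an empty row has row_ptr[i] == row_ptr[i + 1] and sums to 0.0.
    n_rows = len(row_ptr) - 1
    return [sum(products[row_ptr[i]:row_ptr[i + 1]], 0.0) for i in range(n_rows)]


# Hypothetical 3x3 example matrix with an empty middle row:
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
```

Empty rows such as the middle one are one reason a parallel segmented sum can produce partial results that need a later correction pass, which is the situation the abstract describes.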

    Using eSkel to Implement the Multiple Baseline Stereo Application

    We give an overview of the Edinburgh Skeleton Library eSkel, a structured parallel programming library which offers a range of skeletal parallel programming constructs to the C/MPI programmer. We then illustrate the efficacy of such a high-level approach through a multiple baseline stereo application. We describe the application and show different ways to introduce parallelism using algorithmic skeletons. Some performance results are reported.

    Cactus: Issues for Sustainable Simulation Software

    The Cactus Framework is an open-source, modular, portable programming environment for the collaborative development and deployment of scientific applications using high-performance computing. Its roots reach back to 1996 at the National Center for Supercomputing Applications and the Albert Einstein Institute in Germany, where its development was jump-started. Since then, the Cactus framework has witnessed major changes in hardware infrastructure as well as in its own community. This paper describes its endurance through these past changes and, drawing upon lessons from its past, also discusses future
    Comment: submitted to the Workshop on Sustainable Software for Science: Practice and Experiences 201

    Parallel computing 2011, ParCo 2011: book of abstracts

    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgium.

    An Efficient Thread Mapping Strategy for Multiprogramming on Manycore Processors

    The emergence of multicore and manycore processors is set to change the parallel computing world. Applications are shifting towards increased parallelism in order to utilise these architectures efficiently. This leads to a situation where every application creates its desired number of threads, based on its parallel nature and the system resources available to it. Task scheduling in such a multithreaded multiprogramming environment is a significant challenge. In task scheduling, not only the order of execution but also the mapping of threads to execution resources is of great importance. In this paper we state and discuss some fundamental rules based on results obtained from selected applications of the BOTS benchmarks on the 64-core TILEPro64 processor. We demonstrate how previously efficient mapping policies, such as those of the SMP Linux scheduler, become inefficient when the number of threads and cores grows. We propose a novel, low-overhead technique, a heuristic based on the amount of time each CPU spends doing useful work, to distribute the workloads fairly amongst the cores in a multiprogramming environment. Our approach could be implemented as a pragma similar to those in the new task-based OpenMP versions, or incorporated as a distributed thread mapping mechanism in future manycore programming frameworks. We show that our thread mapping scheme can outperform the native GNU/Linux thread scheduler in both single-programming and multiprogramming environments.
    Comment: ParCo Conference, Munich, Germany, 201
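
The idea of mapping threads according to how much useful work each core has already done can be illustrated with a simple greedy sketch. This is an assumption-laden stand-in, not the heuristic from the paper: the function `map_threads`, the per-thread cost estimates, and the "pick the least busy core" rule are all hypothetical.

```python
def map_threads(thread_costs, n_cores):
    """Greedily assign each thread to the core with the least accumulated busy time.

    thread_costs: estimated useful-work time per thread (hypothetical input).
    Returns (mapping, busy), where mapping[i] is the core chosen for thread i
    and busy[c] is the total estimated work placed on core c.
    """
    busy = [0.0] * n_cores
    mapping = []
    for cost in thread_costs:
        # Choose the core that has spent the least time on useful work so far;
        # ties go to the lowest-numbered core.
        core = min(range(n_cores), key=lambda c: busy[c])
        mapping.append(core)
        busy[core] += cost
    return mapping, busy


mapping, busy = map_threads([4.0, 3.0, 2.0, 1.0], n_cores=2)
```

In this example the four threads end up balanced across the two cores, each core carrying 5.0 units of estimated work.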

    A Tool for Programming Embarrassingly Task Parallel Applications on CoW and NoW

    Embarrassingly parallel problems can be split into parts that are characterized by a very low (or sometimes absent) exchange of information during their parallel computation. As a consequence, they can be computed effectively in parallel on commodity hardware, without particularly sophisticated interconnection networks. In practice, this means Clusters, Networks of Workstations and Desktops, as well as Computational Clouds. Despite the simplicity of this computational model, it can be exploited to compute quite a large range of problems. This paper describes JJPF, a tool for developing task parallel applications based on Java and Jini that proved to be an effective and efficient solution in environments like Clusters and Networks of Workstations and Desktops.
    Comment: 7 pages
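
The embarrassingly parallel model described in this abstract can be sketched as a small task farm: independent tasks are handed to a pool of workers and no worker communicates with another. The sketch below uses Python's standard `concurrent.futures` module on a single machine as a stand-in; it does not use or reproduce the JJPF API, and `run_farm` and `square` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    # Hypothetical independent task: needs no data from any other task.
    return n * n


def run_farm(task, inputs, workers=4):
    """Apply task to every input using a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() distributes the inputs to the workers and yields results
        # in the same order as the inputs.
        return list(pool.map(task, inputs))


results = run_farm(square, range(5))
```

Because the tasks share nothing, the same structure carries over to workers on separate machines, which is the setting (clusters and networks of workstations) that the abstract targets.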