8 research outputs found

    A Combined MPI-CUDA Parallel Solution of Linear and Nonlinear Poisson-Boltzmann Equation

    Lessons learned in a decade of research software engineering GPU applications

    After years of using Graphics Processing Units (GPUs) to accelerate scientific applications in fields as varied as tomography, computer vision, climate modeling, digital forensics, geospatial databases, particle physics, radio astronomy, and localization microscopy, we noticed a number of technical, socio-technical, and non-technical challenges that Research Software Engineers (RSEs) may run into. While some of these challenges, such as managing different programming languages within a project or having to deal with different memory spaces, are common to all software projects involving GPUs, others are more typical of scientific software projects. Among these challenges we include changing resolutions or scales, maintaining an application over time and making it sustainable, and evaluating both the obtained results and the achieved performance.

    Towards User Transparent Parallel Multimedia Computing on GPU-clusters

    The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of MMCA problems, the use of High Performance Computing (HPC) techniques is essential. As most MMCA researchers are not HPC experts, there is an urgent need for 'familiar' programming models and tools that are both easy to use and efficient. Today, several user transparent library-based parallelization tools exist that aim to satisfy both these requirements. In general, such tools focus on data parallel execution on traditional compute clusters. As yet, however, none of these tools incorporates the use of many-core processors (e.g. GPUs). While traditional clusters are now being transformed into GPU-clusters, programming complexity vastly increases, and the need for easy and efficient programming models is as urgent as ever. This paper presents our first steps in the direction of obtaining a user transparent programming model for data parallel and hierarchical multimedia computing on GPU-clusters. The model is obtained by extending an existing user transparent parallel programming system (applicable to traditional compute clusters) with a set of CUDA compute kernels. We show our model to be capable of obtaining orders-of-magnitude speed improvements, without requiring any additional effort from the application programmer.

    User Transparent Data and Task Parallel Multimedia Computing with Pyxis-DT

    The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. However, as most MMCA researchers are not also HPC experts, there is a demand in the field for programming models and tools that are both efficient and easy to use. Today several user transparent library-based parallelization tools exist that aim to satisfy both these requirements. Such tools generally use a data parallel approach in which data structures (e.g. video frames) are scattered among the available nodes in a compute cluster. However, for certain MMCA applications a data parallel approach induces intensive communication, which significantly decreases performance. In these situations, we can benefit from applying alternative approaches. This paper presents Pyxis-DT: a user transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is obtained by run-time construction and execution of a task graph consisting of strictly defined building block operations. Each of these building block operations can be executed in data parallel fashion. Results show that for realistic MMCA applications the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation.

    User transparent data and task parallel multimedia computing with Pyxis-DT

    The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. As most MMCA researchers are not also HPC experts, however, there is a demand for programming models and tools that are both efficient and easy to use. Existing user transparent parallelization tools generally use a data parallel approach in which data structures (e.g. video frames) are scattered among the available nodes in a compute cluster. For certain MMCA applications, however, a data parallel approach induces intensive communication, which significantly decreases performance. In these situations, we can benefit from applying alternative approaches. We present Pyxis-DT, a user transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is obtained by run-time construction and execution of a task graph consisting of strictly defined building block operations. Results show that for realistic MMCA applications the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation. Extensions for GPU clusters are also presented. © 2013 Elsevier B.V. All rights reserved.

    A Comparison of Distributed Data Parallel Multimedia Computing over Conventional and Optical Wide-Area Networks

    The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data streams and archives. As individual compute clusters cannot satisfy the increasing computational demands of emerging MMCA problems, distributed supercomputing on collections of compute clusters is rapidly becoming indispensable. A well-known manner of obtaining speedups in MMCA is to apply data parallel approaches, in which commonly used data structures (e.g. video frames) are scattered among the available compute nodes. Such approaches work well for individual compute clusters but, due to the inherently large wide-area communication overheads, are generally not applied in distributed cluster systems. Given the increasing availability of low-latency, high-bandwidth optical wide-area networks, however, wide-area data parallel execution may now become a feasible acceleration approach. This paper discusses the wide-area data parallel execution of a realistic MMCA problem. It presents experimental results obtained on real distributed systems, and provides a feasibility analysis of the applied parallelization approach.