Search CORE

18,148 research outputs found

SkelCL - A Portable Skeleton Library for High-Level GPU Programming

Author: Gorlatch Sergei
Kegel Philipp
Steuwer Michel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

While CUDA and OpenCL made general-purpose programming for Graphics Processing Units (GPU) popular, using these programming approaches remains complex and error-prone because they lack high-level abstractions. The especially challenging systems with multiple GPU are not addressed at all by these low-level programming models. We propose SkelCL – a library providing so-called algorithmic skeletons that capture recurring patterns of parallel computation and communication, together with an abstract vector data type and constructs for specifying data distribution. We demonstrate that SkelCL greatly simplifies programming GPU systems. We report the competitive performance results of SkelCL using both a simple Mandelbrot set computation and an industrial-strength medical imaging application. Because the library is implemented using OpenCL, it is portable across GPU hardware of different vendors

Crossref

Enlighten

Fourteenth Biennial Status Report: März 2017 - February 2019

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2019
Field of study

MPG.PuRe

SOLUTIONS FOR OPTIMIZING THE DATA PARALLEL PREFIX SUM ALGORITHM USING THE COMPUTE UNIFIED DEVICE ARCHITECTURE

Author: Alexandru Pîrjan
Dana-Mihaela Petroşanu
Ion Lungu
Publication venue
Publication date
Field of study

In this paper, we analyze solutions for optimizing the data parallel prefix sum function using the Compute Unified Device Architecture (CUDA) that provides a viable solution for accelerating a broad class of applications. The parallel prefix sum function is an essential building block for many data mining algorithms, and therefore its optimization facilitates the whole data mining process. Finally, we benchmark and evaluate the performance of the optimized parallel prefix sum building block in CUDA.CUDA, threads, GPGPU, parallel prefix sum, parallel processing, task synchronization, warp

Research Papers in Economics

Algorithmic patterns for $\mathcal{H}$ -matrices on many-core processors

Author: Zaspel Peter
Publication venue
Publication date: 01/01/2017
Field of study

In this work, we consider the reformulation of hierarchical (

\mathcal{H}

) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs).

\mathcal{H}

matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of

\mathcal{H}

matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing

\mathcal{H}

matrix CPU implementations by many-core processors, we here aim at totally relying on that processor type. As main contribution, we introduce the necessary parallel algorithmic patterns allowing to map the full

\mathcal{H}

matrix construction and the fast matrix-vector product to many-core hardware. Here, crucial ingredients are space filling curves, parallel tree traversal and batching of linear algebra operations. The resulting model GPU implementation hmglib is the, to the best of the authors knowledge, first entirely GPU-based Open Source

\mathcal{H}

matrix library of this kind. We conclude this work by an in-depth performance analysis and a comparative performance study against a standard

\mathcal{H}

matrix library, highlighting profound speedups of our many-core parallel approach

arXiv.org e-Print Archive

edoc

A Neural Algorithm of Artistic Style

Author: Bethge Matthias
Ecker Alexander S.
Gatys Leon A.
Publication venue
Publication date: 02/09/2015
Field of study

In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery

arXiv.org e-Print Archive

MPG.PuRe

Complexity plots

Author: Chen M.
Duffy B.R.
Thiyagalingam J.
Trefethen A.
Walton S.
Publication venue
Publication date: 01/01/2013
Field of study

In this paper, we present a novel visualization technique for assisting in observation and analysis of algorithmic\ud complexity. In comparison with conventional line graphs, this new technique is not sensitive to the units of\ud measurement, allowing multivariate data series of different physical qualities (e.g., time, space and energy) to be juxtaposed together conveniently and consistently. It supports multivariate visualization as well as uncertainty visualization. It enables users to focus on algorithm categorization by complexity classes, while reducing visual impact caused by constants and algorithmic components that are insignificant to complexity analysis. It provides an effective means for observing the algorithmic complexity of programs with a mixture of algorithms and blackbox software through visualization. Through two case studies, we demonstrate the effectiveness of complexity plots in complexity analysis in research, education and application

Crossref

Oxford University Research Archive