PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bryan Catanzaro
Nicolas Pinto
Paul Ivanov
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.

Comment: Submitted to Parallel Computing, Elsevier
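The run-time code generation idea in the abstract above can be illustrated without a GPU. The following is a minimal sketch, not code from the paper and not the PyCUDA or PyOpenCL API: it builds source text at run time, specialised to a constant factor and an unroll count, and then compiles it with Python's built-in `compile` and `exec`. The names `generate_scale_source` and `build_scale` are hypothetical. In a real two-tier setup, the generated string would presumably be CUDA C or OpenCL C handed to the GPU toolkit rather than Python source.

```python
def generate_scale_source(factor: float, unroll: int) -> str:
    """Return Python source for a scaling loop specialised at run time.

    Assumption for this sketch: the input length is a multiple of `unroll`.
    """
    # Emit one assignment per unrolled step, with the factor baked in as a literal.
    body = "\n".join(
        f"        out[i + {k}] = {factor!r} * x[i + {k}]" for k in range(unroll)
    )
    return (
        "def scale(x):\n"
        "    out = [0.0] * len(x)\n"
        f"    for i in range(0, len(x), {unroll}):\n"
        f"{body}\n"
        "    return out\n"
    )


def build_scale(factor: float, unroll: int):
    """Compile the generated source and return the resulting function."""
    source = generate_scale_source(factor, unroll)
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["scale"]


# The specialisation parameters are only known when the script runs.
scale_by_3 = build_scale(3.0, unroll=4)
result = scale_by_3([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(result)
```

The point of the sketch is the division of labour: the scripting layer decides the parameters and writes the inner loop, while the compiled code does the arithmetic.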

arXiv.org e-Print Archive

Crossref

Tupleware: Redefining Modern Analytics

Author: Ugur Cetintemel
Andrew Crotty
Kayhan Dursun
Alex Galakatos
Tim Kraska
Stan Zdonik
Publication date: 30/07/2014

There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems.

arXiv.org e-Print Archive

CiteSeerX

Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs

Author: Syed Waqar Nabi
Cristian Urlea
Wim Vanderbauwhede
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/04/2018

In this paper we present a novel approach to program optimisation based on compiler-based, type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages, and the cost model, and we demonstrate the effectiveness of our approach by comparison with a commercial toolchain.

Crossref

Enlighten

MacShim: Compiling MATLAB to a Scheduling-Independent Concurrent Language

Author: Stephen A. Edwards
Ohan Oda
Neesha Subramaniam
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

Nondeterminism is a central challenge in most concurrent models of computation. That programmers must worry about races and other timing-dependent behavior is a key reason that parallel programming has not been widely adopted. The SHIM concurrent language, intended for hardware/software codesign applications, avoids this problem by providing deterministic (race-free) concurrency, but does not support automatic parallelization of sequential algorithms. In this paper, we present a compiler able to parallelize a simple MATLAB-like language into concurrent SHIM processes. From a user-provided partitioning of arrays to processes, our compiler divides the program into coarse-grained processes, then schedules and synthesizes inter-process communication. We demonstrate the effectiveness of our approach on some image-processing algorithms.

Columbia University Academic Commons

The TASTE Toolset: turning human designed heterogeneous systems into computer built homogeneous software.

Author: Eric Conquet
Pierre Dissaux
Jérôme Hugues
Maxime Perrotin
Thanassis Tsiodras
Publication date: 01/05/2010

The TASTE tool-set results from spin-off studies of the ASSERT project, which started in 2004 with the objective of proposing innovative and pragmatic solutions for developing real-time software. One of the primary targets was satellite flight software, but it quickly became apparent that its characteristics were shared among various embedded systems. The solutions that we developed now comprise a process and several tools. The development process is based on the idea that real-time embedded systems are heterogeneous by nature and that a single UML-like language was helping neither their construction nor their validation. Rather than inventing yet another "ultimate" language, TASTE makes the link between existing and mature technologies such as Simulink, SDL, ASN.1, C, and Ada, and generates complete, homogeneous software-based systems that one can straightforwardly download and execute on a physical target. Our current prototype is moving toward a marketed product, and sequel studies are already in place to support, among others, FPGA systems.

Open Archive Toulouse Archive Ouverte