Search CORE

65,794 research outputs found

Contract-Based General-Purpose GPU Programming

Author: Kolesnichenko Alexey
Meyer Bertrand
Nanz Sebastian
Poskitt Christopher M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/10/2014
Field of study

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the difficulty of programming them and the low-level control of the hardware required to achieve good performance. This paper suggests a programming library, SafeGPU, that aims at striking a balance between programmer productivity and performance, by making GPU data-parallel operations accessible from within a classical object-oriented programming language. The solution is integrated with the design-by-contract approach, which increases confidence in functional program correctness by embedding executable program specifications into the program text. We show that our library leads to modular and maintainable code that is accessible to GPGPU non-experts, while providing performance that is comparable with hand-written CUDA code. Furthermore, runtime contract checking turns out to be feasible, as the contracts can be executed on the GPU

arXiv.org e-Print Archive

CiteSeerX

Repository for Publications and Research Data

Crossref

Institutional Knowledge at Singapore Management University

Structure-Aware Dynamic Scheduler for Parallel Machine Learning

Author: Gibson Garth A.
Ho Qirong
Kim Jin Kyu
Lee Seunghak
Xing Eric P.
Publication venue
Publication date: 30/12/2013
Field of study

Training large machine learning (ML) models with many variables or parameters can take a long time if one employs sequential procedures even with stochastic updates. A natural solution is to turn to distributed computing on a cluster; however, naive, unstructured parallelization of ML algorithms does not usually lead to a proportional speedup and can even result in divergence, because dependencies between model elements can attenuate the computational gains from parallelization and compromise correctness of inference. Recent efforts toward this issue have benefited from exploiting the static, a priori block structures residing in ML algorithms. In this paper, we take this path further by exploring the dynamic block structures and workloads therein present during ML program execution, which offers new opportunities for improving convergence, correctness, and load balancing in distributed ML. We propose and showcase a general-purpose scheduler, STRADS, for coordinating distributed updates in ML algorithms, which harnesses the aforementioned opportunities in a systematic way. We provide theoretical guarantees for our scheduler, and demonstrate its efficacy versus static block structures on Lasso and Matrix Factorization

arXiv.org e-Print Archive

CiteSeerX

Methods to Model-Check Parallel Systems Software

Author: Lusk Ewing
Matlin Olga Shumsky
McCune William
Publication venue
Publication date: 01/01/2003
Field of study

We report on an effort to develop methodologies for formal verification of parts of the Multi-Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of communicating processes. While the individual components of the collection execute simple algorithms, their interaction leads to unexpected errors that are difficult to uncover by conventional means. Two verification approaches are discussed here: the standard model checking approach using the software model checker SPIN and the nonstandard use of a general-purpose first-order resolution-style theorem prover OTTER to conduct the traditional state space exploration. We compare modeling methodology and analyze performance and scalability of the two methods with respect to verification of MPD.Comment: 12 pages, 3 figures, 1 tabl

arXiv.org e-Print Archive

CiteSeerX

UNT Digital Library

Group Communication Patterns for High Performance Computing in Scala

Author: Hargreaves Felix P.
Merkle Daniel
Schneider-Kamp Peter
Publication venue
Publication date: 01/01/2014
Field of study

We developed a Functional object-oriented Parallel framework (FooPar) for high-level high-performance computing in Scala. Central to this framework are Distributed Memory Parallel Data structures (DPDs), i.e., collections of data distributed in a shared nothing system together with parallel operations on these data. In this paper, we first present FooPar's architecture and the idea of DPDs and group communications. Then, we show how DPDs can be implemented elegantly and efficiently in Scala based on the Traversable/Builder pattern, unifying Functional and Object-Oriented Programming. We prove the correctness and safety of one communication algorithm and show how specification testing (via ScalaCheck) can be used to bridge the gap between proof and implementation. Furthermore, we show that the group communication operations of FooPar outperform those of the MPJ Express open source MPI-bindings for Java, both asymptotically and empirically. FooPar has already been shown to be capable of achieving close-to-optimal performance for dense matrix-matrix multiplication via JNI. In this article, we present results on a parallel implementation of the Floyd-Warshall algorithm in FooPar, achieving more than 94 % efficiency compared to the serial version on a cluster using 100 cores for matrices of dimension 38000 x 38000

arXiv.org e-Print Archive

CiteSeerX

Crossref

Optimistic Concurrency Control for Distributed Unsupervised Learning

Author: Broderick Tamara
Gonzalez Joseph E.
Jegelka Stefanie
Jordan Michael I.
Pan Xinghao
Publication venue
Publication date: 01/01/2013
Field of study

Research on distributed machine learning algorithms has focused primarily on one of two extremes - algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this "optimistic concurrency control" paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.Comment: 25 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

A Purely Functional Computer Algebra System Embedded in Haskell

Author: G-M Greuel
H Kredel
J-C Faugére
O Lobachev
R Jolly
T Coquand
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/09/2018
Field of study

We demonstrate how methods in Functional Programming can be used to implement a computer algebra system. As a proof-of-concept, we present the computational-algebra package. It is a computer algebra system implemented as an embedded domain-specific language in Haskell, a purely functional programming language. Utilising methods in functional programming and prominent features of Haskell, this library achieves safety, composability, and correctness at the same time. To demonstrate the advantages of our approach, we have implemented advanced Gr\"{o}bner basis algorithms, such as Faug\`{e}re's

F_4

and

F_5

, in a composable way.Comment: 16 pages, Accepted to CASC 201

arXiv.org e-Print Archive

Crossref