Accelerated Event-by-Event Neutrino Oscillation Reweighting with Matter Effects on a GPU
Oscillation probability calculations are becoming increasingly CPU intensive
in modern neutrino oscillation analyses. The independence of reweighting
individual events in a Monte Carlo sample lends itself to parallel
implementation on a Graphics Processing Unit. The library "Prob3++" was ported
to the GPU using the CUDA C API, allowing for large-scale parallelized
calculation of neutrino oscillation probabilities through matter of constant
density and decreasing the execution time by a factor of 75 compared to
performance on a single CPU.
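The per-event independence described above is what makes the problem embarrassingly parallel: each event's weight depends only on that event's kinematics. A minimal sketch of the idea, using the standard two-flavor vacuum survival probability as a stand-in for the full Prob3++ matter calculation and NumPy vectorization as a stand-in for the GPU kernel (all function names and parameter values here are illustrative, not from the paper):

```python
import numpy as np

def osc_prob_2flavor(E_gev, L_km=295.0, sin2_2theta=0.95, dm2_ev2=2.5e-3):
    """Two-flavor vacuum survival probability, applied element-wise to an
    array of event energies. Because each event is independent, the same
    loop body maps directly onto one GPU thread per event."""
    arg = 1.267 * dm2_ev2 * L_km / E_gev  # dm2 in eV^2, L in km, E in GeV
    return 1.0 - sin2_2theta * np.sin(arg) ** 2

# Reweight a toy Monte Carlo sample event by event.
rng = np.random.default_rng(0)
energies = rng.uniform(0.2, 5.0, size=1_000_000)  # event energies in GeV
weights = osc_prob_2flavor(energies)
```

In a CUDA port, the vectorized expression above becomes a kernel launched with one thread per event; the speedup reported in the abstract comes from executing those independent weight calculations concurrently rather than sequentially.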
Distributed Block Coordinate Descent for Minimizing Partially Separable Functions
In this work we propose a distributed randomized block coordinate descent
method for minimizing a convex function with a huge number of
variables/coordinates. We analyze its complexity under the assumption that the
smooth part of the objective function is partially block separable, and show
that the degree of separability directly influences the complexity. This
extends the results in [Richtarik, Takac: Parallel coordinate descent methods
for big data optimization] to a distributed environment. We first show that
partially block separable functions admit an expected separable
overapproximation (ESO) with respect to a distributed sampling, compute the ESO
parameters, and then specialize complexity results from recent literature that
hold under the generic ESO assumption. We describe several approaches to
distribution and synchronization of the computation across a cluster of
multi-core computers and provide promising computational results.
Comment: in Recent Developments in Numerical Analysis and Optimization, 201
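The core update in a randomized block coordinate descent method can be sketched in a few lines. Below is a minimal single-machine version for a least-squares objective, where each iteration samples one block of coordinates and takes a gradient step scaled by that block's Lipschitz constant; in the distributed setting described above, blocks would instead be assigned to nodes by the distributed sampling (the function name and problem instance are illustrative, not from the paper):

```python
import numpy as np

def randomized_block_cd(A, b, n_iters=2000, block_size=2, seed=0):
    """Randomized block coordinate descent for f(x) = 0.5 * ||Ax - b||^2.
    Each step picks a block uniformly at random and minimizes along it
    using the block Lipschitz constant L_blk = ||A_blk||_2^2."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    r = A @ x - b                              # residual, kept up to date
    blocks = [np.arange(i, min(i + block_size, n))
              for i in range(0, n, block_size)]
    for _ in range(n_iters):
        blk = blocks[rng.integers(len(blocks))]
        g = A[:, blk].T @ r                    # block gradient
        L = np.linalg.norm(A[:, blk], 2) ** 2  # block Lipschitz constant
        step = g / L
        x[blk] -= step
        r -= A[:, blk] @ step                  # cheap residual update
    return x

# Solve a small consistent least-squares problem as a demonstration.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = A @ rng.normal(size=10)
x_hat = randomized_block_cd(A, b)
```

A quadratic objective like this is fully block separable; the partial block separability analyzed in the paper generalizes this, with the degree of overlap between blocks entering the ESO parameters and hence the complexity bound.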
Workload decomposition strategies for hierarchical distributed-shared memory parallel systems and their implementation with integration of high-level parallel languages
Dissecting sequential programs for parallelization-An approach based on computational units
When trying to parallelize a sequential program, programmers routinely struggle during the first step: finding out which code sections can be made to run in parallel. While identifying such code sections, most of the current parallelism discovery techniques focus on specific language constructs. In contrast, we propose to concentrate on the computations performed by a program. In our approach, a program is treated as a collection of computations communicating with one another using a number of variables. Each computation is represented as a computational unit (CU). A CU contains the inputs and outputs of a computation; the three phases of a computation are read, compute, and write. Based on the notion of CU, which ensures that the read phase executes before the write phase, we present a unified framework to identify both loop parallelism and task parallelism in sequential programs. We conducted a range of experiments on 23 applications from four different benchmark suites. Our approach accurately identified the parallelization opportunities in the benchmark applications, as verified by comparison with their parallel versions. We have also parallelized the opportunities identified by our approach that were not implemented in the parallel versions of the benchmarks, and we report the resulting speedups.
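The CU idea can be illustrated with a small sketch: represent each computation by the variables it reads and writes, and declare two CUs parallel-safe only when no data dependence (read-after-write, write-after-read, or write-after-write) links them. This toy model is an assumption-laden simplification of the paper's framework, and all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CU:
    """A computational unit: the variables its read phase consumes
    and the variables its write phase produces."""
    name: str
    reads: set
    writes: set

def can_run_in_parallel(a: CU, b: CU) -> bool:
    """Two CUs may run in parallel only if neither writes a variable
    the other reads or writes (no RAW, WAR, or WAW dependence)."""
    return (a.writes.isdisjoint(b.reads | b.writes) and
            b.writes.isdisjoint(a.reads))

# Toy program: two independent reductions over the same input,
# followed by a normalization that depends on one of them.
cu_sum  = CU("sum",  reads={"data"},      writes={"s"})
cu_max  = CU("max",  reads={"data"},      writes={"m"})
cu_norm = CU("norm", reads={"data", "s"}, writes={"data"})
```

Here `cu_sum` and `cu_max` only read the shared input, so they are task-parallel candidates, while `cu_norm` must wait for `cu_sum` because it reads `s` (and rewrites `data`). Loop parallelism falls out of the same test applied to the CUs of different loop iterations.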
