
143 research outputs found

OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

Author: A Klöckner
D Charousset
G Agha
J Nickolls
JD Owens
K Wu
L Dagum
S Srinivasan
S Wienke
T Desell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

The actor model of computation has been designed for seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident on the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to differ largely between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API. Comment: 28 pages.

arXiv.org e-Print Archive

Crossref

REPOSIT HAW Hamburg

GPU-Based Data Processing for 2-D Microwave Imaging on MAST

Author: BITTNER R.
CASTRO R.
DAVIS W. M.
EIDIETIS N. W.
FREETHY S. J.
GARELLI N.
HUANG B. K.
LUJAN P.
MONTEIRO E.
NAVARRO C. A.
NAYLOR G. A.
NICKOLLS J.
OWENS J. D.
PELL O.
SALMON N. A.
SHEVCHENKO V. F.
THOMAS D. A.
THOUTI K.
URBAN J.
VAN CITTERT P. H.
VERMIJ E.
WYNTERS E.
XU C.
YANG L.
YUE X.
ZERNIKE F.
Publication venue: 'American Nuclear Society'
Publication date: 07/04/2016

The Synthetic Aperture Microwave Imaging (SAMI) diagnostic is a Mega Amp Spherical Tokamak (MAST) diagnostic based at Culham Centre for Fusion Energy. The acceleration of the SAMI diagnostic data-processing code by a graphics processing unit is presented, demonstrating acceleration of up to 60 times compared to the original IDL (Interactive Data Language) data-processing code. SAMI will now be capable of intershot processing, allowing pseudo-real-time control so that adjustments and optimizations can be made between shots. Additionally, for the first time, the analysis of many shots will be possible.

Durham Research Online

Crossref

White Rose Research Online

Antihyperalgesia by α2-GABAA Receptors Occurs Via a Genuine Spinal Action and Does Not Involve Supraspinal Sites

Author: A Di Lio
Alessandra Di Lio
C Vidal
CS Sang
Dietmar Benke
E Persohn
FM Rivas
G Munro
GJ Bennett
Gonzalo E Yévenes
H Möhler
HA Wieland
Hanns Ulrich Zeilhofer
HU Zeilhofer
HV Morris
J Andre
J Knabl
J Paul
J Schouenborg
JA Benson
JA Coull
JA Harris
James M Cook
Jean-Marc Fritschy
JM Fritschy
Jolly Paul
JR Atack
K Hösl
K Löw
L Jasmin
Louis Scheurer
MA Tatsuo
NR Mirza
P Scott-Stevens
R Melzack
R Witschi
RJ Harvey
RM McKernan
Robert Witschi
RW Olsen
S Ahmadi
S Nickolls
S Reichl
TJ Luger
U Rudolph
Uwe Rudolph
W Wisden
William T Ralvenius
Y Carrasquillo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/09/2013

Drugs that enhance GABAergic inhibition alleviate inflammatory and neuropathic pain after spinal application. This antihyperalgesia occurs mainly through GABAA receptors (GABAARs) containing α2 subunits (α2-GABAARs). Previous work indicates that potentiation of these receptors in the spinal cord evokes profound antihyperalgesia also after systemic administration, but possible synergistic or antagonistic actions of supraspinal α2-GABAARs on spinal antihyperalgesia have not yet been addressed. Here we generated two lines of GABAAR-mutated mice, which either lack α2-GABAARs specifically in the spinal cord or express only benzodiazepine-insensitive α2-GABAARs at this site. We analyzed the consequences of these mutations for antihyperalgesia evoked by systemic treatment with the novel non-sedative benzodiazepine site agonist HZ166 in neuropathic and inflammatory pain. Wild-type mice and both types of mutated mice had similar baseline nociceptive sensitivities and developed similar hyperalgesia. However, antihyperalgesia by systemic HZ166 was reduced in both mutated mouse lines by about 60% and was virtually indistinguishable from that of global point-mutated mice, in which all α2-GABAARs were benzodiazepine insensitive. The major (α2-dependent) component of GABAAR-mediated antihyperalgesia was therefore exclusively of spinal origin, whereas supraspinal α2-GABAARs had neither synergistic nor antagonistic effects on antihyperalgesia. Our results thus indicate that drugs that specifically target α2-GABAARs exert their antihyperalgesic effect through enhanced spinal nociceptive control. Such drugs may therefore be well suited for the systemic treatment of different chronic pain conditions.

Repository for Publications and Research Data

Crossref

Harvard University - DASH

PubMed Central

Optimistic Parallelism on GPUs

Author: E Ayguadé
J Nickolls
JE Stone
L Dagum
S Liu
S Wienke
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/03/2015

We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable exploitation of data parallelism, the latter three phases represent overhead costs of using speculation. We perform the misspeculation check on the GPU to minimize its cost. We perform result committing and misspeculation recovery on the CPU to reduce the result copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API for programmers to give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an nVidia Tesla C1060 hosted in an Intel(R) Xeon(R) E5540 machine.
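The five phases named in this abstract can be illustrated with a minimal, purely sequential Python sketch. It is not the authors' implementation: the function names (`speculative_loop`, `run_iteration`), the `body(i, read)` interface, and the batch-based scheduling are all assumptions made for illustration, and no GPU is involved. Each iteration is assumed to be deterministic and to report its writes as a dictionary instead of mutating shared state.

```python
def run_iteration(body, i, source):
    """Run one loop iteration against `source`, recording which indices it reads."""
    reads = set()

    def read(idx):
        reads.add(idx)
        return source[idx]

    writes = body(i, read)  # body returns {index: new_value}
    return reads, writes


def speculative_loop(data, body, n_iters, batch_size=4):
    """Execute a loop in speculative batches; the result matches sequential execution."""
    data = list(data)
    for start in range(0, n_iters, batch_size):
        # Phase 1, scheduling: pick the next batch of iterations (hypothetical policy).
        batch = range(start, min(start + batch_size, n_iters))

        # Phase 2, computation: every iteration in the batch sees the same snapshot,
        # so the iterations could conceptually run in parallel.
        snapshot = list(data)
        results = [(i, *run_iteration(body, i, snapshot)) for i in batch]

        written = set()  # indices committed by earlier iterations of this batch
        for i, reads, writes in results:
            # Phase 3, misspeculation check: did this iteration read a location
            # that an earlier iteration of the same batch has written?
            if reads & written:
                # Phase 5, misspeculation recovery: re-execute on committed state.
                _, writes = run_iteration(body, i, data)
            # Phase 4, result committing: apply writes in iteration order.
            for idx, value in writes.items():
                data[idx] = value
            written.update(writes)
    return data
```

In this sketch a loop with a true cross-iteration dependence (each iteration reads the element written by its predecessor) triggers recovery for most iterations, while a loop with independent iterations commits every speculative result directly. The check, commit, and recovery steps are interleaved here only for brevity.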

CiteSeerX

Crossref

MemShield: GPU-assisted software memory encryption

Author: A Würstlein
J Bauer
J Lin
J Nickolls
JA Halderman
M Henson
M Huber
M Zhang
P Papadopoulos
R Stoyanov
S Dey
S Maitra
S Vömel
Y Chen
Z Wang
Publication date: 20/04/2020

Cryptographic algorithm implementations are vulnerable to Cold Boot attacks, which exploit the persistence of RAM cells across reboots or power-down cycles to read the memory contents and recover sensitive data. The principal defense against Cold Boot attacks is memory encryption. In this work we propose MemShield, a memory encryption framework for user space applications that exploits a GPU to safely store the master key and perform the encryption/decryption operations. We developed a prototype that is completely transparent to existing applications and does not require changes to the OS kernel. We discuss the design, the related work, the implementation, the security analysis, and the performance of MemShield. Comment: 14 pages, 2 figures. In proceedings of the 18th International Conference on Applied Cryptography and Network Security, ACNS 2020, October 19-22 2020, Rome, Italy.

arXiv.org e-Print Archive

Crossref

ART

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Author: A Hart
D Dutykh
G Ruetsch
I Karlin
IZ Reguly
J Gong
J Nickolls
JE Stone
M Martineau
M Norman
MB Giles
S Wienke
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options, and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. The results show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches have performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, currently running around 10% slower than CUDA.

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Repository of the Academy's Library

DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI

Author: B Langmead
B Schmidt
Bertil Schmidt
BH Bloom
Douglas L Maskell
DR Zerbino
E Lindholm
EW Myers
H Shi
J Butler
J Nickolls
J Schröder
JC Dohm
JT Simpson
L Fan
L Salmela
MJ Chaisson
P Havlak
PA Pevzner
R Li
RL Warren
S Batzoglou
WR Jeck
X Huang
Y Liu
Yongchao Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for de novo assembly in terms of assembly quality and scalability for large-scale short read datasets. Results: We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs) using a hybrid combination of CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize the performance by overlapping the CPU and GPU computation. The distributed feature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, to the existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de-Bruijn-graph-based assemblers. Conclusions: DecGPU is publicly available open-source software, written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm to tackle the flood of short reads produced by next-generation sequencing technologies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Overcoming the limitations of conventional vector processors

Author: Butts J.
Christos Kozyrakis
David Patterson
Espasa R.
Farkas K.
Khailany B.
Kim H.
Kozyrakis C.
Nickolls J.
Rixner S.
Tahakara H.
Tsui E.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Accelerated large-scale multiple sequence alignment

Author: A Szalkowski
A Wilm
A Wirawan
AV Bhatt
C Grasso
C Notredame
D Mikhailov
DF Feng
E Eskin
G Tan
GM Amdahl
H Carroll
H Vandierendonck
I Letunic
J Cheetham
J Ebedes
J Nickolls
JD Thompson
K Katoh
KB Li
M Farrar
M Feldman
M Friedman
OpenMP
Quinn O Snell
RC Edgar
S Lloyd
S Washietl
Scott Lloyd
SR Eddy
T Lassmann
T Oliver
T Ramdas
T Wang
X Deng
X Lin
Y Li
Y Liu
Publication venue: BioMed Central
Publication date: 01/12/2011

Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor. Conclusions: Our parallel algorithm and architecture accelerates large-scale MSA with reconfigurable computing and allows researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
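The Amdahl's Law limitation mentioned in this abstract can be made concrete with a small sketch. The function name and the example fractions are hypothetical and are not taken from the paper; the formula itself is the standard statement of Amdahl's Law, where only a fraction of the total runtime benefits from acceleration.

```python
def amdahl_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the runtime is sped up.

    accelerated_fraction: share of the original runtime spent in the accelerated stage (0..1).
    stage_speedup: factor by which that stage alone becomes faster (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    remaining = 1.0 - accelerated_fraction          # part that is not accelerated
    return 1.0 / (remaining + accelerated_fraction / stage_speedup)
```

Under the illustrative assumption that one stage accounts for half of the runtime, even an arbitrarily fast accelerator for that stage cannot push the overall speedup past 2, which is why accelerating only one stage of a multi-stage pipeline yields limited gains.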

Crossref

Directory of Open Access Journals

PubMed Central

Vibration-induced extra torque during electrically-evoked contractions of the human calf muscles

Background: High-frequency trains of electrical stimulation applied over the lower limb muscles can generate forces higher than would be expected from a peripheral mechanism (i.e. by direct activation of motor axons). This phenomenon presumably originates within the central nervous system through synaptic input from Ia afferents to motoneurons and is consistent with the development of plateau potentials. The first objective of this work was to investigate whether vibration (sinusoidal or random) applied to the Achilles tendon is also able to generate large-magnitude extra torques in the triceps surae muscle group. The second objective was to verify whether the extra torques that were found were accompanied by increases in motoneuron excitability. Methods: Subjects (n = 6) were seated on a chair and the right foot was strapped to a pedal attached to a torque meter. The isometric ankle torque was measured in response to different patterns of coupled electrical (20-Hz, rectangular 1-ms pulses) and mechanical stimuli (either 100-Hz sinusoid or Gaussian white noise) applied to the triceps surae muscle group. In an additional investigation, Mmax and F-waves were elicited at different times before or after the vibratory stimulation. Results: The vibratory bursts could generate substantial self-sustained extra torques, either with or without the background 20-Hz electrical stimulation applied simultaneously with the vibration. The extra torque generation was accompanied by increased motoneuron excitability, since an increase in the peak-to-peak amplitude of soleus F-waves was observed. The delivery of electrical stimulation following the vibration was essential to maintain the extra torques and increased F-waves. Conclusions: These results show that vibratory stimuli applied with a background electrical stimulation generate considerable force levels (up to about 50% MVC) due to the spinal recruitment of motoneurons. The association of vibration and electrical stimulation could be beneficial for many therapeutic interventions and vibration-based exercise programs. The command for the vibration-induced extra torques presumably activates spinal motoneurons following the size principle, which is a desirable feature for stimulation paradigms.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Repositório da Produção USP (Univ. de São Paulo)