
    OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

    The actor model of computation has been designed to support concurrency and distribution seamlessly. However, it does not address data-parallel program flows, while the growing processing power of modern many-core hardware such as graphics processing units (GPUs) and coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). They offer a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the CAF runtime environment and enables transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and can hence operate in a multi-stage fashion on data resident on the GPU. Developers can thus build complex data-parallel programs from primitives without sacrificing performance. Our evaluations on commodity GPUs, an Nvidia Tesla, and an Intel Xeon Phi reveal the expected linear scaling behavior when offloading larger workloads. For sub-second tasks, the efficiency of offloading was found to differ considerably between devices. Moreover, our findings indicate negligible overhead compared with programming against the native OpenCL API. Comment: 28 pages
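    The composition idea above, where each stage is an actor that receives a buffer, runs a kernel on it, and forwards the result to the next actor, can be sketched in plain Python. This is a minimal sketch under loud assumptions: it is not CAF's API and runs no OpenCL or device memory; `StageActor`, `compose`, and the lambda "kernels" are hypothetical names, each actor gets its own thread and `queue.Queue` mailbox, and `None` serves as a shutdown message.

```python
import queue
import threading
from typing import Callable, List, Optional

# Hypothetical "kernel" type: a function from one buffer to another.
Kernel = Callable[[List[float]], List[float]]


class StageActor:
    """Toy actor: a private mailbox plus one kernel applied to each message.

    Stand-in for an OpenCL actor; the kernel here is ordinary Python code.
    """

    def __init__(self, kernel: Kernel, next_stage: Optional["StageActor"] = None,
                 sink: Optional[queue.Queue] = None) -> None:
        self.kernel = kernel
        self.next_stage = next_stage
        self.sink = sink
        self.mailbox: queue.Queue = queue.Queue()
        # One thread per actor keeps the sketch short (an assumption, not
        # how a production actor runtime schedules work).
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg: Optional[List[float]]) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self.mailbox.put(msg)

    def _run(self) -> None:
        while True:
            msg = self.mailbox.get()
            if msg is None:  # shutdown message: pass it on, then stop
                if self.next_stage is not None:
                    self.next_stage.send(None)
                return
            result = self.kernel(msg)
            if self.next_stage is not None:
                self.next_stage.send(result)  # hand the buffer to the next stage
            elif self.sink is not None:
                self.sink.put(result)  # last stage delivers to the caller


def compose(kernels: List[Kernel], sink: queue.Queue) -> StageActor:
    """Chain kernels into a pipeline of actors and return the first stage."""
    head: Optional[StageActor] = None
    # Build back to front so each actor already knows its successor;
    # only the last stage (i == 0 here) writes to the sink.
    for i, kernel in enumerate(reversed(kernels)):
        head = StageActor(kernel, next_stage=head, sink=sink if i == 0 else None)
    assert head is not None, "need at least one kernel"
    return head


if __name__ == "__main__":
    results: queue.Queue = queue.Queue()
    pipeline = compose([lambda xs: [x * 2 for x in xs],   # stage 1: scale
                        lambda xs: [x + 1 for x in xs]],  # stage 2: shift
                       results)
    pipeline.send([1.0, 2.0, 3.0])
    print(results.get(timeout=5))  # [3.0, 5.0, 7.0]
    pipeline.send(None)
```

    Actor runtimes typically multiplex many actors over a shared thread pool rather than dedicating a thread to each; the one-thread-per-actor choice here only keeps the example short.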

    Optimistic Parallelism on GPUs

    We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable the exploitation of data parallelism, the latter three represent the overhead cost of using speculation. We perform the misspeculation check on the GPU to minimize its cost, and we perform result committing and misspeculation recovery on the CPU to reduce the result-copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API through which programmers can give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x to 13.76x on an NVIDIA Tesla C1060 hosted in an Intel Xeon E5540 machine.
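    The five phases can be illustrated with a sequential Python model of one simple speculation scheme. This is a minimal sketch under stated assumptions, not the paper's technique: the loop body is assumed to be `a[writes[i]] = f(a[reads[i]])`, `speculative_loop` and `sequential_loop` are hypothetical names, all phases run on the CPU instead of splitting the check and the commit between GPU and CPU, and recovery simply restarts the next block at the first misspeculated iteration.

```python
from typing import Callable, List, Tuple


def sequential_loop(a: List[float], reads: List[int], writes: List[int],
                    f: Callable[[float], float]) -> List[float]:
    """Reference semantics: for i: a[writes[i]] = f(a[reads[i]])."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = f(a[r])
    return a


def speculative_loop(a: List[float], reads: List[int], writes: List[int],
                     f: Callable[[float], float],
                     chunk: int) -> Tuple[List[float], int]:
    """Run the same loop speculatively, `chunk` iterations at a time.

    Returns the final array and the number of iterations whose speculative
    results were discarded and later re-executed.
    """
    a = list(a)
    n = len(reads)
    start, discarded = 0, 0
    while start < n:
        # Scheduling: pick the next block of iterations.
        end = min(start + chunk, n)
        # Computation: every iteration reads the current array state, as if
        # all of them ran concurrently (e.g. one GPU thread each).
        results = [f(a[reads[i]]) for i in range(start, end)]
        # Misspeculation check: iteration i is wrong if an earlier iteration
        # in the same block wrote the element that i read.
        first_bad = end
        written = set()
        for i in range(start, end):
            if reads[i] in written:
                first_bad = i
                break
            written.add(writes[i])
        # Commit: apply the verified prefix in original iteration order.
        for i in range(start, first_bad):
            a[writes[i]] = results[i - start]
        # Recovery: discard the rest; the next block starts at first_bad.
        # The first iteration of a block can never misspeculate, so the
        # loop always makes progress.
        discarded += end - first_bad
        start = first_bad
    return a, discarded
```

    A dependence chain such as `reads=[0, 1, 2, 3]`, `writes=[1, 2, 3, 0]` forces a restart after almost every iteration, while independent iterations commit as one block; scheduling policies that lower the misspeculation rate push execution toward the second case.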

    MemShield: GPU-assisted software memory encryption

    Implementations of cryptographic algorithms are vulnerable to Cold Boot attacks, which exploit the persistence of RAM cells across reboots or power-down cycles to read the memory contents and recover sensitive data. The principal defense against Cold Boot attacks is memory encryption. In this work we propose MemShield, a memory encryption framework for user-space applications that uses a GPU to safely store the master key and perform the encryption and decryption operations. We developed a prototype that is completely transparent to existing applications and does not require changes to the OS kernel. We discuss the design, related work, implementation, security analysis, and performance of MemShield. Comment: 14 pages, 2 figures. In proceedings of the 18th International Conference on Applied Cryptography and Network Security, ACNS 2020, October 19-22 2020, Rome, Italy

    Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

    Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, a number of programming approaches, such as CUDA, OpenACC, and OpenMP 4, support different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance, such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. The results show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches suffer performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, currently running around 10% slower than CUDA.
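    A defining feature of unstructured mesh loops is indirect access: each edge reads data at its two endpoint nodes and increments values stored on those nodes, so two GPU threads handling edges that share a node can race. The Python sketch below shows such a loop and one common way to avoid the race, greedy edge coloring, in which edges of the same color share no node and can be processed concurrently without atomics. The flux formula and all names (`edge_loop`, `color_edges`, `colored_edge_loop`) are hypothetical, and coloring is only one option alongside atomics; the abstract does not say which strategy each compared code uses.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def edge_loop(x: List[float], edges: List[Edge]) -> List[float]:
    """Serial reference: each edge reads both endpoints (indirect read) and
    increments both endpoint residuals (indirect increment)."""
    res = [0.0] * len(x)
    for a, b in edges:
        flux = x[a] - x[b]  # hypothetical flux between the two nodes
        res[a] -= flux
        res[b] += flux
    return res


def color_edges(edges: List[Edge]) -> List[List[int]]:
    """Greedy coloring: no two edges in the same color share a node."""
    colors: List[List[int]] = []
    node_colors: Dict[int, Set[int]] = {}
    for e, (a, b) in enumerate(edges):
        used = node_colors.get(a, set()) | node_colors.get(b, set())
        c = 0
        while c in used:  # smallest color not used at either endpoint
            c += 1
        if c == len(colors):
            colors.append([])
        colors[c].append(e)
        node_colors.setdefault(a, set()).add(c)
        node_colors.setdefault(b, set()).add(c)
    return colors


def colored_edge_loop(x: List[float], edges: List[Edge]) -> List[float]:
    """Same computation, processed color by color."""
    res = [0.0] * len(x)
    for color in color_edges(edges):
        # Within one color no two edges touch the same node, so these
        # increments could run concurrently (one thread per edge) without
        # atomics; colors themselves must run one after another.
        for e in color:
            a, b = edges[e]
            flux = x[a] - x[b]
            res[a] -= flux
            res[b] += flux
    return res
```

    On a GPU each color would typically become a separate kernel launch or synchronized phase, trading extra launches and poorer data locality for the absence of atomic operations. With floating-point data the colored order can also change rounding relative to the serial loop.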

    GPU-Based Data Processing for 2-D Microwave Imaging on MAST

    The Synthetic Aperture Microwave Imaging (SAMI) diagnostic is a Mega Amp Spherical Tokamak (MAST) diagnostic based at the Culham Centre for Fusion Energy. We present the acceleration of the SAMI data-processing code using a graphics processing unit, achieving speedups of up to 60 times over the original IDL (Interactive Data Language) data-processing code. SAMI will now be capable of inter-shot processing, allowing pseudo-real-time control so that adjustments and optimizations can be made between shots. Additionally, the analysis of many shots will be possible for the first time.

    DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI

    Background: Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for de novo assembly in terms of assembly quality and scalability for large-scale short-read datasets.
    Results: We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs) using a hybrid combination of CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize performance by overlapping the CPU and GPU computation. The distributed feature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, to existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de Bruijn graph-based assemblers.
    Conclusions: DecGPU is publicly available open-source software, written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm to tackle the flood of short reads produced by next-generation sequencing technologies.
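    A common basis for short-read error correction is the k-mer spectrum: k-mers that occur many times across all reads are treated as trustworthy ("solid"), and a read containing rare k-mers is suspected of carrying a sequencing error. The abstract does not describe DecGPU's algorithm, so the following is only a generic, single-process Python sketch of that idea, not DecGPU's method; `k`, the solidity `threshold`, the exhaustive single-substitution search, and the function names are illustrative assumptions. A distributed tool would partition the counting and lookups across processes and GPU threads, which this sketch does not attempt.

```python
from collections import Counter
from typing import Iterable, Optional


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring across all reads (the k-mer spectrum)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def is_solid(read: str, counts: Counter, k: int, threshold: int) -> bool:
    """True if every k-mer of the read occurs at least `threshold` times."""
    # Counter returns 0 for unseen k-mers without inserting them.
    return all(counts[read[i:i + k]] >= threshold
               for i in range(len(read) - k + 1))


def correct_read(read: str, counts: Counter, k: int,
                 threshold: int) -> Optional[str]:
    """Return the read if it is already solid; otherwise try every single-base
    substitution and return the first one that makes all k-mers solid, or
    None if no single substitution suffices."""
    if is_solid(read, counts, k, threshold):
        return read
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            if is_solid(candidate, counts, k, threshold):
                return candidate
    return None


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 5 + ["ACGATGCA"]  # last read has an error at index 3
    spectrum = kmer_counts(reads, k=4)
    print(correct_read("ACGATGCA", spectrum, k=4, threshold=3))  # ACGTTGCA
```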

    Accelerated large-scale multiple sequence alignment

    Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior attempts to accelerate MSA with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware.
    Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150x was demonstrated on a large data set compared with a 2.4 GHz Core2 processor.
    Conclusions: Our parallel algorithm and architecture accelerate large-scale MSA with reconfigurable computing and allow researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
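    The reduction described above, where a group of aligned sequences is collapsed into a profile of per-column symbol frequencies and two profiles are then aligned pairwise, can be sketched in Python. This is a simplified illustration, not the authors' FPGA design: the DNA alphabet with `-` for gaps, the match/mismatch/gap scores, the constant per-column gap penalty, and the function names are assumptions, and only the optimal alignment score is computed (no traceback).

```python
from typing import Dict, List

Profile = List[Dict[str, float]]
ALPHABET = "ACGT-"  # assumed DNA alphabet; '-' marks a gap inside a group


def make_profile(aligned: List[str]) -> Profile:
    """Reduce a group of equal-length aligned sequences to column frequencies."""
    n = len(aligned)
    profile: Profile = []
    for j in range(len(aligned[0])):
        col = {c: 0.0 for c in ALPHABET}
        for seq in aligned:
            col[seq[j]] += 1.0 / n
        profile.append(col)
    return profile


def pair_score(a: str, b: str, match: float = 1.0, mismatch: float = -1.0,
               gap: float = -2.0) -> float:
    """Assumed symbol-vs-symbol score."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def column_score(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Expected pair score between two profile columns."""
    return sum(p[a] * q[b] * pair_score(a, b) for a in ALPHABET for b in ALPHABET)


def align_profiles(p: Profile, q: Profile, gap: float = -2.0) -> float:
    """Needleman-Wunsch over profile columns; a gap column costs `gap`."""
    m, n = len(p), len(q)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + column_score(p[i - 1], q[j - 1]),
                           dp[i - 1][j] + gap,   # gap column in q
                           dp[i][j - 1] + gap)   # gap column in p
    return dp[m][n]
```

    Two single-sequence profiles reduce this to ordinary Needleman-Wunsch alignment, so the same dynamic-programming cell update serves every merge in the progressive stage, which makes that update a natural candidate for hardware acceleration.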

    Antihyperalgesia by α2-GABAA Receptors Occurs Via a Genuine Spinal Action and Does Not Involve Supraspinal Sites

    Drugs that enhance GABAergic inhibition alleviate inflammatory and neuropathic pain after spinal application. This antihyperalgesia occurs mainly through GABAA receptors (GABAARs) containing α2 subunits (α2-GABAARs). Previous work indicates that potentiation of these receptors in the spinal cord evokes profound antihyperalgesia after systemic administration as well, but possible synergistic or antagonistic actions of supraspinal α2-GABAARs on spinal antihyperalgesia have not yet been addressed. Here we generated two lines of GABAAR-mutated mice, which either lack α2-GABAARs specifically in the spinal cord or express only benzodiazepine-insensitive α2-GABAARs at this site. We analyzed the consequences of these mutations for antihyperalgesia evoked by systemic treatment with the novel non-sedative benzodiazepine-site agonist HZ166 in neuropathic and inflammatory pain. Wild-type mice and both types of mutated mice had similar baseline nociceptive sensitivities and developed similar hyperalgesia. However, antihyperalgesia by systemic HZ166 was reduced by about 60% in both mutated mouse lines and was virtually indistinguishable from that of global point-mutated mice, in which all α2-GABAARs were benzodiazepine-insensitive. The major (α2-dependent) component of GABAAR-mediated antihyperalgesia was therefore exclusively of spinal origin, whereas supraspinal α2-GABAARs had neither synergistic nor antagonistic effects on antihyperalgesia. Our results thus indicate that drugs that specifically target α2-GABAARs exert their antihyperalgesic effect through enhanced spinal nociceptive control. Such drugs may therefore be well suited for the systemic treatment of different chronic pain conditions.

    Vibration-induced extra torque during electrically-evoked contractions of the human calf muscles

    Background: High-frequency trains of electrical stimulation applied over the lower limb muscles can generate forces higher than would be expected from a peripheral mechanism (i.e., direct activation of motor axons). This phenomenon presumably originates within the central nervous system through synaptic input from Ia afferents to motoneurons and is consistent with the development of plateau potentials. The first objective of this work was to investigate whether vibration (sinusoidal or random) applied to the Achilles tendon can also generate large extra torques in the triceps surae muscle group. The second objective was to verify whether the extra torques found were accompanied by increases in motoneuron excitability.
    Methods: Subjects (n = 6) were seated on a chair with the right foot strapped to a pedal attached to a torque meter. The isometric ankle torque was measured in response to different patterns of coupled electrical stimuli (20 Hz, rectangular 1-ms pulses) and mechanical stimuli (either a 100-Hz sinusoid or Gaussian white noise) applied to the triceps surae muscle group. In an additional investigation, Mmax and F-waves were elicited at different times before or after the vibratory stimulation.
    Results: The vibratory bursts could generate substantial self-sustained extra torques, either with or without the background 20-Hz electrical stimulation applied simultaneously with the vibration. The extra torque generation was accompanied by increased motoneuron excitability, as shown by an increase in the peak-to-peak amplitude of soleus F-waves. Delivering electrical stimulation after the vibration was essential to maintain the extra torques and the increased F-waves.
    Conclusions: These results show that vibratory stimuli applied with background electrical stimulation generate considerable force levels (up to about 50% MVC) through the spinal recruitment of motoneurons. Combining vibration with electrical stimulation could benefit many therapeutic interventions and vibration-based exercise programs. The command for the vibration-induced extra torques presumably activates spinal motoneurons following the size principle, which is a desirable feature for stimulation paradigms.