91 research outputs found

    OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

    The actor model of computation was designed for seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the processing power available in modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident on the GPU. Developers are thus able to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second tasks, the efficiency of offloading was found to differ widely between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.

    Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

    Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options, and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. The results show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches have performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, currently running around 10% slower than CUDA.

    GPU-Based Data Processing for 2-D Microwave Imaging on MAST

    The Synthetic Aperture Microwave Imaging (SAMI) diagnostic is a Mega Amp Spherical Tokamak (MAST) diagnostic based at Culham Centre for Fusion Energy. The acceleration of the SAMI diagnostic data-processing code by a graphics processing unit is presented, demonstrating a speedup of up to 60 times compared to the original IDL (Interactive Data Language) data-processing code. SAMI will now be capable of intershot processing, allowing pseudo-real-time control so that adjustments and optimizations can be made between shots. Additionally, the analysis of many shots will be possible for the first time.

    Accelerated large-scale multiple sequence alignment

    Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor. Conclusions: Our parallel algorithm and architecture accelerate large-scale MSA with reconfigurable computing and allow researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
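
    The Amdahl's Law limitation mentioned in this abstract can be illustrated with a minimal sketch. The function name and the example fractions below are hypothetical and are not taken from the paper; the sketch only shows why accelerating a single stage caps the overall speedup.

```python
def amdahl_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the
        accelerated stage (hypothetical value, between 0 and 1).
    stage_speedup: how many times faster that stage becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    # The untouched part keeps its runtime; the accelerated part shrinks.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / stage_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Illustrative numbers only: if the accelerated stage were half of the
    # runtime, even an enormous stage speedup could not double overall speed.
    print(amdahl_speedup(0.5, 1e9))
```

    Under this model, the larger the share of runtime covered by the accelerated stages, the closer the overall speedup gets to the stage speedup.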

    Vibration-induced extra torque during electrically-evoked contractions of the human calf muscles

    Background: High-frequency trains of electrical stimulation applied over the lower limb muscles can generate forces higher than would be expected from a peripheral mechanism (i.e. by direct activation of motor axons). This phenomenon presumably originates within the central nervous system through synaptic input from Ia afferents to motoneurons and is consistent with the development of plateau potentials. The first objective of this work was to investigate whether vibration (sinusoidal or random) applied to the Achilles tendon is also able to generate large-magnitude extra torques in the triceps surae muscle group. The second objective was to verify whether the extra torques that were found were accompanied by increases in motoneuron excitability. Methods: Subjects (n = 6) were seated on a chair and the right foot was strapped to a pedal attached to a torque meter. The isometric ankle torque was measured in response to different patterns of coupled electrical (20-Hz, rectangular 1-ms pulses) and mechanical stimuli (either 100-Hz sinusoid or Gaussian white noise) applied to the triceps surae muscle group. In an additional investigation, Mmax and F-waves were elicited at different times before or after the vibratory stimulation. Results: The vibratory bursts could generate substantial self-sustained extra torques, either with or without the background 20-Hz electrical stimulation applied simultaneously with the vibration. The extra torque generation was accompanied by increased motoneuron excitability, since an increase in the peak-to-peak amplitude of soleus F-waves was observed. The delivery of electrical stimulation following the vibration was essential to maintain the extra torques and increased F-waves. Conclusions: These results show that vibratory stimuli applied with a background electrical stimulation generate considerable force levels (up to about 50% MVC) due to the spinal recruitment of motoneurons. The association of vibration and electrical stimulation could be beneficial for many therapeutic interventions and vibration-based exercise programs. The command for the vibration-induced extra torques presumably activates spinal motoneurons following the size principle, which is a desirable feature for stimulation paradigms.

    Comparative evaluation of platforms for parallel Ant Colony Optimization

    The rapidly growing field of nature-inspired computing concerns the development and application of algorithms and methods based on biological or physical principles. This approach is particularly compelling for practitioners in high-performance computing, as natural algorithms are often inherently parallel (for example, they may be based on a “swarm”-like model that uses a population of agents to optimize a function). Coupled with rising interest in nature-based algorithms is the growth in heterogeneous computing: systems that use more than one kind of processor. We are therefore interested in the performance characteristics of nature-inspired algorithms on a number of different platforms. To this end, we present a new OpenCL-based implementation of the Ant Colony Optimization algorithm, and use it as the basis of extensive experimental tests. We benchmark the algorithm against existing implementations on a wide variety of hardware platforms, and offer extensive analysis. This work provides rigorous foundations for future investigations of Ant Colony Optimization on high-performance platforms.

    Chemogenomic Analysis of G-Protein Coupled Receptors and Their Ligands Deciphers Locks and Keys Governing Diverse Aspects of Signalling

    Understanding the molecular mechanism of signalling in the important super-family of G-protein-coupled receptors (GPCRs) is causally related to questions of how and where these receptors can be activated or inhibited. In this context, it is of great interest to unravel the common molecular features of GPCRs as well as those related to an active or inactive state or to subtype-specific G-protein coupling. In this chemogenomics study, we analyse for the first time the statistical link between the properties of G-protein-coupled receptors and GPCR ligands. The technique of mutual information (MI) is able to reveal statistical inter-dependence between variations in amino acid residues on the one hand and variations in ligand molecular descriptors on the other. Although this MI analysis uses novel information that differs from the results of known site-directed mutagenesis studies or published GPCR crystal structures, the method is capable of identifying the well-known common ligand binding region of GPCRs between the upper part of the seven transmembrane helices and the second extracellular loop (ECL2). The analysis shows amino acid positions that are sensitive to either stimulating (agonistic) or inhibitory (antagonistic) ligand effects or both. It appears that amino acid positions for antagonistic and agonistic effects are both concentrated around the extracellular region, but selective agonistic effects accumulate between transmembrane helices (TMHs) 2 and 3 and ECL2, while selective residues for antagonistic effects are located at the top of helices 5 and 6. Above all, the MI analysis provides detailed indications about amino acids located in the transmembrane region of these receptors that determine G-protein signalling pathway preferences.
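
    The mutual information measure named in this abstract can be sketched for two discrete variables as follows. This is a generic textbook estimate on hypothetical toy data (one residue letter per receptor at a single alignment position, and one binarized ligand descriptor per receptor); it is not the authors' actual pipeline, which the abstract does not describe.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    residues = ["A", "A", "G", "G"]     # residue at one alignment position
    descriptor = [0, 0, 1, 1]           # binarized ligand descriptor
    print(mutual_information(residues, descriptor))    # fully dependent
    print(mutual_information(residues, [0, 1, 0, 1]))  # independent
```

    A high value for a given alignment position suggests that variation at that position co-varies with the ligand descriptor; a value near zero suggests no statistical dependence in the sample.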