Percolation scheduling for non-VLIW machines

Author: Brownhill Carrie J.
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 15/01/1990

Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only at VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared memory multi-processors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps.
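The idea of simulating VLIW lockstep execution with static barriers can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the names `run_lockstep` and `demo` and the toy schedule are hypothetical, and a barrier is placed after every row, so the redundant-synchronization removal mentioned in the abstract is not attempted.

```python
import threading

def run_lockstep(schedule):
    """Run one thread per column of `schedule`, with a barrier after each row.

    `schedule` is a list of rows. Each row holds one zero-argument callable
    (or None for a no-op) per processor, mimicking one VLIW long instruction.
    """
    n_procs = len(schedule[0])
    barrier = threading.Barrier(n_procs)

    def worker(proc_id):
        for row in schedule:
            op = row[proc_id]
            if op is not None:
                op()
            # Static barrier: no processor starts row k+1 before all finish row k.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def demo():
    """Toy two-processor schedule (hypothetical); each row depends on the previous one."""
    env = {}
    schedule = [
        [lambda: env.update(a=2), lambda: env.update(b=3)],
        [lambda: env.update(c=env["a"] + env["b"]),
         lambda: env.update(d=env["a"] * env["b"])],
        [lambda: env.update(e=env["c"] + env["d"]), None],
    ]
    run_lockstep(schedule)
    return env
```

Calling `demo()` runs the three rows in order on two threads; the barrier after each row is what guarantees that `a` and `b` exist before the second row reads them.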

eScholarship - University of California

pocl: A Performance-Portable OpenCL Implementation

Author: Berg Heikki
de La Lama Carlos Sánchez
Jääskeläinen Pekka
Raiskila Kalle
Schnetter Erik
Takala Jarmo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and those that are still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications, when compiled using pocl, were faster than or close to as fast as the best proprietary OpenCL implementation for the platform at hand. Comment: This article was published in 2015; it is now openly accessible via arxi

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

A Special Purpose Architecture for Finite Element Analysis

Author: Jordan H. F.

The analysis of aerospace structures by the finite element method consumes considerable computer time. The cost of this resource and the designer's desire to have rapid feedback concerning such questions as the effect of a change in loading of the structure or in a parameter of some structural material led to the design of a special purpose parallel computing system for finite element analysis. As a special purpose computer, the architecture of this finite element computer is closely tied to computational aspects of the particular problem. Various aspects of an MIMD array of microprocessors are related to the requirements of the class of finite element analysis problems which it is intended to solve.

NASA Technical Reports Server

Beyond the Fokker-Planck equation: Pathwise control of noisy bistable systems

Author: Berglund Nils
Gentz Barbara
Publication venue: 'IOP Publishing'
Publication date: 01/01/2001

We introduce a new method that allows slowly time-dependent Langevin equations to be described through the behaviour of individual paths. This approach yields considerably more information than the computation of the probability density. The main idea is to show that for sufficiently small noise intensity and slow time dependence, the vast majority of paths remain in small space-time sets, typically in the neighbourhood of potential wells. The size of these sets often has a power-law dependence on the small parameters, with universal exponents. The overall probability of exceptional paths is exponentially small, with an exponent also showing power-law behaviour. The results cover time spans up to the maximal Kramers time of the system. We apply our method to three phenomena characteristic of bistable systems: stochastic resonance, dynamical hysteresis and bifurcation delay, where it yields precise bounds on transition probabilities, and the distribution of hysteresis areas and first-exit times. We also discuss the effect of coloured noise. Comment: 37 pages, 11 figures
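The claim that, for small noise and slow time dependence, most paths stay near a potential well can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a double-well drift x - x^3 with a slow periodic forcing, Euler-Maruyama time stepping, and hypothetical parameter values and function names (`simulate_path`, `max_distance_from_well`).

```python
import math
import random

def simulate_path(sigma=0.05, eps=0.01, amplitude=0.1,
                  dt=0.01, steps=20000, x0=1.0, seed=0):
    """Euler-Maruyama path of dx = (x - x^3 + A cos(eps t)) dt + sigma dW.

    The drift comes from the double-well potential V(x) = x^4/4 - x^2/2,
    tilted by a slow forcing. All parameter values are illustrative.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    x = x0
    path = [x]
    for k in range(steps):
        t = k * dt
        drift = x - x**3 + amplitude * math.cos(eps * t)
        # Brownian increment over dt has standard deviation sqrt(dt).
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def max_distance_from_well(path, well=1.0):
    """Largest excursion of the path from the unforced well position."""
    return max(abs(x - well) for x in path)
```

With these settings the noise is small compared with the barrier height, so a call such as `max_distance_from_well(simulate_path())` should report an excursion well short of the distance to the saddle at x = 0. Increasing `sigma` or `amplitude` makes transitions between wells more likely.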

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL AMU

Publications Server of the Weierstrass Institute for Applied Analysis and Stochastics

Sequential escapes: onset of slow domino regime via a saddle connection

Author: Ashwin Peter
Creaser Jennifer
Tsaneva-Atanasova Krasimira
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/04/2018

We explore sequential escape behaviour of coupled bistable systems under the influence of stochastic perturbations. We consider transient escapes from a marginally stable "quiescent" equilibrium to a more stable "active" equilibrium. The presence of coupling introduces dependence between the escape processes: for diffusive coupling there is a strongly coupled limit (fast domino regime) where the escapes are strongly synchronised, while for intermediate coupling (slow domino regime) without partially escaped stable states, there is still a delayed effect. These regimes can be associated with bifurcations of equilibria in the low-noise limit. In this paper we consider a localized form of non-diffusive (i.e. pulse-like) coupling and find similar changes in the distribution of escape times with coupling strength. However, we find a transition to a slow domino regime that is not associated with any bifurcations of equilibria. We show that this transition can be understood as a codimension-one saddle connection bifurcation for the low-noise limit. At the transition, the most likely escape path from one attractor hits the escape saddle from the basin of another partially escaped attractor. After this bifurcation we find an increasing coefficient of variation of the subsequent escape times.

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Open Research Exeter

A Multi-GPU Programming Library for Real-Time Applications

Author: Schaetz Sebastian
Uecker Martin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general. Comment: 15 pages, 10 figures
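The reported speed-ups can be turned into per-device efficiency figures with a short calculation. The sketch below is not from the paper: parallel efficiency (speed-up divided by device count) and the Karp-Flatt experimentally determined serial fraction are standard metrics applied here to the two quoted data points, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear scaling achieved: S / p."""
    return speedup / n_devices

def karp_flatt(speedup, n_devices):
    """Karp-Flatt serial fraction: (1/S - 1/p) / (1 - 1/p), for p > 1."""
    return (1.0 / speedup - 1.0 / n_devices) / (1.0 - 1.0 / n_devices)

# Speed-ups quoted in the abstract: about 1.7 on 2 GPUs, 2.1 on 4 GPUs.
reported = {2: 1.7, 4: 2.1}

for gpus, speedup in reported.items():
    eff = parallel_efficiency(speedup, gpus)
    frac = karp_flatt(speedup, gpus)
    print(f"{gpus} GPUs: efficiency {eff:.3f}, Karp-Flatt fraction {frac:.3f}")
```

By these derived measures the efficiency falls from 0.85 on 2 GPUs to about 0.53 on 4 GPUs. A Karp-Flatt fraction that rises with device count usually points to growing communication or synchronization overhead, not a fixed serial portion, though the abstract itself does not give a cause.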

arXiv.org e-Print Archive

MPG.PuRe