
471 research outputs found

OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

Author: A Klöckner
D Charousset
G Agha
J Nickolls
JD Owens
K Wu
L Dagum
S Srinivasan
S Wienke
T Desell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

The actor model of computation has been designed to seamlessly support concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second tasks, the efficiency of offloading was found to differ considerably between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.
Comment: 28 pages
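The idea of composing kernels as a chain of actors can be sketched without CAF or OpenCL. Below is a minimal, hypothetical Python illustration. Each "stage actor" owns a mailbox and a thread, applies a kernel function to every message, and forwards the result to the next stage. The class name StageActor and its send/stop methods are invented for this sketch and are not the CAF API. The data here also stays in host memory rather than on a GPU.

```python
import queue
import threading
from typing import Any, Callable, Optional


class StageActor:
    """Hypothetical actor: a mailbox plus a thread that applies `kernel` to each message."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self, kernel: Callable[[Any], Any],
                 next_actor: Optional["StageActor"] = None) -> None:
        self.kernel = kernel
        self.next_actor = next_actor
        self.mailbox: "queue.Queue[Any]" = queue.Queue()
        self.results: "queue.Queue[Any]" = queue.Queue()  # filled only by the last stage
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg: Any) -> None:
        """Asynchronous message send: enqueue and return immediately."""
        self.mailbox.put(msg)

    def stop(self) -> None:
        """Stop after all earlier messages are processed; propagates down the chain."""
        self.mailbox.put(self._STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            msg = self.mailbox.get()
            if msg is self._STOP:
                if self.next_actor is not None:
                    self.next_actor.stop()  # downstream drains its mailbox, then exits
                return
            out = self.kernel(msg)  # the "kernel" this actor encapsulates
            if self.next_actor is not None:
                self.next_actor.send(out)  # multi-stage composition via messages
            else:
                self.results.put(out)
```

Usage: build the pipeline back to front, for example `last = StageActor(lambda xs: [x + 1 for x in xs])` and then `first = StageActor(lambda xs: [x * x for x in xs], last)`. Send it messages with `first.send(...)`. Mailboxes are FIFO, so `first.stop()` returns only after every stage has finished the messages sent before it.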

arXiv.org e-Print Archive

Crossref

REPOSIT HAW Hamburg

Optimization of Convolutional Neural Network ensemble classifiers by Genetic Algorithms

Author: A Krizhevsky
A Vishnuvarthanan
FA Spanhol
J Nickolls
MA Molina-Cabello
OK Bagui
PA Nogueira
R Davis
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2019

Breast cancer has a high mortality rate and is the most invasive cancer in women. Analysis of histopathological images could help predict this disease, and computational image processing might support this task. In this work, a proposal that employs deep convolutional neural networks is presented. An ensemble of networks is then considered in order to improve the recognition performance of the system through the consensus of the networks in the ensemble. Finally, a genetic algorithm is used to choose the networks that belong to the ensemble. The proposal has been tested by carrying out several experiments with a set of benchmark images.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
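Choosing ensemble members with a genetic algorithm can be sketched as a search over bitmasks, where bit k keeps classifier k. The following is a minimal Python sketch under stated assumptions, not the authors' method. The fitness is the plurality-vote accuracy on a validation set. It uses truncation selection with elitism, one-point crossover and bit-flip mutation. All function names, hyperparameters, and the tie-breaking rule are hypothetical. The sketch expects at least two classifiers and `pop_size >= 4`.

```python
import random
from typing import List, Sequence


def majority_vote_accuracy(mask: Sequence[int],
                           predictions: Sequence[Sequence[int]],
                           labels: Sequence[int]) -> float:
    """Accuracy of a plurality vote over the classifiers switched on in `mask`."""
    chosen = [p for bit, p in zip(mask, predictions) if bit]
    if not chosen:
        return 0.0  # an empty ensemble cannot vote
    correct = 0
    for i, y in enumerate(labels):
        votes = {}
        for p in chosen:
            votes[p[i]] = votes.get(p[i], 0) + 1
        # Ties go to the smallest class label (assumption) so results are deterministic.
        winner = max(sorted(votes), key=lambda c: votes[c])
        correct += int(winner == y)
    return correct / len(labels)


def select_ensemble(predictions: Sequence[Sequence[int]], labels: Sequence[int],
                    pop_size: int = 20, generations: int = 30,
                    mutation_rate: float = 0.1, seed: int = 0) -> List[int]:
    """Hypothetical GA over bitmasks; returns the best mask found."""
    rng = random.Random(seed)
    n = len(predictions)

    def fitness(mask: Sequence[int]) -> float:
        return majority_vote_accuracy(mask, predictions, labels)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # truncation selection, kept as-is (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = a[:cut] + b[cut:]
            # Bit-flip mutation toggles each gene with probability `mutation_rate`.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)
```

The elite half is carried over unchanged, so the best fitness in the population never decreases from one generation to the next.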

Crossref

Repositorio Institucional Universidad de Málaga

Student teamwork: developing virtual support for team projects

Author: Bohemia E.
Felder R.
Gibbs G.
Janice Whatley
Jaques D.
Jessup L. M.
Lipnack J.
Millis B.
Nickolls F.
Sheppard K.
Tiwari
Whatley J.
Wooldridge M. J. N
Publication venue: Emerald Group Publishing Limited
Publication date: 01/05/2006

In the 21st century, teamwork increasingly requires online cooperative skills as well as the more traditional skills associated with face-to-face teamwork. Virtual teamwork differs from face-to-face teamwork in a number of respects, such as interpreting alternatives to visual cues, adapting to synchronous communication, developing trust and cohesion, and cultural interpretation. However, co-located student teams in higher education can only simulate teamwork as it might be experienced in organisations today. For example, students can learn from their mistakes in a non-threatening environment, colleagues tend to be established friends, and assessing teamwork encourages behaviour such as “free-riding”. Using a prototyping approach involving students and tutors, a system has been designed to support learners engaged in teamwork. This system helps students achieve their full potential and appreciate issues surrounding virtual teamwork. The Guardian Agent system enables teams to allocate project tasks and agree ground rules for the team according to individuals’ preferences. Results from four cycles of its use are presented, together with modifications arising from iterations of testing. The results show that students find the system useful in preparing for teamwork, and they have encouraged further development of the system.

University of Salford Institutional Repository

Crossref

Collaborative Layer-wise Discriminative Learning in Deep Neural Networks

Author: C Cortes
C Farabet
C Xu
DR Cox
GE Hinton
J Nickolls
MD Zeiler
N Srivastava
O Russakovsky
P Viola
RO Duda
SS Haykin
Y Bengio
Y Wei
Publication date: 19/07/2016

Intermediate features at different layers of a deep neural network are known to be discriminative for visual patterns of different complexities. However, most existing works ignore such cross-layer heterogeneities when classifying samples of different complexities. For example, if a training sample has already been correctly classified at a specific layer with high confidence, we argue that it is unnecessary to force the remaining layers to classify this sample correctly; a better strategy is to encourage those layers to focus on other samples. In this paper, we propose a layer-wise discriminative learning method to enhance the discriminative capability of a deep network by allowing its layers to work collaboratively for classification. To this end, we introduce multiple classifiers on top of multiple layers. Each classifier not only tries to correctly classify the features from its input layer, but also coordinates with the other classifiers to jointly maximize the final classification performance. Guided by the companion classifiers, each classifier learns to concentrate on certain training examples, which boosts the overall performance. Because it allows end-to-end training, our method can be conveniently embedded into state-of-the-art deep networks. Experiments with multiple popular deep networks, including Network in Network, GoogLeNet and VGGNet, on object classification benchmarks of various scales, including CIFAR100, MNIST and ImageNet, and on scene classification benchmarks, including MIT67, SUN397 and Places205, demonstrate the effectiveness of our method. In addition, we analyze the relationship between the proposed method and classical conditional random field models.
Comment: To appear in ECCV 2016. May be subject to minor changes before the camera-ready version.

arXiv.org e-Print Archive

Crossref

Wall Orientation and Shear Stress in the Lattice Boltzmann Model

Author: Aidun
Anor
Artoli
Axner
Berger
Bernsdorf
Bonert
Boussel
Bouzidi
Boyd
Carallo
Caro
Cebral
Cecchi
Chopard
Christ
Cosgrove
Dai
d’Humieres
Gohil
Harvey
He
He
Higuera
Jou
Krüger
Krüger
Ku
Ku
Luo
Maciej Matyka
Malek
McNamara
Meyerson
Milner
Nesbitt
Nickolls
Pantos
Papaioannou
Paszkowiak
Patankar
Peng
Peters
Pohl
Pontrelli
Quarteroni
Reneman
Shaaban
Stahl
Tse
Tölke
Xian
Zbigniew Koza
Łukasz Mirosław
Publication venue: 'Elsevier BV'
Publication date: 14/03/2012

The wall shear stress is a quantity of profound importance for the clinical diagnosis of artery diseases. The lattice Boltzmann method is an easily parallelizable numerical method for solving flow problems, but it suffers from errors in the velocity field near the boundaries, which lead to errors in the wall shear stress and in normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model. We also propose to compute wall normals, which are necessary to compute the wall shear stress, by taking the weighted mean over boundary facets lying in the vicinity of a wall element. We carry out several tests and observe an increase in the accuracy of computed normal vectors over other methods in two and three dimensions. Using the scheme, we compute the wall shear stress in an inclined and bent channel fluid flow and show a minor influence of the normal on the numerical error, implying that the main error arises from a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta in steady conditions using our method and compare the results with a standard finite volume solver and experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed.
Comment: 9 pages, 11 figures
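The normal-estimation step can be sketched as an area-weighted mean of triangle normals taken over the facets near a wall point. Below is a minimal Python sketch resting on several assumptions. Area weighting and a centroid-distance cutoff `radius` are assumed choices, since the abstract only says "weighted mean". Triangles are assumed to be consistently oriented. The paper's own lattice Boltzmann shear-stress formula is not reproduced. Instead, `wall_shear_stress` applies the textbook projection of the traction t = σ·n onto the wall plane to a stress tensor that is assumed to be given. All names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
Tri = Tuple[Vec, Vec, Vec]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def wall_normal(point: Vec, facets: Sequence[Tri], radius: float) -> Vec:
    """Area-weighted mean normal of facets whose centroids lie within `radius` of `point`."""
    acc = [0.0, 0.0, 0.0]
    for a, b, c in facets:
        centroid = ((a[0] + b[0] + c[0]) / 3.0,
                    (a[1] + b[1] + c[1]) / 3.0,
                    (a[2] + b[2] + c[2]) / 3.0)
        d = sub(centroid, point)
        if dot(d, d) > radius * radius:
            continue  # facet lies outside the vicinity of this wall element
        # The raw cross product has length 2 * area, so summing it weights
        # each facet's unit normal by the facet area.
        n = cross(sub(b, a), sub(c, a))
        for k in range(3):
            acc[k] += n[k]
    total: Vec = (acc[0], acc[1], acc[2])
    length = math.sqrt(dot(total, total))
    if length == 0.0:
        raise ValueError("no facets found within the given radius")
    return (total[0] / length, total[1] / length, total[2] / length)


def wall_shear_stress(stress: Sequence[Sequence[float]], normal: Vec) -> Vec:
    """Tangential part of the traction t = sigma . n (standard projection)."""
    t: Vec = tuple(sum(stress[i][j] * normal[j] for j in range(3)) for i in range(3))
    tn = dot(t, normal)
    return (t[0] - tn * normal[0], t[1] - tn * normal[1], t[2] - tn * normal[2])
```

For a simple shear flow u_x = γz over the plane z = 0, the stress tensor has σ_xz = σ_zx = μγ and −p on the diagonal. The pressure then drops out of the projection and the wall shear stress is (μγ, 0, 0).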

arXiv.org e-Print Archive

Crossref

Fast Parallel Suffix Array on the GPU

Author: E Lindholm
J Nickolls
JA Edwards
NJ Larsson
V Osipov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2015

Crossref

eScholarship - University of California

A massively parallel GPU-accelerated model for analysis of fully nonlinear free surface waves

Author: Axelsson
Bingham
Brodtkorb
Cai
Demmel
Engsig-Karup
Engsig-Karup
Fructus
Göddeke
Hillis
Hwu
Li
McKee
Nickolls
Owens
Panchang
Patterson
Rienecker
Svendsen
Thompson
Trottenberg
Wallin
Wong
Young
Zakharov
Publication venue: 'Wiley'
Publication date: 01/01/2011

Crossref

Online Research Database In Technology

Optimistic Parallelism on GPUs

Author: E Ayguadé
J Nickolls
JE Stone
L Dagum
S Liu
S Wienke
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/03/2015

We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable exploitation of data parallelism, the latter three phases represent the overhead costs of using speculation. We perform the misspeculation check on the GPU to minimize its cost, and we perform result committing and misspeculation recovery on the CPU to reduce the result-copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API for programmers to give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an nVidia Tesla C1060 hosted in an Intel(R) Xeon(R) E5540 machine.
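The five phases can be sketched on a loop of the form `a[writes[i]] = f(a[reads[i]])`, whose index arrays are only known at run time. The following is a minimal, CPU-only Python simulation under stated assumptions. A batch of iterations is "executed in parallel" by letting all of them read the state from before the batch. The check then flags the first iteration that read a location written by an earlier iteration of the same batch. The valid prefix is committed in order, and recovery simply restarts at the flagged iteration. The batch size, the restart policy and all names are hypothetical. This is not the paper's GPU implementation or its scheduling policies.

```python
from typing import Callable, List, Sequence, Tuple


def run_sequential(a: Sequence[int], reads: Sequence[int], writes: Sequence[int],
                   f: Callable[[int], int]) -> List[int]:
    """Reference semantics: for i in order, a[writes[i]] = f(a[reads[i]])."""
    out = list(a)
    for r, w in zip(reads, writes):
        out[w] = f(out[r])
    return out


def run_speculative(a: Sequence[int], reads: Sequence[int], writes: Sequence[int],
                    f: Callable[[int], int], chunk: int = 4) -> Tuple[List[int], int]:
    """Simulated speculative execution; returns (final array, number of rollbacks)."""
    out = list(a)
    n = len(reads)
    start = 0
    rollbacks = 0
    while start < n:
        # Scheduling: take the next `chunk` iterations as one speculative batch.
        end = min(start + chunk, n)
        # Computation: all iterations read the pre-batch state, as if run concurrently.
        results = [f(out[reads[i]]) for i in range(start, end)]
        # Misspeculation check: the first iteration that reads a location written
        # by an earlier iteration in this batch has consumed a stale value.
        first_bad = end
        written = set()
        for i in range(start, end):
            if reads[i] in written:
                first_bad = i
                break
            written.add(writes[i])
        # Result committing: apply the valid prefix in iteration order, so the
        # last writer of a location wins, exactly as in sequential execution.
        for i in range(start, first_bad):
            out[writes[i]] = results[i - start]
        # Misspeculation recovery: discard the rest and restart at the bad iteration.
        if first_bad < end:
            rollbacks += 1
        start = first_bad  # always > previous start, since iteration `start` is never flagged
    return out, rollbacks
```

The valid prefix matches sequential semantics for two reasons. None of its iterations reads a value produced earlier in the same batch, and writes are applied in their original order.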

CiteSeerX

Crossref

Randomised trials conducted using cohorts : a scoping review

Author: Eldridge Sandra
Griffin Xavier Luke
Hemkens Lars
Maguire Jonathon L
McCall Stephen J
McCord Kimberly A
Nickolls Beverley Jane
Relton Clare
Sohanpal Ratna
Verkooijen Helena M
Zwarenstein Merrick
Publication date: 08/03/2024

Acknowledgements: We thank Margaret Sampson, MLIS, PhD, AHIP (Children’s Hospital of Eastern Ontario, Ottawa, Canada) for developing the search strategies on behalf of the CONSORT team. We thank Dr Philippa Fibert, St Mary’s University, Twickenham, London for her help in screening publications for inclusion.
Peer reviewed

Aberdeen University Research

An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm

Author: Birdsall
Bowers
Bowers
Bowers
Burau
Chen
D.C. Barnes
Decyk
Fuller
G. Chen
Harris
Hennessy
Hwu
Hwu
Kirk
Kong
L. Chacón
Liewer
Little
Madduri
Markstein
Nickolls
Shampine
Stantchev
Williams
Wittenbrink
Wulf
Publication venue: 'Elsevier BV'
Publication date: 22/11/2011

Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been proposed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle-orbit computations from the field solver, while remaining fully self-consistent. This paper describes a very efficient, mixed-precision hybrid CPU-GPU implementation of the implicit PIC algorithm exploiting this feature. The JFNK solver is kept on the CPU in double precision (DP), while the implicit, charge-conserving, and adaptive particle mover is implemented on a GPU (graphics processing unit) using CUDA in single precision (SP). Performance-oriented optimizations are introduced with the aid of the roofline model. The implicit particle mover algorithm is shown to achieve up to 400 GOp/s on an Nvidia GeForce GTX580. This corresponds to 25% absolute GPU efficiency against the peak theoretical performance, and is about 300 times faster than an equivalent serial CPU (Intel Xeon X5460) execution. For the test case chosen, the mixed-precision hybrid CPU-GPU solver is shown to outperform the DP CPU-only serial version by a factor of about 100, without apparent loss of robustness or accuracy in a challenging long-timescale ion acoustic wave simulation.
Comment: 25 pages, 6 figures, submitted to J. Comput. Phys.
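A small experiment shows one reason to keep quantities that feed a double-precision solver out of single-precision arithmetic. Below is a minimal Python sketch that emulates IEEE single precision by round-tripping through `struct` format `"f"`. The function `push_sp` is a hypothetical explicit position update carried out in SP, which is far simpler than the paper's implicit, adaptive, charge-conserving mover. `sum_sp` and `sum_dp` compare a running sum of many SP values accumulated in SP against the same sum accumulated in DP. How the paper actually splits the reductions between CPU and GPU is not specified here.

```python
import struct
from typing import List, Sequence


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def push_sp(xs: Sequence[float], vs: Sequence[float], dt: float) -> List[float]:
    """Hypothetical SP particle push: x_new = x + v * dt, rounding every operation to SP."""
    dt32 = f32(dt)
    return [f32(f32(x) + f32(f32(v) * dt32)) for x, v in zip(xs, vs)]


def sum_sp(values: Sequence[float]) -> float:
    """Running sum with the accumulator rounded to single precision after each add."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


def sum_dp(values: Sequence[float]) -> float:
    """Running sum of the same (SP) values with a double-precision accumulator."""
    acc = 0.0
    for v in values:
        acc += v
    return acc
```

Summing 100,000 copies of `f32(0.1)` shows the gap. The SP accumulator drifts by an amount that grows as the running total increases, while the DP accumulator stays very close to the exact product.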

arXiv.org e-Print Archive

CiteSeerX

Crossref