OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF
The actor model of computation was designed for seamless support of
concurrency and distribution. However, it remains unspecific about
data-parallel program flows, while the growing processing power of modern
many-core hardware such as graphics processing units (GPUs) and coprocessors
increases the relevance of data parallelism for general-purpose computing.
In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework
(CAF). This offers a high-level interface for accessing any OpenCL device
without leaving the actor paradigm. The new actor type is integrated into
the CAF runtime environment and enables transparent message passing in
distributed systems on heterogeneous hardware. Following the actor logic in
CAF, OpenCL kernels can be composed while encapsulated in C++ actors and
hence operate in a multi-stage fashion on data resident on the GPU.
Developers are thus able to build complex data-parallel programs from
primitives without leaving the actor paradigm or sacrificing performance.
Our evaluations on commodity GPUs, an Nvidia Tesla, and an Intel Phi reveal
the expected linear scaling behavior when offloading larger workloads. For
sub-second tasks, the efficiency of offloading was found to differ largely
between devices. Moreover, our findings indicate a negligible overhead over
programming with the native OpenCL API.
Comment: 28 pages
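The multi-stage composition described above can be sketched in plain Python. This is a hypothetical stand-in, not CAF's actual C++ API: each "actor" owns a mailbox and applies a per-element function to incoming batches (the stand-in for an OpenCL kernel) before forwarding downstream, mimicking kernels chained on data that stays with the stage.

```python
from queue import Queue
from threading import Thread

class KernelActor:
    """A stage that applies a per-element function to each incoming batch
    (a stand-in for an OpenCL kernel) and forwards the result downstream."""
    def __init__(self, fn, out_queue):
        self.fn, self.out = fn, out_queue
        self.inbox = Queue()
        Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            batch = self.inbox.get()
            self.out.put([self.fn(x) for x in batch])

# Compose two stages: square, then increment; data flows stage1 -> stage2.
results = Queue()
stage2 = KernelActor(lambda x: x + 1, results)
stage1 = KernelActor(lambda x: x * x, stage2.inbox)
stage1.inbox.put([1, 2, 3])
out = results.get(timeout=5)
print(out)  # [2, 5, 10]
```

In CAF the analogous composition happens over typed message passing, with the runtime keeping intermediate buffers on the device between stages.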
Optimization of Convolutional Neural Network ensemble classifiers by Genetic Algorithms
Breast cancer exhibits a high mortality rate and is the most invasive cancer in women. Analysis of histopathological images can help predict this disease, and computational image processing can support this task. In this work, a proposal that employs deep convolutional neural networks is presented. An ensemble of networks is then considered in order to obtain enhanced recognition performance through the consensus of the networks in the ensemble. Finally, a genetic algorithm is used to choose the networks that belong to the ensemble. The proposal has been tested by carrying out several experiments on a set of benchmark images.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
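The ensemble-selection step can be illustrated with a toy genetic algorithm. This is a hedged sketch with made-up validation labels and predictions, not the paper's exact operators: chromosomes are bitmasks over base classifiers, and fitness is the majority-vote accuracy of the selected subset.

```python
# Toy GA for ensemble selection: a chromosome is a bitmask over classifiers,
# fitness is majority-vote accuracy of the selected subset on validation data.
import random
random.seed(0)

# Hypothetical validation labels and per-classifier predictions.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
preds = [
    [0, 1, 0, 0, 1, 0, 1, 1],   # classifier 0
    [0, 0, 1, 0, 1, 1, 1, 1],   # classifier 1
    [1, 1, 1, 0, 0, 0, 1, 1],   # classifier 2
    [0, 1, 1, 1, 1, 0, 0, 1],   # classifier 3
    [0, 1, 1, 0, 1, 0, 1, 0],   # classifier 4
]

def fitness(mask):
    chosen = [p for p, m in zip(preds, mask) if m]
    if not chosen:
        return 0.0
    votes = [round(sum(col) / len(chosen)) for col in zip(*chosen)]
    return sum(v == t for v, t in zip(votes, y_true)) / len(y_true)

def evolve(pop_size=20, gens=30, n=len(preds)):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # selection (elitist)
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(n)] ^= 1       # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper the base learners are trained CNNs and the fitness evaluation is correspondingly expensive; the GA structure itself is the same kind of select/crossover/mutate loop.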
Student teamwork: developing virtual support for team projects
In the 21st century, team working increasingly requires online cooperative skills as well as the more traditional skills associated with face-to-face team working. Virtual team working differs from face-to-face team working in a number of respects, such as interpreting the alternatives to visual cues, adapting to asynchronous communication, developing trust and cohesion, and handling cultural interpretations. However, co-located student teams working within higher education can only simulate team working as it might be experienced in organisations today. For example, students can learn from their mistakes in a non-threatening environment, colleagues tend to be established friends, and assessing teamwork encourages behaviour such as "free-riding". Using a prototyping approach involving students and tutors, a system has been designed to support learners engaged in team working. This system helps students to achieve their full potential and appreciate issues surrounding virtual teamwork. The Guardian Agent system enables teams to allocate project tasks and agree ground rules for the team according to individuals' preferences. Results from four cycles of its use are presented, together with modifications arising from iterations of testing. The results show that students find the system useful in preparing for team working, and have encouraged further development of the system.
Collaborative Layer-wise Discriminative Learning in Deep Neural Networks
Intermediate features at different layers of a deep neural network are known
to be discriminative for visual patterns of different complexities. However,
most existing works ignore such cross-layer heterogeneities when classifying
samples of different complexities. For example, if a training sample has
already been correctly classified at a specific layer with high confidence,
we argue that it is unnecessary to force the remaining layers to classify
this sample correctly; a better strategy is to encourage those layers to
focus on other samples.
In this paper, we propose a layer-wise discriminative learning method to
enhance the discriminative capability of a deep network by allowing its layers
to work collaboratively for classification. Towards this target, we introduce
multiple classifiers on top of multiple layers. Each classifier not only tries
to correctly classify the features from its input layer, but also coordinates
with other classifiers to jointly maximize the final classification
performance. Guided by the other companion classifiers, each classifier learns
to concentrate on certain training examples and boosts the overall performance.
Allowing for end-to-end training, our method can be conveniently embedded into
state-of-the-art deep networks. Experiments with multiple popular deep
networks, including Network in Network, GoogLeNet, and VGGNet, on object
classification benchmarks of various scales, including CIFAR-100, MNIST,
and ImageNet, and on scene classification benchmarks, including MIT67,
SUN397, and Places205, demonstrate the effectiveness of our method. In
addition, we analyze the relationship between the proposed method and
classical conditional random field models.
Comment: To appear in ECCV 2016. May be subject to minor changes before the
camera-ready version
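The collaborative idea, relaxing later layers on samples that an earlier classifier already handles confidently, can be sketched numerically. This is a simplified stand-in; the confidence threshold and the down-weighting factor are assumptions for illustration, not the paper's formulation.

```python
# Toy collaborative layer-wise loss: once a layer's classifier is confident
# and correct on a sample, later layers' losses for that sample are scaled
# down, freeing them to focus on harder samples.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def collaborative_loss(layer_logits, label, threshold=0.9, relax=0.1):
    """layer_logits: one classifier's logits per layer, for one sample."""
    total, weight = 0.0, 1.0
    for logits in layer_logits:
        p = softmax(logits)
        total += -weight * math.log(p[label])   # weighted cross-entropy
        if p[label] > threshold:                # confident and correct:
            weight *= relax                     # relax subsequent layers
    return total

# A sample solved confidently at layer 1 contributes little loss at layer 2;
# a sample uncertain everywhere keeps full weight at every layer.
easy = [[4.0, -4.0], [0.1, 0.0]]
hard = [[0.1, 0.0], [0.1, 0.0]]
loss_easy = collaborative_loss(easy, label=0)
loss_hard = collaborative_loss(hard, label=0)
```

In the actual method the per-layer classifiers are trained jointly end-to-end; the sketch only shows how the companion weighting redistributes the loss across layers.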
Wall Orientation and Shear Stress in the Lattice Boltzmann Model
The wall shear stress is a quantity of profound importance for the clinical
diagnosis of artery diseases. The lattice Boltzmann method is an easily
parallelizable numerical method for solving flow problems, but it suffers
from errors in the velocity field near the boundaries, which lead to errors
in the wall shear stress and in the normal vectors computed from the
velocity. In this work we present a simple formula to calculate the wall
shear stress in the lattice Boltzmann model and propose to compute the wall
normals, which are necessary to compute the wall shear stress, by taking the
weighted mean over boundary facets lying in the vicinity of a wall element.
We carry out several tests and observe an increase in the accuracy of the
computed normal vectors over other methods in two and three dimensions.
Using this scheme we compute the wall shear stress in inclined and bent
channel fluid flows and show a minor influence of the normal on the
numerical error, implying that the main error arises from a corrupted
velocity field near the staircase boundary. Finally, we calculate the wall
shear stress in the human abdominal aorta under steady conditions using our
method and compare the results with a standard finite volume solver and
with experimental data available in the literature. Applications of our
ideas in a simplified protocol for data preprocessing in medical
applications are discussed.
Comment: 9 pages, 11 figures
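The weighted-mean normal estimate can be sketched as follows. The inverse-distance weighting and the cutoff radius are assumptions for illustration; the paper's exact weights over the boundary facets may differ.

```python
# Estimate the wall normal at a lattice node as a weighted mean of the unit
# normals of boundary facets within a given radius, then renormalize.
import math

def estimate_normal(node, facets, radius=2.0):
    """facets: list of (centroid, unit_normal) pairs near the wall node."""
    nx = ny = nz = 0.0
    for (cx, cy, cz), (fx, fy, fz) in facets:
        d = math.dist(node, (cx, cy, cz))
        if d <= radius:
            w = 1.0 / (d + 1e-12)        # assumed inverse-distance weighting
            nx += w * fx; ny += w * fy; nz += w * fz
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)

# Two facets whose normals tilt symmetrically around +z: the weighted mean
# cancels the tilt and recovers the underlying wall direction.
facets = [((0.0, 0.0, 0.5), (0.1, 0.0, 0.995)),
          ((1.0, 0.0, 0.5), (-0.1, 0.0, 0.995))]
n = estimate_normal((0.5, 0.0, 0.0), facets)
```

Averaging over a neighbourhood of facets is what smooths out the staircase discretization of the boundary that a single facet's normal would inherit.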
Optimistic Parallelism on GPUs
We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable exploitation of data parallelism, the latter three represent the overhead costs of using speculation. We perform the misspeculation check on the GPU to minimize its cost. We perform result committing and misspeculation recovery on the CPU to reduce the result-copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API for programmers to give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an Nvidia Tesla C1060 hosted in an Intel Xeon E5540 machine.
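The five phases can be sketched sequentially in Python. This is a toy simulation of the control flow, not a GPU implementation; `reads` and `writes` are hypothetical helpers declaring each iteration's memory footprint, standing in for the hints the programming model lets programmers supply.

```python
# Toy speculative loop: run all iterations optimistically on a snapshot,
# detect cross-iteration read-after-write conflicts, commit the correct
# results, and sequentially re-run the misspeculated iterations.
def speculative_loop(data, update, reads, writes):
    n = len(data)
    # 1. schedule: all iterations optimistically "in parallel" (simulated).
    snapshot = list(data)
    # 2. compute: each iteration works on the pre-loop snapshot.
    results = [update(snapshot, i) for i in range(n)]
    # 3. misspeculation check: iteration i fails if it read a location
    #    written by an earlier iteration (it used a stale value).
    written, bad = set(), []
    for i in range(n):
        if reads(i) & written:
            bad.append(i)
        written |= writes(i)
    # 4. commit: keep the results of correct iterations.
    for i in range(n):
        if i not in bad:
            for loc in writes(i):
                data[loc] = results[i]
    # 5. recover: re-run misspeculated iterations sequentially, in order.
    for i in bad:
        data[next(iter(writes(i)))] = update(data, i)
    return data, bad

# Example: iteration i reads slot i-1 and writes slot i (a loop-carried
# chain), so every iteration after the first misspeculates.
def update(arr, i):
    return arr[i - 1] * 2 if i else arr[0]

out, bad = speculative_loop([1, 2, 3, 4], update,
                            reads=lambda i: {i - 1} if i else set(),
                            writes=lambda i: {i})
```

In the actual system phases 2 and 3 run on the GPU while committing and recovery run on the CPU; the sketch only shows how the phases fit together.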
Randomised trials conducted using cohorts : a scoping review
Acknowledgements: We thank Margaret Sampson, MLIS, PhD, AHIP (Children's Hospital of Eastern Ontario, Ottawa, Canada) for developing the search strategies on behalf of the CONSORT team. We thank Dr Philippa Fibert, St Mary's University, Twickenham, London, for her help in screening publications for inclusion.
Peer reviewed
An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm
Recently, a fully implicit, energy- and charge-conserving particle-in-cell
method has been proposed for multi-scale, full-f kinetic simulations [G. Chen,
et al., J. Comput. Phys. 230, 7018 (2011)]. The method employs a Jacobian-free
Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss
of numerical stability or accuracy. A fundamental feature of the method is the
segregation of particle-orbit computations from the field solver, while
remaining fully self-consistent. This paper describes a very efficient,
mixed-precision hybrid CPU-GPU implementation of the implicit PIC algorithm
exploiting this feature. The JFNK solver is kept on the CPU in double precision
(DP), while the implicit, charge-conserving, and adaptive particle mover is
implemented on a GPU (graphics processing unit) using CUDA in single-precision
(SP). Performance-oriented optimizations are introduced with the aid of the
roofline model. The implicit particle mover algorithm is shown to achieve up to
400 GOp/s on a Nvidia GeForce GTX580. This corresponds to 25% absolute GPU
efficiency against the peak theoretical performance, and is about 300 times
faster than an equivalent serial CPU (Intel Xeon X5460) execution. For the
test case chosen, the mixed-precision hybrid CPU-GPU solver is shown to
outperform the DP CPU-only serial version by a factor of ~100, without
apparent loss of robustness or accuracy in a challenging long-timescale ion
acoustic wave simulation.
Comment: 25 pages, 6 figures, submitted to J. Comput. Phys.
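The motivation for keeping the solver side in double precision can be illustrated with a small round-off experiment. This is a generic toy, not the paper's algorithm: accumulating many tiny increments in single precision drifts measurably, while the same accumulation in double precision stays accurate, which is why the global JFNK solve is kept in DP while the cheap per-particle work tolerates SP.

```python
# Compare accumulating 100,000 increments of 1e-7 onto 1.0 in single vs.
# double precision. In SP each increment is below one ulp of the running
# sum is forced up to a full ulp, so the result drifts noticeably.
import struct

def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def push_sp(v0, dv, steps):
    v = f32(v0)
    for _ in range(steps):
        v = f32(v + f32(dv))    # single-precision accumulation
    return v

def push_dp(v0, dv, steps):
    v = v0
    for _ in range(steps):
        v += dv                 # double-precision accumulation
    return v

sp = push_sp(1.0, 1e-7, 100_000)
dp = push_dp(1.0, 1e-7, 100_000)
err = abs(sp - dp)
```

The DP result lands at 1.01 essentially exactly, while the SP accumulation overshoots by roughly 2e-3, orders of magnitude more error from the same arithmetic.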
- …