PPF - A Parallel Particle Filtering Library
We present the parallel particle filtering (PPF) software library, which
enables hybrid shared-memory/distributed-memory parallelization of particle
filtering (PF) algorithms combining the Message Passing Interface (MPI) with
multithreading for multi-level parallelism. The library is implemented in Java
and relies on OpenMPI's Java bindings for inter-process communication. It
includes dynamic load balancing, multi-thread balancing, and several
algorithmic improvements for PF, such as input-space domain decomposition. The
PPF library hides the difficulties of efficient parallel programming of PF
algorithms and provides application developers with the necessary tools for
parallel implementation of PF methods. We demonstrate the capabilities of the
PPF library using two distributed PF algorithms in two scenarios with different
numbers of particles. The PPF library runs a 38 million particle problem,
corresponding to more than 1.86 GB of particle data, on 192 cores with 67%
parallel efficiency. To the best of our knowledge, the PPF library is the first
open-source software that offers a parallel framework for PF applications.
Comment: 8 pages, 8 figures; will appear in the proceedings of the IET Data
Fusion & Target Tracking Conference 201
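The scaling figures quoted in the PPF abstract can be unpacked with a little arithmetic. The sketch below is an illustration only: it assumes the usual definition of parallel efficiency, E = S / p, measured against a single-core baseline, and it treats 1 GB as 10^9 bytes. The abstract states neither convention, and it gives the data size only as a lower bound ("more than 1.86 GB"), so the per-particle figure is approximate.

```python
def speedup_from_efficiency(efficiency: float, cores: int) -> float:
    """Speedup S implied by parallel efficiency E = S / p (assumed definition)."""
    return efficiency * cores


def bytes_per_particle(total_bytes: float, particles: int) -> float:
    """Average storage per particle, given the total size of the particle data."""
    return total_bytes / particles


# Figures taken from the abstract: 67% efficiency on 192 cores,
# 38 million particles, at least 1.86 GB of particle data.
speedup = speedup_from_efficiency(0.67, 192)
per_particle = bytes_per_particle(1.86e9, 38_000_000)  # assumes 1 GB = 1e9 bytes

print(f"implied speedup: {speedup:.1f}x")
print(f"approx. bytes per particle: {per_particle:.1f}")
```

Under these assumptions, 67% efficiency on 192 cores corresponds to a speedup of about 128.6, and each particle occupies roughly 49 bytes.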
Scalable multimodal convolutional networks for brain tumour segmentation
Brain tumour segmentation plays a key role in computer-assisted surgery. Deep
neural networks have increased the accuracy of automatic segmentation
significantly; however, these models tend to generalise poorly to imaging
modalities other than those for which they were designed, thereby
limiting their applications. For example, a network architecture initially
designed for brain parcellation of monomodal T1 MRI cannot be easily
translated into an efficient tumour segmentation network that jointly utilises
T1, T1c, Flair and T2 MRI. To tackle this, we propose a novel scalable
multimodal deep learning architecture using new nested structures that
explicitly leverage deep features within or across modalities. This aims at
making the early layers of the architecture structured and sparse so that the
final architecture becomes scalable to the number of modalities. We evaluate
the scalable architecture for brain tumour segmentation and give evidence of
its regularisation effect compared to the conventional concatenation approach.
Comment: Paper accepted at MICCAI 201
Complete methods set for scalable ion trap quantum information processing
Large-scale quantum information processors must be able to transport and
maintain quantum information, and repeatedly perform logical operations. Here
we demonstrate a combination of all the fundamental elements required to
perform scalable quantum computing using qubits stored in the internal states
of trapped atomic ions. We quantify the repeatability of a multi-qubit
operation, observing no loss of performance despite qubit transport over
macroscopic distances. Key to these results is the use of different pairs of
beryllium ion hyperfine states for robust qubit storage, readout and gates, and
simultaneous trapping of magnesium re-cooling ions along with the qubit ions.
Comment: 9 pages, 4 figures. Accepted to Science, and thus subject to a press
embarg
TaskPoint: sampled simulation of task-based programs
Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks which are instantiated many times and scheduled dynamically to available threads. Due to system noise and variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals we employ a novel fast-forward mechanism for dynamically scheduled programs. We evaluate the proposed technique on a set of 19 task-based parallel benchmarks and two different architectures. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 64 simulated threads by an average factor of 19.1 at an average error of 1.8% and a maximum error of 15.0%.
This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence and the Mont-Blanc project (EU-FP7-610402 and EU-H2020-671697). M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas is supported by the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the EU FP7 (contract 2013BP B 00243). T. Grass has been partially supported by the AGAUR of the Generalitat de Catalunya (grant 2013FI B 0058).
Peer Reviewed
Postprint (author's final draft
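The idea of using task instances as sampling units can be illustrated with a toy estimator. This is a minimal sketch under stated assumptions, not the TaskPoint mechanism itself: the abstract does not say how instances are selected or how fast-forwarding works. Here every `sample_every`-th instance of each task type is "simulated in detail", and the remaining instances are charged the mean cost measured for their type. The names `detailed_cycles`, `sampled_estimate`, `sample_every`, and the task dictionary layout are all hypothetical, and the estimate covers total work only; it does not model scheduling onto threads.

```python
import random
from collections import defaultdict
from statistics import mean


def detailed_cycles(task):
    """Hypothetical stand-in for an expensive detailed simulation of one task."""
    return task["true_cycles"]


def sampled_estimate(tasks, sample_every=10):
    """Estimate total cycles by simulating only some task instances in detail.

    The first instance of every task type is always sampled, so each type
    has at least one measurement before skipped instances are extrapolated.
    """
    samples = defaultdict(list)   # task type -> measured cycle counts
    seen = defaultdict(int)       # task type -> instances encountered so far
    skipped = []                  # task types of instances not simulated in detail
    total = 0.0
    for task in tasks:
        kind = task["type"]
        if seen[kind] % sample_every == 0:
            cycles = detailed_cycles(task)
            samples[kind].append(cycles)
            total += cycles
        else:
            skipped.append(kind)
        seen[kind] += 1
    # Charge each skipped instance the mean cost observed for its type.
    for kind in skipped:
        total += mean(samples[kind])
    return total


# Synthetic workload: two task types with noisy per-instance costs.
rng = random.Random(0)
workload = [
    {"type": kind, "true_cycles": max(1.0, rng.gauss(base, base * 0.05))}
    for kind, base in (("stencil", 1000.0), ("reduce", 250.0))
    for _ in range(500)
]
rng.shuffle(workload)

exact = sum(t["true_cycles"] for t in workload)
approx = sampled_estimate(workload, sample_every=10)
print(f"relative error: {abs(approx - exact) / exact:.2%}")
```

The sketch only works because instances of the same task type are assumed to behave similarly; how much detailed simulation is needed in practice depends on how well that holds for a given workload.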
A Parallel Histogram-based Particle Filter for Object Tracking on SIMD-based Smart Cameras
We present a parallel implementation of a histogram-based particle filter for object tracking on smart cameras based on SIMD processors. We specifically focus on parallel computation of the particle weights and parallel construction of the feature histograms, since these are the major bottlenecks in standard implementations of histogram-based particle filters. The proposed algorithm can be applied with any histogram-based feature set; we show in detail how the parallel particle filter can employ simple color histograms as well as more complex histograms of oriented gradients (HOG). The algorithm was successfully implemented on an SIMD processor and performs robust object tracking at up to 30 frames per second, a performance that is difficult to achieve even on a modern desktop computer.
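The two bottlenecks named in the abstract above, histogram construction and particle weighting, can be sketched as follows. This is an illustrative sequential version under stated assumptions, not the authors' SIMD implementation: the abstract does not specify the similarity measure, so the sketch uses the Bhattacharyya coefficient with an exponential likelihood, a common choice in histogram-based tracking. The function names and the `bins` and `sharpness` parameters are hypothetical.

```python
import math


def normalised_histogram(pixels, bins=8, max_value=256):
    """Build a normalised intensity histogram from integer pixel values."""
    hist = [0.0] * bins
    for value in pixels:
        hist[value * bins // max_value] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def bhattacharyya(p, q):
    """Similarity of two normalised histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def particle_weights(reference, candidates, sharpness=20.0):
    """Weight each candidate histogram by its similarity to the reference.

    Each weight depends only on its own candidate, so this loop is the
    part that maps naturally onto data-parallel hardware.
    """
    raw = [
        math.exp(-sharpness * (1.0 - bhattacharyya(reference, c)))
        for c in candidates
    ]
    total = sum(raw)
    return [w / total for w in raw]


# Toy example: the first candidate patch matches the reference exactly.
reference = normalised_histogram([10, 20, 30, 200, 210, 220])
candidates = [
    normalised_histogram([10, 20, 30, 200, 210, 220]),
    normalised_histogram([100, 110, 120, 130, 140, 150]),
]
weights = particle_weights(reference, candidates)
print([round(w, 4) for w in weights])
```

Only the final normalisation needs a reduction across particles; the per-particle histogram and similarity computations are independent of one another.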