Probabilistic Graphical Models on Multi-Core CPUs using Java 8
In this paper, we discuss software design issues related to the development
of parallel computational intelligence algorithms on multi-core CPUs, using the
new Java 8 functional programming features. In particular, we focus on
probabilistic graphical models (PGMs) and present the parallelisation of a
collection of algorithms that deal with inference and learning of PGMs from
data, namely maximum likelihood estimation, importance sampling, and greedy
search for solving combinatorial optimisation problems. Through these concrete
examples, we tackle the problem of defining efficient data structures for PGMs
and parallel processing of same-size batches of data sets using Java 8
features. We also provide straightforward techniques to code parallel
algorithms that seamlessly exploit multi-core processors. The experimental
analysis, carried out using our open source AMIDST (Analysis of MassIve Data
STreams) Java toolbox, shows the merits of the proposed solutions.

Comment: Pre-print version of the paper presented in the special issue on
Computational Intelligence Software at IEEE Computational Intelligence
Magazine journal
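The parallel pattern this abstract describes — reducing same-size data batches to sufficient statistics with Java 8 stream features — can be sketched as follows. This is a hypothetical illustration, not the AMIDST API; the `ParallelMLE` and `Stats` names and the Bernoulli example are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch (not the AMIDST API): maximum likelihood estimation of
// a Bernoulli parameter from same-size data batches, parallelised with the
// Java 8 stream features the paper builds on.
public class ParallelMLE {

    // Sufficient statistics of a Bernoulli sample: (#ones, #observations).
    static final class Stats {
        final long ones, count;
        Stats(long ones, long count) { this.ones = ones; this.count = count; }
        Stats merge(Stats o) { return new Stats(ones + o.ones, count + o.count); }
    }

    static Stats batchStats(int[] batch) {
        return new Stats(IntStream.of(batch).filter(x -> x == 1).count(), batch.length);
    }

    public static double estimate(List<int[]> batches) {
        // Map each batch to its sufficient statistics in parallel; the merge
        // is associative, so the reduction is safe under any thread schedule.
        Stats total = batches.parallelStream()
                             .map(ParallelMLE::batchStats)
                             .reduce(new Stats(0, 0), Stats::merge);
        return (double) total.ones / total.count;
    }

    public static void main(String[] args) {
        List<int[]> batches = Arrays.asList(
                new int[]{1, 1, 0, 1},
                new int[]{0, 0, 1, 1},
                new int[]{1, 0, 1, 1});
        System.out.println(estimate(batches)); // 8 ones / 12 samples
    }
}
```

The key design choice, which the abstract's emphasis on efficient data structures hints at, is reducing batches to small, associatively mergeable statistics rather than sharing mutable state across threads.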
Efficient, concurrent Bayesian analysis of full waveform LaDAR data
Bayesian analysis of full waveform laser detection and ranging (LaDAR)
signals using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms
has shown higher estimation accuracy, resolution and sensitivity in
detecting weak signatures for 3D surface profiling, and in constructing
multiple-layer images with a varying number of surface returns. However, it is
computationally expensive. Although parallel computing has the potential to
reduce both the processing time and the requirement for persistent memory
storage, parallelizing the serial sampling procedure in RJMCMC is a
significant challenge in both the statistical and computing domains. While
several strategies have been developed for Markov chain Monte Carlo (MCMC)
parallelization, these are usually restricted to fixed-dimensional parameter
estimates and are not obviously applicable to RJMCMC for varying-dimensional
signal analysis.
In the statistical domain, we propose an effective, concurrent RJMCMC
algorithm, state space decomposition RJMCMC (SSD-RJMCMC), which divides
the entire state space into groups and assigns to each an independent
RJMCMC chain with restricted variation of model dimensions. It intrinsically
has a parallel structure, a form of model-level parallelization. Applying
the convergence diagnostic, we can adaptively assess the convergence of the
Markov chain on-the-fly and dynamically terminate the chain generation.
Evaluations on both synthetic and real data demonstrate that the concurrent
chains have shorter convergence length and hence improved sampling efficiency.
Parallel exploration of the candidate models, in conjunction with an
error detection and correction scheme, improves the reliability of surface
detection. By adaptively generating a complementary MCMC sequence for the
determined model, the algorithm enhances the accuracy of surface profiling.
In the computing domain, we develop a data-parallel SSD-RJMCMC (DP
SSD-RJMCMC) to achieve an efficient parallel implementation on a distributed
computer cluster. Adding data-level parallelization on top of the model-level
parallelization, it formalizes a task queue and introduces an automatic
scheduler for dynamic task allocation. These two strategies successfully
diminish the load imbalance that occurs in SSD-RJMCMC. Thanks to the coarse
granularity, the processors communicate at a very low frequency. The MPI-based
implementation on a Beowulf cluster demonstrates that, compared with
RJMCMC, DP SSD-RJMCMC further reduces the problem size and computational
complexity. Therefore, it can achieve a super-linear speedup if the numbers
of data segments and processors are chosen wisely.
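The model-level parallelization idea — one fixed-dimension chain per group of the decomposed state space, run concurrently, with the best-fitting model selected afterwards — can be sketched in Java. Everything below (the toy two-return waveform, the squared-error log-likelihood, the class and method names) is an assumption made for illustration; the paper's SSD-RJMCMC additionally handles on-the-fly convergence diagnostics, error correction and the complementary MCMC sequence.

```java
import java.util.*;
import java.util.concurrent.*;

// Toy sketch of model-level parallelism in the spirit of SSD-RJMCMC: the
// trans-dimensional state space is split by model dimension k, and each k
// gets its own fixed-dimension Metropolis chain, run concurrently.
public class SsdSketch {

    // Toy waveform: two surface returns at positions 3.0 and 7.0 on [0, 10].
    static double signal(double t) {
        return Math.exp(-Math.pow(t - 3.0, 2)) + Math.exp(-Math.pow(t - 7.0, 2));
    }

    // Log-likelihood of modelling the waveform with unit-height peaks at the
    // given positions (negative squared error on a coarse grid).
    static double logLik(double[] peaks) {
        double sse = 0.0;
        for (double t = 0.0; t <= 10.0; t += 0.1) {
            double model = 0.0;
            for (double p : peaks) model += Math.exp(-Math.pow(t - p, 2));
            double d = signal(t) - model;
            sse += d * d;
        }
        return -sse;
    }

    // One chain restricted to a fixed dimension of k peaks.
    static double[] runChain(int k, long seed, int iters) {
        Random rng = new Random(seed);
        double[] cur = new double[k];
        for (int i = 0; i < k; i++) cur[i] = 10.0 * rng.nextDouble();
        double curLl = logLik(cur);
        for (int it = 0; it < iters; it++) {
            double[] prop = cur.clone();
            int j = rng.nextInt(k);  // random-walk update of one peak position
            prop[j] = Math.min(10.0, Math.max(0.0, prop[j] + 0.5 * rng.nextGaussian()));
            double propLl = logLik(prop);
            if (Math.log(rng.nextDouble()) < propLl - curLl) { cur = prop; curLl = propLl; }
        }
        return cur;
    }

    // Launch one independent chain per candidate dimension, then pick the
    // dimension whose chain achieved the best fit.
    public static int bestModel(int maxK) {
        ExecutorService pool = Executors.newFixedThreadPool(maxK);
        List<Future<double[]>> futures = new ArrayList<>();
        for (int k = 1; k <= maxK; k++) {
            final int kk = k;
            futures.add(pool.submit(() -> runChain(kk, kk, 20000)));
        }
        int best = -1;
        double bestLl = Double.NEGATIVE_INFINITY;
        try {
            for (int k = 1; k <= maxK; k++) {
                double ll = logLik(futures.get(k - 1).get());
                if (ll > bestLl) { bestLl = ll; best = k; }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        return best;  // ideally 2, the true number of simulated returns
    }

    public static void main(String[] args) {
        System.out.println("selected model dimension: " + bestModel(4));
    }
}
```

Restricting each chain to a fixed dimension is what removes the trans-dimensional jumps from the inner loop and makes the chains embarrassingly parallel, which is the structural point the abstract makes.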
MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes
The advent of multi-/many-core processors in clusters advocates hybrid
parallel programming, which combines Message Passing Interface (MPI) for
inter-node parallelism with a shared memory model for on-node parallelism.
Compared to the traditional hybrid approach of MPI plus OpenMP, a new but
promising hybrid approach of MPI plus the MPI-3 shared-memory extensions
(MPI+MPI) is gaining traction. We describe an algorithmic approach for collective
operations (with allgather and broadcast as concrete examples) in the context
of hybrid MPI+MPI, so as to minimize memory consumption and memory copies. With
this approach, only one memory copy is maintained and shared by on-node
processes. This allows the removal of unnecessary on-node copies of replicated
data that are required between MPI processes when the collectives are invoked
in the context of pure MPI. We compare our collectives for hybrid MPI+MPI
with the traditional ones for pure MPI, and also discuss the synchronization
that is required to guarantee data integrity. The performance
of our approach has been validated on a Cray XC40 system (Cray MPI) and NEC
cluster (OpenMPI), showing that it achieves comparable or better performance
for allgather operations. We have further validated our approach with a
standard computational kernel, namely distributed matrix multiplication, and a
Bayesian Probabilistic Matrix Factorization code.

Comment: 10 pages. Accepted for publication in ICPP Workshops 201
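MPI-3 shared-memory windows are a C-level API, so the single-shared-copy idea can only be mirrored here as an analogy: in the sketch below, Java threads stand in for on-node MPI ranks, one shared buffer plays the role of the `MPI_Win_allocate_shared` window, and a barrier models the synchronization the abstract discusses. The class and method names are invented for the illustration.

```java
import java.util.concurrent.*;

// Conceptual analogy only (Java threads as on-node MPI ranks): an on-node
// allgather needs just one shared result buffer that every rank writes its
// block into and then reads, instead of each rank holding a private replica
// of the gathered data, as pure MPI would require.
public class SharedAllgather {
    public static int[] allgather(int ranks, int blockLen) {
        // One shared buffer for the whole "node"; pure MPI would allocate
        // one such buffer per rank, replicating the gathered data.
        int[] shared = new int[ranks * blockLen];
        CyclicBarrier barrier = new CyclicBarrier(ranks);  // sync analogue
        ExecutorService pool = Executors.newFixedThreadPool(ranks);
        Future<?>[] done = new Future<?>[ranks];
        for (int r = 0; r < ranks; r++) {
            final int rank = r;
            done[r] = pool.submit(() -> {
                // Each rank writes its own block directly into the shared
                // buffer (here, rank r contributes values r*100, r*100+1, ...)
                for (int i = 0; i < blockLen; i++)
                    shared[rank * blockLen + i] = rank * 100 + i;
                // ...and must synchronise before anyone reads the full
                // result: the data-integrity point the abstract raises.
                barrier.await();
                return null;
            });
        }
        try {
            for (Future<?> f : done) f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(allgather(3, 2)));
        // [0, 1, 100, 101, 200, 201]
    }
}
```

The memory saving the paper targets falls out of this structure: with n on-node ranks, pure MPI keeps n copies of the gathered result per node, while the shared-window layout keeps one.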