
    Probabilistic Graphical Models on Multi-Core CPUs using Java 8

    In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms for inference and learning of PGMs from data, namely maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and of parallel processing of same-size batches of data using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open-source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.
    Comment: Pre-print version of the paper presented in the special issue on Computational Intelligence Software of IEEE Computational Intelligence Magazine.
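    The batch-parallel pattern the abstract describes maps naturally onto Java 8 parallel streams. Below is a minimal, self-contained sketch of maximum likelihood estimation for a Bernoulli parameter, aggregating per-batch sufficient statistics with a parallel map-reduce; the class and method names are illustrative only and are not the AMIDST toolbox API.

```java
import java.util.Arrays;
import java.util.List;

// Hedged sketch: Bernoulli maximum likelihood estimation over same-size
// batches of data, parallelised with a Java 8 parallel stream. Names are
// illustrative and do not belong to the AMIDST API.
public class ParallelMLE {

    // Map each batch to its sufficient statistics {count, sum}, then
    // reduce them across threads; the element-wise sum is associative,
    // so the two-argument parallel reduce is safe.
    static double estimate(List<double[]> batches) {
        double[] stats = batches.parallelStream()
                .map(b -> new double[]{b.length, Arrays.stream(b).sum()})
                .reduce(new double[]{0.0, 0.0},
                        (a, b) -> new double[]{a[0] + b[0], a[1] + b[1]});
        return stats[1] / stats[0]; // MLE = total successes / total observations
    }

    public static void main(String[] args) {
        List<double[]> batches = Arrays.asList(
                new double[]{1, 0, 1, 1},
                new double[]{0, 1, 1, 0});
        System.out.println(estimate(batches)); // prints 0.625
    }
}
```
    The reduction is correct in parallel because the accumulator is associative and the zero vector is a true identity, which is exactly what the stream framework requires of a parallel reduce.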

    Efficient, concurrent Bayesian analysis of full waveform LaDAR data

    Bayesian analysis of full waveform laser detection and ranging (LaDAR) signals using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms has shown higher estimation accuracy, resolution, and sensitivity in detecting weak signatures for 3D surface profiling, and can construct multiple-layer images with a varying number of surface returns. However, it is computationally expensive. Although parallel computing has the potential to reduce both the processing time and the requirement for persistent memory storage, parallelizing the serial sampling procedure in RJMCMC is a significant challenge in both the statistical and computing domains. While several strategies have been developed for Markov chain Monte Carlo (MCMC) parallelization, these are usually restricted to fixed-dimensional parameter estimates and are not obviously applicable to RJMCMC for varying-dimensional signal analysis. In the statistical domain, we propose an effective, concurrent RJMCMC algorithm, state space decomposition RJMCMC (SSD-RJMCMC), which divides the entire state space into groups and assigns to each an independent RJMCMC chain with a restricted variation of model dimensions. It intrinsically has a parallel structure, a form of model-level parallelization. By applying a convergence diagnostic, we can adaptively assess the convergence of each Markov chain on the fly and so dynamically terminate the chain generation. Evaluations on both synthetic and real data demonstrate that the concurrent chains have shorter convergence lengths and hence improved sampling efficiency. Parallel exploration of the candidate models, in conjunction with an error detection and correction scheme, improves the reliability of surface detection, and by adaptively generating a complementary MCMC sequence for the determined model, it enhances the accuracy of surface profiling. In the computing domain, we develop a data-parallel SSD-RJMCMC (DP SSD-RJMCMC) to achieve an efficient parallel implementation on a distributed computer cluster. Adding data-level parallelization on top of the model-level parallelization, it formalizes a task queue and introduces an automatic scheduler for dynamic task allocation. These two strategies successfully diminish the load imbalance that occurred in SSD-RJMCMC. Thanks to the coarse granularity, the processors communicate at a very low frequency. The MPI-based implementation on a Beowulf cluster demonstrates that, compared with RJMCMC, DP SSD-RJMCMC further reduces the problem size and computational complexity, and can therefore achieve a super-linear speedup if the numbers of data segments and processors are chosen wisely.
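    The model-level parallelism of SSD-RJMCMC, one independent chain per group of model dimensions, can be sketched with plain Java threads. The chain body below is a stub: the proposal kernel, the posterior score, and the stopping rule are placeholders, not the authors' algorithm; the sketch only illustrates how dimension-restricted chains run concurrently and are compared afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of state space decomposition: each task runs an independent
// chain restricted to its own range of model dimensions (here, numbers of
// surface returns). The chain body is a placeholder, not the paper's kernel.
public class SsdRjmcmcSketch {

    // Stub chain: proposes dimensions only inside [kMin, kMax] and keeps
    // the best placeholder score, standing in for a restricted RJMCMC run.
    static double runRestrictedChain(int kMin, int kMax, long seed) {
        Random rng = new Random(seed);
        double best = Double.NEGATIVE_INFINITY;
        for (int iter = 0; iter < 10_000; iter++) {
            int k = kMin + rng.nextInt(kMax - kMin + 1);
            best = Math.max(best, -k + rng.nextGaussian()); // placeholder posterior score
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        int[][] groups = {{1, 2}, {3, 4}, {5, 6}};   // decomposed dimension ranges
        ExecutorService pool = Executors.newFixedThreadPool(groups.length);
        List<Future<Double>> results = new ArrayList<>();
        for (int[] g : groups)
            results.add(pool.submit(() -> runRestrictedChain(g[0], g[1], g[0])));
        for (int i = 0; i < groups.length; i++)      // compare groups to pick a model
            System.out.println("dims " + groups[i][0] + "-" + groups[i][1]
                    + " score " + results.get(i).get());
        pool.shutdown();
    }
}
```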

    MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

    The advent of multi-/many-core processors in clusters advocates hybrid parallel programming, which combines the Message Passing Interface (MPI) for inter-node parallelism with a shared-memory model for on-node parallelism. Compared to the traditional hybrid approach of MPI plus OpenMP, a new but promising hybrid approach of MPI plus the MPI-3 shared-memory extensions (MPI+MPI) is gaining traction. We describe an algorithmic approach for collective operations (with allgather and broadcast as concrete examples) in the context of hybrid MPI+MPI that minimizes memory consumption and memory copies. With this approach, only one copy of the data is maintained and shared by the on-node processes. This allows the removal of the unnecessary on-node copies of replicated data that are required between MPI processes when the collectives are invoked in the context of pure MPI. We compare our approach to collectives for hybrid MPI+MPI with the traditional one for pure MPI, and also discuss the synchronization that is required to guarantee data integrity. The performance of our approach has been validated on a Cray XC40 system (Cray MPI) and a NEC cluster (OpenMPI), showing that it achieves comparable or better performance for allgather operations. We have further validated our approach with a standard computational kernel, namely distributed matrix multiplication, and with a Bayesian Probabilistic Matrix Factorization code.
    Comment: 10 pages. Accepted for publication in ICPP Workshops 2019.
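    The key idea, a single on-node copy of the broadcast payload read by all node-local ranks instead of one replica per rank, can be illustrated without an MPI installation. In the hedged Java sketch below, threads stand in for node-local MPI processes and a barrier plays the role of the synchronization the abstract discusses; this is a conceptual analogue, not MPI-3 shared-memory code.

```java
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

// Conceptual analogue, not MPI-3 code: threads stand in for node-local MPI
// ranks. One "root" fills a single shared buffer; the other ranks read it
// after a barrier, so no per-rank replica of the payload is ever made.
public class SharedBroadcastSketch {

    static void await(CyclicBarrier b) {
        try { b.await(); } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) throws Exception {
        final int ranks = 4;
        final double[] shared = new double[1 << 20];   // the single on-node copy
        final CyclicBarrier filled = new CyclicBarrier(ranks);

        Thread[] node = new Thread[ranks];
        node[0] = new Thread(() -> {                   // root writes the payload once
            Arrays.fill(shared, 42.0);
            await(filled);                             // publish: data is complete
        });
        for (int r = 1; r < ranks; r++) {
            node[r] = new Thread(() -> {
                await(filled);                         // wait for the root's write
                assert shared[0] == 42.0;              // read directly from shared memory
            });
        }
        for (Thread t : node) t.start();
        for (Thread t : node) t.join();
        System.out.println(ranks + " ranks shared one copy of the payload");
    }
}
```
    The barrier supplies the happens-before ordering that makes the root's write visible to the readers, mirroring the data-integrity synchronization the paper discusses for its shared-memory collectives.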