CosmoHammer: Cosmological parameter estimation with the MCMC Hammer
We study the benefits and limits of parallelised Markov chain Monte Carlo
(MCMC) sampling in cosmology. MCMC methods are widely used for the estimation
of cosmological parameters from a given set of observations and are typically
based on the Metropolis-Hastings algorithm. Some of the required calculations
can however be computationally intensive, meaning that a single long chain can
take several hours or days to calculate. In practice, this can be limiting,
since the MCMC process needs to be performed many times to test the impact of
possible systematics and to understand the robustness of the measurements being
made. To achieve greater speed through parallelisation, MCMC algorithms need to
have short auto-correlation times and minimal overheads caused by tuning and
burn-in. The resulting scalability is hence influenced by two factors, the MCMC
overheads and the parallelisation costs. In order to efficiently distribute the
MCMC sampling over thousands of cores on modern cloud computing infrastructure,
we developed a Python framework called CosmoHammer which embeds emcee, an
implementation by Foreman-Mackey et al. (2012) of the affine invariant ensemble
sampler by Goodman and Weare (2010). We test the performance of CosmoHammer for
cosmological parameter estimation from cosmic microwave background data. While
Metropolis-Hastings is dominated by overheads, CosmoHammer is able to
accelerate the sampling process from a wall time of 30 hours on a dual core
notebook to 16 minutes by scaling out to 2048 cores. Such short wall times for
complex data sets open possibilities for extensive model testing and control
of systematics.
Comment: Published version. 17 pages, 6 figures. The code is available at
http://www.astro.ethz.ch/refregier/research/Software/cosmohamme
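The affine invariant ensemble sampler named in the abstract above can be illustrated with a minimal sketch of the Goodman and Weare stretch move. This is not CosmoHammer or emcee code: the function names (`log_prob`, `stretch_half_update`, `run_sampler`), the Gaussian target, and all default parameters are assumptions made for illustration. The ensemble is split into two halves so that each half can be updated against the other; the per-walker likelihood calls in the marked line are independent and are the part that a parallel implementation could distribute across cores.

```python
import numpy as np

def log_prob(theta):
    # Hypothetical target density: a standard normal in any dimension.
    return -0.5 * np.sum(theta ** 2)

def stretch_half_update(active, lp_active, complement, log_prob_fn, rng, a=2.0):
    """Update one half of the ensemble in place with the stretch move."""
    n, d = active.shape
    # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
    z = ((a - 1.0) * rng.random(n) + 1.0) ** 2 / a
    # Each active walker picks a random partner from the other half.
    partners = complement[rng.integers(len(complement), size=n)]
    proposals = partners + z[:, None] * (active - partners)
    # Independent likelihood evaluations: the parallelisable step.
    lp_new = np.array([log_prob_fn(p) for p in proposals])
    log_ratio = (d - 1) * np.log(z) + lp_new - lp_active
    accept = np.log(rng.random(n)) < log_ratio
    active[accept] = proposals[accept]
    lp_active[accept] = lp_new[accept]
    return accept

def run_sampler(log_prob_fn, n_walkers=20, n_dim=2, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    walkers = rng.normal(size=(n_walkers, n_dim))
    log_probs = np.array([log_prob_fn(w) for w in walkers])
    half = n_walkers // 2
    first, second = slice(0, half), slice(half, None)
    chain = np.empty((n_steps, n_walkers, n_dim))
    for step in range(n_steps):
        # Slices are views, so the in-place updates modify `walkers`.
        for act, comp in ((first, second), (second, first)):
            stretch_half_update(walkers[act], log_probs[act],
                                walkers[comp], log_prob_fn, rng)
        chain[step] = walkers
    return chain

if __name__ == "__main__":
    chain = run_sampler(log_prob, n_steps=1500, seed=1)
    samples = chain[500:].reshape(-1, chain.shape[-1])
    print("sample mean:", samples.mean(axis=0))
    print("sample variance:", samples.var(axis=0))
```

The sampler has a single tuning parameter, the stretch scale `a`, which relates to the abstract's point about keeping tuning overheads small.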
SAPPORO: A way to turn your graphics cards into a GRAPE-6
We present Sapporo, a library for performing high-precision gravitational
N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library
mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can
switch to Sapporo by a simple relinking of the library. The precision of our
library is comparable to that of GRAPE-6, even though internally the GPU
hardware is limited to single-precision arithmetic. This limitation is
effectively overcome by emulating double precision for calculating the distance
between particles. The performance loss of this operation is small (< 20%)
compared to the advantage of being able to run at high precision. We tested the
library using several GRAPE-6-enabled N-body codes, in particular with Starlab
and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6
particles on a PC with four commercial G92 architecture GPUs (two GeForce
9800GX2). As a production test, we simulated a 32k Plummer model with equal
mass stars well beyond core collapse. The simulation took 41 days, during which
the mean performance was 113 Gflop/s. The GPU did not show any problems from
running in a production environment for such an extended period of time.
Comment: 13 pages, 9 figures, accepted to New Astronom
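The idea of emulating double precision for the inter-particle distance, as described in the Sapporo abstract above, can be sketched with a "double-single" representation: each coordinate is stored as a pair of single-precision numbers (a leading part and a small correction). The abstract does not state that Sapporo uses exactly this scheme, so treat it as an assumption; the function names `split_double` and `ds_difference` are hypothetical, and NumPy on the CPU stands in for GPU arithmetic.

```python
import numpy as np

def split_double(x):
    """Split float64 values into (hi, lo) float32 parts with hi + lo ~ x."""
    x = np.asarray(x, dtype=np.float64)
    hi = x.astype(np.float32)
    lo = (x - hi.astype(np.float64)).astype(np.float32)
    return hi, lo

def ds_difference(a_hi, a_lo, b_hi, b_lo):
    """Coordinate difference a - b using float32 operations only.

    For nearby particles the leading parts nearly cancel, and the
    correction terms restore the digits that plain float32 loses.
    """
    return (a_hi - b_hi) + (a_lo - b_lo)

if __name__ == "__main__":
    # Two particles far from the origin but very close to each other.
    xi, xj = 1000.000123456, 1000.000100000
    exact = xi - xj

    naive = np.float32(xi) - np.float32(xj)
    emulated = ds_difference(*split_double(xi), *split_double(xj))

    print("float64 reference :", exact)
    print("plain float32     :", float(naive))
    print("double-single     :", float(emulated))
```

Only the subtraction needs the extra precision in this sketch; the resulting small difference can then be carried forward in ordinary single precision.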
Porting Decision Tree Algorithms to Multicore using FastFlow
The whole computer hardware industry has embraced multicores. For these machines,
the extreme optimisation of sequential algorithms is no longer sufficient to
squeeze the real machine power, which can only be exploited via thread-level
parallelism. Decision tree algorithms exhibit natural concurrency that makes
them suitable to be parallelised. This paper presents an approach for
easy-yet-efficient porting of an implementation of the C4.5 algorithm on
multicores. The parallel porting requires minimal changes to the original
sequential code, and it is able to exploit up to 7X speedup on an Intel
dual-quad core machine.
Comment: 18 pages + cove
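One source of the natural concurrency mentioned in the abstract above is that, at a given tree node, candidate split attributes can be scored independently. The sketch below assumes that decomposition and uses Python's standard thread pool rather than FastFlow; it is not the paper's implementation. The helper names (`entropy`, `gain_ratio`, `best_attribute`) and the tiny categorical data set are hypothetical. In CPython the global interpreter lock limits real speedup for pure-Python work like this, so the example shows the task structure rather than performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio for splitting on one categorical attribute."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs, workers=4):
    """Score every attribute as an independent task and return the best one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: gain_ratio(rows, labels, a), attrs))
    return max(zip(scores, attrs))[1]

if __name__ == "__main__":
    # Attribute 0 determines the label; attribute 1 carries no information.
    rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = ["a", "a", "b", "b"]
    print("best split attribute:", best_attribute(rows, labels, [0, 1]))
```

The sequential scoring function is left untouched and only the loop over attributes is handed to a pool, which mirrors the abstract's emphasis on minimal changes to the original code.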
Array languages and the N-body problem
This paper is a description of the contributions to the SICSA multicore challenge on many body
planetary simulation made by a compiler group at the University of Glasgow. Our group is part of
the Computer Vision and Graphics research group and we have for some years been developing array
compilers because we think these are a good tool both for expressing graphics algorithms and for
exploiting the parallelism that computer vision applications require.
We shall describe experiments using two languages on two different platforms and we shall compare
the performance of these with reference C implementations running on the same platforms. Finally
we shall draw conclusions both about the viability of the array language approach as compared to
other approaches used in the challenge and also about the strengths and weaknesses of the two, very
different, processor architectures we used.
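To give a flavour of the array-language style discussed in the abstract above, the following sketch computes all pairwise gravitational accelerations with whole-array operations and no explicit loops. NumPy is used here as a stand-in; the abstract does not name the two array languages, and the function name `accelerations`, the `softening` parameter, and the choice of units with G = 1 are assumptions.

```python
import numpy as np

def accelerations(pos, mass, G=1.0, softening=0.0):
    """Direct-sum gravitational accelerations for N bodies.

    pos  : (N, 3) array of positions
    mass : (N,) array of masses
    """
    pos = np.asarray(pos, dtype=float)
    mass = np.asarray(mass, dtype=float)
    # diff[i, j] is the vector pointing from body i to body j.
    diff = pos[None, :, :] - pos[:, None, :]
    dist2 = np.sum(diff ** 2, axis=-1) + softening ** 2
    # An infinite self-distance makes the self-interaction term vanish.
    np.fill_diagonal(dist2, np.inf)
    inv_d3 = dist2 ** -1.5
    # a_i = G * sum_j m_j * (r_j - r_i) / |r_j - r_i|^3
    return G * np.einsum("ij,ijk->ik", mass[None, :] * inv_d3, diff)

if __name__ == "__main__":
    # Two unit masses separated by a distance of 2 along the x axis.
    pos = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    mass = np.array([1.0, 1.0])
    print(accelerations(pos, mass))
    # Each body accelerates toward the other with magnitude 1 / 2**2 = 0.25.
```

The whole force calculation is expressed as a few array expressions, which is what leaves a compiler or runtime free to choose how to parallelise it. The cost is an N-by-N intermediate array, so memory grows quadratically with the number of bodies.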
A randomised primal-dual algorithm for distributed radio-interferometric imaging
Next generation radio telescopes, like the Square Kilometre Array, will
acquire an unprecedented amount of data for radio astronomy. The development of
fast, parallelisable or distributed algorithms for handling such large-scale
data sets is of prime importance. Motivated by this, we investigate herein a
convex optimisation algorithmic structure, based on primal-dual
forward-backward iterations, for solving the radio interferometric imaging
problem. It can encompass any convex prior of interest. It allows for the
distributed processing of the measured data and introduces further flexibility
by employing a probabilistic approach for the selection of the data blocks used
at a given iteration. We study the reconstruction performance with respect to
the data distribution and we propose the use of nonuniform probabilities for
the randomised updates. Our simulations show the feasibility of the
randomisation given a limited computing infrastructure as well as important
computational advantages when compared to state-of-the-art algorithmic
structures.
Comment: 5 pages, 3 figures, Proceedings of the European Signal Processing
Conference (EUSIPCO) 2016, Related journal publication available at
https://arxiv.org/abs/1601.0402
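The probabilistic selection of data blocks described in the abstract above can be sketched in isolation from the imaging algorithm itself. The example assumes that, at every iteration, each block is activated independently with its own probability; the abstract does not spell out the selection rule, so this is an assumption, and the names `select_blocks` and `activation_frequencies` and the example probabilities are hypothetical. No primal-dual update is implemented here.

```python
import numpy as np

def select_blocks(probabilities, rng):
    """Return the indices of the data blocks active at this iteration."""
    probabilities = np.asarray(probabilities, dtype=float)
    return np.flatnonzero(rng.random(probabilities.size) < probabilities)

def activation_frequencies(probabilities, n_iter=5000, seed=0):
    """Simulate many iterations and report how often each block was used."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(probabilities))
    for _ in range(n_iter):
        counts[select_blocks(probabilities, rng)] += 1
    return counts / n_iter

if __name__ == "__main__":
    # Nonuniform probabilities: some blocks are visited far more often.
    probs = [0.9, 0.5, 0.1]
    print("empirical frequencies:", activation_frequencies(probs))
    print("expected blocks per iteration:", sum(probs))
```

Under this assumption the expected number of blocks processed per iteration is the sum of the probabilities, which gives a direct handle on the per-iteration cost when computing resources are limited.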