
685 research outputs found

Optimization of a parallel permutation testing function for the SPRINT R package

Author: Dobrzelecki Bartosz
Forster Thorsten
Mewissen Muriel
Petrou Savvas
Piotrowski Michal
Sloan Terence
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Optimization of a parallel permutation testing function for the SPRINT R package

Author: Dobrzelecki Bartosz
Forster Thorsten
Ghazal Peter
Hill Jon
Mewissen Muriel
Petrou Savvas
Piotrowski Michal
Sloan Terence
Trew Arthur
Publication venue: 'Wiley'
Publication date: 23/06/2011

The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function scales close to optimally on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms, including cloud resources and a common desktop machine with multiprocessing capabilities.

Crossref

Online Research @ Cardiff

PubMed Central

Edinburgh Research Explorer

Exploiting Parallel R in the Cloud with SPRINT

Author: A. D. Lloyd
Altschul
Bolshakova
G. A. McGilvary
J. Hill
L. Mitchell
M. Mewissen
M. Piotrowski
P. Ghazal
Quackenbush
Schad
Schmidberger
T. Forster
T. M. Sloan
Walker
Xu
Yu
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2013

BACKGROUND: Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. OBJECTIVES: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. METHODS: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. RESULTS: It is possible to obtain good, scalable performance but the level of improvement depends on the nature of the algorithm. Resource underutilization can further improve the time to result. The end user’s location affects costs due to factors such as local taxation. CONCLUSIONS: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and open up new possibilities for smaller organisations with limited funds.

Crossref

Online Research @ Cardiff

PubMed Central

Edinburgh Research Explorer


Bond-Order Time Series Analysis for Detecting Reaction Events in Ab Initio Molecular Dynamics Simulations.

Author: Hutchings Marshall
Liu Johnson
Qiu Yudong
Song Chenchen
Wang Lee-Ping
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Ab initio molecular dynamics is able to predict novel reaction mechanisms by directly observing the individual reaction events that occur in simulation trajectories. In this article, we describe an approach for detecting reaction events from simulation trajectories using a physically motivated model based on time series analysis of ab initio bond orders. We found that applying a threshold to the bond order was insufficient for accurate detection, whereas peak finding on the first time derivative resulted in significantly improved accuracy. The model is trained on a reference set of reaction events representing the ideal result given unlimited computing resources. Our study includes two model systems: a heptanylium carbocation that undergoes hydride shifts and an unsaturated iron carbonyl cluster that features CO ligand migration and bridging behavior. The results indicate a high level of promise for this analysis approach to be used in mechanistic analysis of reactive AIMD simulations more generally.

eScholarship - University of California

Parallel Optimisation of Bootstrapping in R

Author: Forster T.
Ghazal P.
Sloan T. M.
Piotrowski M.
Publication date: 24/01/2014

Bootstrapping is a popular and computationally demanding resampling method used for measuring the accuracy of sample estimates and assisting with statistical inference. R is a freely available language and environment for statistical computing popular with biostatisticians for genomic data analyses. A survey of such R users highlighted its implementation of bootstrapping as a prime candidate for parallelization to overcome computational bottlenecks. The Simple Parallel R Interface (SPRINT) is a package that allows R users to exploit high performance computing in multi-core desktops and supercomputers without expert knowledge of such systems. This paper describes the parallelization of bootstrapping for inclusion in the SPRINT R package. Depending on the complexity of the bootstrap statistic and the number of resamples, this implementation achieves close to optimal speed-up on up to 16 nodes of a supercomputer and a speed-up close to 100 on 512 nodes. This performance in a multi-node setting compares favourably with an existing parallelization option in the native R implementation of bootstrapping.

arXiv.org e-Print Archive

Edinburgh Research Explorer

MO-ParamILS: A Multi-objective Automatic Algorithm Configuration Framework

Author: A Blot
A Liefooghe
B Adenso-Díaz
F Hutter
H Lourenço
M Birattari
M-E Marmion
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/05/2016

Automated algorithm configuration procedures play an increasingly important role in the development and application of algorithms for a wide range of computationally challenging problems. Until very recently, these configuration procedures were limited to optimising a single performance objective, such as the running time or solution quality achieved by the algorithm being configured. However, in many applications there is more than one performance objective of interest. This gives rise to the multi-objective automatic algorithm configuration problem, which involves finding a Pareto set of configurations of a given target algorithm that characterises trade-offs between multiple performance objectives. In this work, we introduce MO-ParamILS, a multi-objective extension of the state-of-the-art single-objective algorithm configuration framework ParamILS, and demonstrate that it produces good results on several challenging bi-objective algorithm configuration scenarios compared to a baseline obtained from using a state-of-the-art single-objective algorithm configurator.

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

SPRINT: more runners, fewer hurdles

Author: Robertson Kevin
Publication date: 01/10/2012

Edinburgh Research Explorer

Scalable Data Parallel Algorithms for Texture Synthesis and Compression using Gibbs Random Fields

Author: Bader David A.
Chellappa Rama
JaJa Joseph
Publication date: 15/10/1998

This paper introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov Random Field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. Use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented here enables machine independent scalable algorithms for a number of problems in image processing and analysis. (Also cross-referenced as UMIACS-TR-93-80.)

Digital Repository at the University of Maryland

Does morphology matter? Unravelling the evolutionary significance of morphological variation in Podarcis wall lizards

Author: Verónica Alexandra Seixas Gomes
Publication date: 11/12/2018

Repositório Aberto da Universidade do Porto