52 research outputs found

Exploiting Parallel R in the Cloud with SPRINT

Author: A. D. Lloyd
Altschul
Bolshakova
G. A. McGilvary
J. Hill
L. Mitchell
M. Mewissen
M. Piotrowski
P. Ghazal
Quackenbush
Schad
Schmidberger
T. Forster
T. M. Sloan
Walker
Xu
Yu
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2013

BACKGROUND: Advances in DNA microarray devices and next-generation massively parallel DNA sequencing platforms have led to exponential growth in data availability, but exploiting the resulting opportunities requires adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. OBJECTIVES: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. METHODS: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. RESULTS: It is possible to obtain good, scalable performance, but the level of improvement depends on the nature of the algorithm. Resource underutilization can further improve the time to result. The end user’s location affects costs due to factors such as local taxation. CONCLUSIONS: Although not designed to satisfy HPC requirements, Amazon EC2, and cloud computing in general, provides an interesting alternative and opens new possibilities for smaller organisations with limited funds.

Crossref

Online Research @ Cardiff

PubMed Central

Edinburgh Research Explorer

SPRINT: A new parallel framework for R

Author: A Brazma
Arthur Trew
DDL Bowtell
Florian Scharinger
G Vera
GA Geist
H Schwender
H Xiong
J Quackenbush
Jon Hill
L Dagum
M Dunning
M Åstrand
Matthew Hambley
Message Passing Interface Forum
MJ Heller
Muriel Mewissen
Peter Ghazal
S Calza
Terence M Sloan
Thorsten Forster
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

PubMed Central

Edinburgh Research Explorer

Spiral - Imperial College Digital Repository

PEDRo: A database for storing, searching and disseminating experimental proteomics data

Author: Brass A
Brown AJP
Carroll K
Chater K
Evans C
Garwood C
Garwood K
Gaskell SJ
Ghazal P
Hansson L
Hart S
Hesketh A
Howard J
Hubbard SJ
Joens S
Lilley KS
McLaughlin T
Mewissen M
Morrison N
Oliver SG
Paton NW
Stead D
Taylor CF
Whetton AD
Yin Z
Publication venue: BMC GENOMICS
Publication date: 01/01/2004

BACKGROUND: Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets. RESULTS: This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available. CONCLUSIONS: The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.

Aberdeen University Research

Keele Research Repository

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

Apollo (Cambridge)

White Rose Research Online