8 research outputs found

Optimization of a parallel permutation testing function for the SPRINT R package

Author: Dobrzelecki Bartosz
Forster Thorsten
Mewissen Muriel
Petrou Savvas
Piotrowski Michal
Sloan Terence
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Optimization of a parallel permutation testing function for the SPRINT R package

Author: Dobrzelecki Bartosz
Forster Thorsten
Ghazal Peter
Hill Jon
Mewissen Muriel
Petrou Savvas
Piotrowski Michal
Sloan Terence
Trew Arthur
Publication venue: 'Wiley'
Publication date: 23/06/2011

The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities.

Crossref

Online Research @ Cardiff

PubMed Central

Edinburgh Research Explorer

OGSA-DAI 3.0 – The Whats and the Whys

Author: Antonioletti Mario
Atkinson Malcolm
Chue Hong Neil
Dobrzelecki Bartosz
Hume Alastair
Illingworth Malcolm
Jackson Michael
Karasavvas Kostas
Krause Amy
McDonnell Nicola
Parsons Mark
Schopf Jennifer M
Theocharopoulos Elias
Publication date: 01/01/2007

Edinburgh Research Explorer

Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules

Author: Bachmann Till T
Campbell Colin J
Ciani Ilenia
Crain Jason
Dickinson Paul
Dobrzelecki Bartosz
Ember Stuart W J
Ghazal Peter
Giraud Gerard
Grant Eilidh
Khondoker Mizanur
McDonnell Nicola
Mewissen Muriel
Mount Andrew R
Ross Alan J
Schulze Holger
Terry Jonathan G
Tlili Chaker
Walton Anthony J
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2010

Machine learning and statistical model based classifiers have increasingly been used with more complex and high-dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive, under-investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those of simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network.

Crossref

Online Research @ Cardiff

UCL Discovery

King's Research Portal

University of East Anglia digital repository

MULTI-FACTORIAL ANALYSIS OF CLASS PREDICTION ERROR: ESTIMATING OPTIMAL NUMBER OF BIOMARKERS FOR VARIOUS CLASSIFICATION RULES

Author: ALAN J. ROSS
ANDREW R. MOUNT
ANTHONY J. WALTON
BARTOSZ DOBRZELECKI
CHAKER TLILI
COLIN J. CAMPBELL
EILIDH GRANT
GERARD GIRAUD
HOLGER SCHULZE
ILENIA CIANI
JASON CRAIN
JONATHAN G. TERRY
MIZANUR R. KHONDOKER
MURIEL MEWISSEN
NICOLA McDONNELL
PAUL DICKINSON
PETER GHAZAL
STUART W. J. EMBER
TILL T. BACHMANN
Publication venue: 'World Scientific Pub Co Pte Lt'

Crossref