685 research outputs found

    Optimization of a parallel permutation testing function for the SPRINT R package

    Get PDF
    The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities

    Exploiting Parallel R in the Cloud with SPRINT

    Get PDF
    BACKGROUND: Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. OBJECTIVES: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world and, if resource underutilization can improve application performance. METHODS: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various size on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. RESULTS: It is possible to obtain good, scalable performance but the level of improvement is dependent upon the nature of algorithm. Resource underutilization can further improve the time to result. End-user’s location impacts on costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provides an interesting alternative and provides new possibilities for smaller organisations with limited funds

    Parallel Optimisation of Bootstrapping in R

    Get PDF
    Bootstrapping is a popular and computationally demanding resampling method used for measuring the accuracy of sample estimates and assisting with statistical inference. R is a freely available language and environment for statistical computing popular with biostatisticians for genomic data analyses. A survey of such R users highlighted its implementation of bootstrapping as a prime candidate for parallelization to overcome computational bottlenecks. The Simple Parallel R Interface (SPRINT) is a package that allows R users to exploit high performance computing in multi-core desktops and supercomputers without expert knowledge of such systems. This paper describes the parallelization of bootstrapping for inclusion in the SPRINT R package. Depending on the complexity of the bootstrap statistic and the number of resamples, this implementation has close to optimal speed up on up to 16 nodes of a supercomputer and close to 100 on 512 nodes. This performance in a multi-node setting compares favourably with an existing parallelization option in the native R implementation of bootstrapping

    MO-ParamILS: A Multi-objective Automatic Algorithm Configuration Framework

    Get PDF
    International audienceAutomated algorithm configuration procedures play an increasingly important role in the development and application of algorithms for a wide range of computationally challenging problems. Until very recently, these configuration procedures were limited to optimising a single performance objective, such as the running time or solution quality achieved by the algorithm being configured. However, in many applications there is more than one performance objective of interest. This gives rise to the multi-objective automatic algorithm configuration problem, which involves finding a Pareto set of configurations of a given target algorithm that characterises trade-offs between multiple performance objectives. In this work, we introduce MO-ParamILS, a multi-objective extension of the state-of-the-art single-objective algorithm configuration framework ParamILS, and demonstrate that it produces good results on several challenging bi-objective algorithm configuration scenarios compared to a base-line obtained from using a state-of-the-art single-objective algorithm configurator

    SPRINT: more runners, fewer hurdles

    Get PDF

    Scalable Data Parallel Algorithms for Texture Synthesis and Compression using Gibbs Random Fields

    Get PDF
    This paper introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov Random Field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. Use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented here enables machine independent scalable algorithms for a number of problems in image processing and analysis. (Also cross-referenced as UMIACS-TR-93-80.
    corecore