4 research outputs found

Computational Strategies for Scalable Genomics Analysis.

Authors: Shi Lizhen, Wang Zhong
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up/out current bioinformatics solutions to mine the big genomics data. In this review, we survey some of these exciting developments in the applications of parallel distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for computer scientists with interests in genomics applications.

eScholarship - University of California

A case study of tuning MapReduce for efficient Bioinformatics in the cloud

Authors: Dean, Decap, Goecks, Heger, Herodotou, Hess, Hong, Joshi, Lama, Langmead, Leo, Li, Liao, Lizhen Shi, Matsunaga, McKenna, Nguyen, Nordberg, Olston, Pireddu, Saha, Schatz, Schumacher, Shrinivas Joshi, Vavilapalli, Weikuan Yu, Xiandong Meng, Yu, Zaharia, Zhang, Zhong Wang
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this study, we aim to minimize the cloud computing cost of bioinformatics data analysis by identifying the significant Hadoop parameters and optimizing them. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our case study. Experimental results show that, with the fine-tuned parameters, we achieve a total speedup of 4× over the original performance (using the default settings). This paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.
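
For readers unfamiliar with the case study, k-mer counting can be sketched in the map/reduce style the abstract refers to. This is a minimal single-process illustration in plain Python, not the authors' Hadoop implementation: the function names (`map_kmers`, `reduce_counts`, `count_kmers`), the example reads, and the choice to skip windows containing `N` are all assumptions made for this sketch.

```python
from collections import Counter
from itertools import chain


def map_kmers(read, k):
    """Map step (hypothetical helper): emit (k-mer, 1) for each window of length k."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        # Assumption for this sketch: windows with an ambiguous base are skipped.
        if "N" not in kmer:
            yield kmer, 1


def reduce_counts(pairs):
    """Reduce step (hypothetical helper): sum the emitted counts per k-mer."""
    counts = Counter()
    for kmer, n in pairs:
        counts[kmer] += n
    return counts


def count_kmers(reads, k):
    """Run the map step over every read, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_kmers(r, k) for r in reads))


# Illustrative input only; real NGS reads are far longer and far more numerous.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)
```

In a real Hadoop job the map and reduce steps run on separate nodes with a shuffle between them, which is where the tuning parameters discussed in the abstract come into play; the sketch above only shows the data flow.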

Crossref

eScholarship - University of California

A case study of tuning MapReduce for efficient Bioinformatics in the cloud

Author: Shi Lizhen
Publication date: 30/05/2017

Ezid

A case study of tuning MapReduce for efficient Bioinformatics in the cloud

Authors: Dean, Decap, Goecks, Heger, Herodotou, Hess, Hong, Joshi, Lama, Langmead, Leo, Li, Liao, Lizhen Shi, Matsunaga, McKenna, Nguyen, Nordberg, Olston, Pireddu, Saha, Schatz, Schumacher, Shrinivas Joshi, Vavilapalli, Weikuan Yu, Xiandong Meng, Yu, Zaharia, Zhang, Zhong Wang
Publication venue: Elsevier BV

Crossref