
    Cloud Computing for Next-Generation Sequencing Data Analysis

    High-throughput next-generation sequencing (NGS) technologies have evolved rapidly and are reshaping the scope of genomics research. The substantial decrease in the cost of NGS techniques over the past decade has led to their rapid adoption in biological research and drug development. Genomics studies of large populations are producing huge amounts of data, giving rise to computational issues around the storage, transfer, and analysis of those data. Fortunately, cloud computing has recently emerged as a viable option for quickly and easily acquiring the computational resources needed for large-scale NGS data analyses. Some cloud-based applications and resources have been developed specifically to address the computational challenges of working with the very large volumes of data generated by NGS technology. In this chapter, we review some cloud-based systems and solutions for NGS data analysis, discuss the practical hurdles and limitations of cloud computing, including data transfer and security, and share the lessons we learned from the implementation of Rainbow, a cloud-based tool for large-scale genome sequencing data analysis.

    Cloudflow – A Framework for MapReduce Pipeline Development in Biomedical Research

    The data-driven parallelization framework Hadoop MapReduce allows large data sets to be analysed in a scalable way. Because developing MapReduce programs can be a time-intensive and challenging task, the use of Hadoop in biomedical research is still limited. Here we present Cloudflow, a high-level framework that hides the implementation details of Hadoop and provides a set of building blocks for creating biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. We also show how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone. The framework is open source and freely available at https://github.com/genepi/cloudflow. Document type: Conference object