3 research outputs found

    Searching for SNPs with cloud computing

    Get PDF
    Novel software utilizing cloud computing technology to cost-effectively align and map SNPs from a human genome in three

    Parallel methods for short read assembly

    Get PDF
    This work is on the parallel de novo assembly of genomic sequences from short sequence reads. With short reads eliminating the reliability of read overlaps in predicting genomic co-location, a revival of graph-based methods has underpinned the development of short-read assemblers. While these methods predate short read technology, their reach has not extended significantly beyond bacterial genomes due to the memory resources required in their use. These memory limitations are exacerbated by the high coverage needed to compensate for shorter read lengths. As a result, prior to our work, short-read de novo assembly had been demonstrated on relatively small genome sizes with a few million bases. In our work, we advance the field of short sequence assembly in a number of ways. First, we extend models and ideas proposed and tested with small genomes on serial machines to large-scale distributed memory parallel machines. Second, we present ideas for assembly that are especially suited to the reconstruction of very large genomes on these machines. Additionally, we present the first assembler that specifically takes advantage a variable number of fragment sizes or insert lengths concurrently when making assembly decisions, while still working well for data with one insertion length

    High Performance Computing for DNA Sequence Alignment and Assembly

    Get PDF
    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyzes, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation to large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, and have the potential to make otherwise infeasible computations practical
    corecore