2,772 research outputs found
Recommended from our members
Ultraaccurate genome sequencing and haplotyping of single human cells.
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.
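The error-reduction idea described above can be illustrated with a minimal sketch: because the Watson and Crick strands are amplified and sequenced independently, an amplification error on one strand is unlikely to be mirrored on its complement, so bases are trusted only where the two strand reads agree. This is an illustrative stand-in, not the actual SISSOR pipeline; the function name and 'N' masking convention are assumptions.

```python
def strand_consensus(watson_read, crick_read_rc):
    """Call a consensus base only where the independently sequenced
    Watson strand and the reverse-complemented Crick strand agree.

    Disagreements (likely polymerase amplification errors on one
    strand) are masked as 'N' rather than emitted as false-positive
    variant calls. Simplified sketch of strand-redundant error
    removal; not the published SISSOR implementation.
    """
    return "".join(
        w if w == c else "N"
        for w, c in zip(watson_read, crick_read_rc)
    )
```

For example, `strand_consensus("ACGTA", "ACGCA")` masks the single disagreeing position, returning `"ACGNA"`.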
Software for Computing and Annotating Genomic Ranges
We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments, and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest-neighbor detection, coverage calculation, and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis, and visualization.
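The core range operation the abstract mentions, overlap detection, can be sketched in a few lines. This is a naive Python analogue of the behavior of Bioconductor's overlap queries, written for clarity; the real packages use interval-tree indexes to make such queries scale to millions of ranges.

```python
def overlaps(a, b):
    """True if two half-open genomic ranges (start, end), assumed to
    lie on the same sequence, share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def find_overlaps(query, subjects):
    """Return indices of subject ranges overlapping the query range.

    Naive O(n) scan illustrating the semantics of an overlap query;
    production range libraries index the subjects (e.g. with interval
    trees) so repeated queries are sublinear.
    """
    return [i for i, s in enumerate(subjects) if overlaps(query, s)]
```

For instance, `find_overlaps((5, 10), [(0, 5), (8, 12), (10, 20)])` returns `[1]`: with half-open coordinates, ranges that merely touch at an endpoint do not overlap.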
Mitigating Anticipated Effects of Systematic Errors Supports Sister-Group Relationship between Xenacoelomorpha and Ambulacraria
Xenoturbella and the acoelomorph worms (Xenacoelomorpha) are simple marine animals with controversial affinities. They have been placed as the sister group of all other bilaterian animals (Nephrozoa hypothesis), implying their simplicity is an ancient characteristic [1, 2]; alternatively, they have been linked to the complex Ambulacraria (echinoderms and hemichordates) in a clade called the Xenambulacraria [3, 5], suggesting their simplicity evolved by reduction from a complex ancestor. The difficulty resolving this problem implies the phylogenetic signal supporting the correct solution is weak and affected by inadequate modeling, creating a misleading non-phylogenetic signal. The idea that the Nephrozoa hypothesis might be an artifact is prompted by the faster molecular evolutionary rate observed within the Acoelomorpha. Unequal rates of evolution are known to result in the systematic artifact of long-branch attraction, which would be predicted to result in an attraction between long-branch acoelomorphs and the outgroup, pulling them toward the root [6]. Other biases inadequately accommodated by the models used can also have strong effects, exacerbated in the context of short internal branches and long terminal branches [7]. We have assembled a large and informative dataset to address this problem. Analyses designed to reduce or to emphasize misleading signals show the Nephrozoa hypothesis is supported under conditions expected to exacerbate errors, and the Xenambulacraria hypothesis is preferred in conditions designed to reduce errors. Our reanalyses of two other recently published datasets [1, 2] produce the same result. We conclude that the Xenacoelomorpha are simplified relatives of the Ambulacraria.
Method development for screening archaeological samples for ancient pathogens
Recent advances in sequencing technology have made it possible to retrieve DNA from archaeological samples that are hundreds or thousands of years old. Working with DNA retrieved from archaeological samples poses its own unique challenges. First, DNA has a half-life and decays after death, which means only a very small part of a sample's DNA content is endogenous DNA. Furthermore, the small amount of endogenous DNA that remains tends to be very short and fragmented. In this thesis, we address the main challenges faced by paleomicrobiology. With HOPS, we created a novel screening tool tailored to the characteristics of ancient DNA. HOPS produced reliable results with both in silico and real data for as few as 50 reads per species, which is an adequate representation of most screening samples. While HOPS is designed to remain sensitive when working with low-endogenous, degraded DNA, it depends on the quality of the metagenomic reference database that is the foundation of the analysis. A novel tool, DatabaseZen, downloads and generates metagenomic databases from NCBI RefSeq, applying several filters to ensure database quality by removing contaminated sequences and to avoid database bias by ensuring a balanced database composition. The low endogenous DNA content in archaeological samples has negative consequences for paleomicrobiology, as most of the sequencing effort goes towards environmental contaminants instead of the remnants of the microbiome preserved in a sample. To that end, we use a novel 16S/18S rRNA in-solution capture as an alternative to whole-genome shotgun sequencing. The 16S capture was highly efficient in enriching for 16S/18S rRNA in all analyzed samples. The capture has the advantage of removing the length bias present in 16S amplicon sequencing. Overall, we provide a screening tool for paleomicrobiology, an application to generate and clean reference databases, and an alternative sequencing technology.
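The half-life argument in the abstract can be made concrete with simple exponential decay: after one half-life, half the DNA remains; after two, a quarter; and so on. The function and the numbers in the usage note are purely illustrative; real per-bond decay rates vary widely with burial temperature and chemistry.

```python
def surviving_fraction(age_years, half_life_years):
    """Fraction of DNA expected to remain after age_years under
    simple exponential decay with the given half-life.

    Illustrative model only: actual ancient-DNA survival depends
    strongly on preservation conditions, and what survives is also
    heavily fragmented, not merely reduced in quantity.
    """
    return 0.5 ** (age_years / half_life_years)
```

Under an assumed 500-year half-life, a 1,000-year-old sample would retain 25% of its original DNA, and a 5,000-year-old sample under 0.1%, which is why endogenous molecules are so badly outnumbered by environmental contaminants.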
Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome
Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr, specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kb) at such high error rates (between approximately 5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kb versus 59.9 kb), and the assembly has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
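The N50 metric used to compare the two assemblies above is easy to compute: it is the contig length L such that contigs of length >= L together cover at least half of the total assembly. A short sketch:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs
    of length >= L account for at least half of the summed assembly
    length. Larger N50 means a more contiguous assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

For example, `n50([100, 50, 30, 20])` is `100`, since the single longest contig already covers half of the 200-base total. Note that N50 rewards contiguity but says nothing about correctness, which is why the abstract reports consensus identity alongside it.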
Cyberinfrastructure resources enabling creation of the loblolly pine reference transcriptome
This paper was presented at the XSEDE 15 conference. Today's genomics technologies generate more sequence data than ever before possible, and at substantially lower costs, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has held a central role in biological discovery in both model and non-model organisms, with and without whole genome sequence references. A major limitation in effective building of transcriptome references is no longer the sequencing data generation itself, but the computing infrastructure and expertise needed to assemble, analyze, and manage the data. Here we describe a currently available resource dedicated to achieving such goals, and its use for extensive RNA assembly of up to 1.3 billion reads representing the massive transcriptome of loblolly pine, using four major assembly software installations. The Mason cluster, an XSEDE second-tier resource at Indiana University, provides the necessary fast CPU cycles, large memory, and high I/O throughput for conducting large-scale genomics research. The National Center for Genome Analysis Support, or NCGAS, provides technical support in using HPC systems, bioinformatic support for determining the appropriate method to analyze a given dataset, and practical assistance in running computations. We demonstrate that a sufficient supercomputing resource and good workflow design are essential to large eukaryotic genomics and transcriptomics projects such as the complex transcriptome of loblolly pine, whose gene expression data inform annotation and functional interpretation of the largest genome sequence reference to date. This work was supported in part by USDA NIFA grant 2011-67009-30030, PineRefSeq, led by the University of California, Davis, and NCGAS, funded by NSF under award No. 1062432.