2,772 research outputs found

    Streamlined analysis of duplex sequencing data with Du Novo

    Challenges of Identifying Clinically Actionable Genetic Variants for Precision Medicine

    Mitigating Anticipated Effects of Systematic Errors Supports Sister-Group Relationship between Xenacoelomorpha and Ambulacraria

    Xenoturbella and the acoelomorph worms (Xenacoelomorpha) are simple marine animals with controversial affinities. They have been placed as the sister group of all other bilaterian animals (Nephrozoa hypothesis), implying their simplicity is an ancient characteristic [1, 2]; alternatively, they have been linked to the complex Ambulacraria (echinoderms and hemichordates) in a clade called the Xenambulacraria [3, 5], suggesting their simplicity evolved by reduction from a complex ancestor. The difficulty of resolving this problem implies that the phylogenetic signal supporting the correct solution is weak and affected by inadequate modeling, creating a misleading non-phylogenetic signal. The idea that the Nephrozoa hypothesis might be an artifact is prompted by the faster molecular evolutionary rate observed within the Acoelomorpha. Unequal rates of evolution are known to result in the systematic artifact of long-branch attraction, which would be predicted to result in an attraction between the long-branch acoelomorphs and the outgroup, pulling them toward the root [6]. Other biases inadequately accommodated by the models used can also have strong effects, exacerbated in the context of short internal branches and long terminal branches [7]. We have assembled a large and informative dataset to address this problem. Analyses designed to reduce or to emphasize misleading signals show that the Nephrozoa hypothesis is supported under conditions expected to exacerbate errors, whereas the Xenambulacraria hypothesis is preferred under conditions designed to reduce errors. Our reanalyses of two other recently published datasets [1, 2] produce the same result. We conclude that the Xenacoelomorpha are simplified relatives of the Ambulacraria.
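
    The contrast between error-reducing and error-prone conditions can be made concrete with a toy site-filtering step. The sketch below is a hypothetical illustration, not the paper's protocol: it uses per-column Shannon entropy as a crude proxy for evolutionary rate and drops the fastest-evolving alignment columns before tree inference, one common way to reduce signal prone to long-branch attraction. The alignment format, the cutoff, and the taxon labels are assumptions made for illustration.

    ```python
    import math

    def column_entropy(column):
        """Shannon entropy of one alignment column, ignoring gaps/ambiguities."""
        states = [c for c in column if c not in "-?X"]
        if not states:
            return 0.0
        freqs = [states.count(s) / len(states) for s in set(states)]
        return -sum(p * math.log2(p) for p in freqs)

    def drop_fastest_sites(alignment, fraction=0.25):
        """Remove the `fraction` of columns with the highest entropy
        (a rough stand-in for the fastest-evolving sites)."""
        columns = list(zip(*alignment.values()))
        ranked = sorted(range(len(columns)), key=lambda i: column_entropy(columns[i]))
        keep = sorted(ranked[: int(len(columns) * (1 - fraction))])
        return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}

    # Toy alignment; a real analysis would read a concatenated protein matrix instead.
    toy_alignment = {
        "Xenoturbella": "MKLVV",
        "Acoelomorph":  "MKIVA",
        "Echinoderm":   "MKLVV",
        "Hemichordate": "MKLVV",
    }
    print(drop_fastest_sites(toy_alignment, fraction=0.2))
    ```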

    Method development for screening archaeological samples for ancient pathogens

    Recent advances in sequencing technology have made it possible to retrieve DNA from archaeological samples that are hundreds or thousands of years old. Working with DNA retrieved from archaeological samples poses its own unique challenges. Firstly, DNA has a half-life and decays after death, which means that only a very small part of a sample's DNA content is endogenous DNA. Furthermore, the small amount of endogenous DNA that is left tends to be very short and fragmented. In this thesis, we address the main challenges faced by paleomicrobiology. With HOPS, we created a novel screening tool tailored to the characteristics of ancient DNA. HOPS produced reliable results with both in silico and real data for as few as 50 reads per species, which is an adequate representation of most screening samples. While HOPS is designed to remain sensitive when working with low-endogenous, degraded DNA, it depends on the quality of the metagenomic reference database that is the foundation of the analysis. A novel tool, DatabaseZen, downloads and generates metagenomic databases from NCBI RefSeq, applying several filters to ensure database quality by removing contaminated sequences and to avoid database bias by ensuring a balanced database composition. The low endogenous DNA content in archaeological samples has negative consequences for paleomicrobiology, as most of the sequencing effort goes towards environmental contaminants instead of the remnants of the microbiome preserved in a sample. To that end, we use a novel 16S/18S rRNA in-solution capture as an alternative to whole-genome shotgun sequencing. The 16S capture was highly efficient in enriching for 16S/18S rRNA in all analyzed samples. The capture has the advantage of removing the length bias present in 16S amplicon sequencing. Overall, we provided a screening tool for paleomicrobiology, an application to generate and clean reference databases, and an alternative sequencing technology.
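
    One of the ancient-DNA characteristics such screening relies on is the elevated C-to-T substitution rate near the 5' ends of reads caused by post-mortem deamination. The sketch below is a minimal, hypothetical illustration of that idea rather than the HOPS implementation: it tallies C-to-T mismatches by distance from the 5' end over gap-free (read, reference) pairs; in a real pipeline these pairs would come from a BAM alignment.

    ```python
    from collections import Counter

    def ct_mismatch_profile(alignments, window=10):
        """For each position within `window` of the 5' end, return the fraction
        of reference C bases that are read as T (a deamination signature)."""
        c_total = Counter()   # reference C observed at this offset
        ct_hits = Counter()   # reference C read as T at this offset
        for read, ref in alignments:          # gap-free, same-length pairs assumed
            for i in range(min(window, len(read))):
                if ref[i] == "C":
                    c_total[i] += 1
                    if read[i] == "T":
                        ct_hits[i] += 1
        return {i: ct_hits[i] / c_total[i] for i in c_total if c_total[i]}

    # Toy example: an elevated C->T rate at position 0 would be consistent with
    # post-mortem deamination at read ends.
    toy = [("TACGT", "CACGT"), ("TTGGA", "CTGGA"), ("CACGT", "CACGT")]
    print(ct_mismatch_profile(toy))
    ```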

    Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome

    Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr, specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between approximately 5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
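
    For reference, the N50 metric used in that comparison can be computed directly from contig lengths: it is the length at which contigs of that size or longer cover at least half of the total assembly. The sketch below is a generic illustration assuming a FASTA file of contigs; the file name is a placeholder and this is not code from the study.

    ```python
    def read_contig_lengths(fasta_path):
        """Return the length of every contig in a FASTA file."""
        lengths, current = [], 0
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    if current:
                        lengths.append(current)
                    current = 0
                else:
                    current += len(line.strip())
        if current:
            lengths.append(current)
        return lengths

    def n50(lengths):
        """Smallest contig length such that contigs at least that long
        cover >= 50% of the total assembly size."""
        total = sum(lengths)
        running = 0
        for length in sorted(lengths, reverse=True):
            running += length
            if running * 2 >= total:
                return length
        return 0

    if __name__ == "__main__":
        contig_lengths = read_contig_lengths("assembly.fasta")  # hypothetical path
        print("N50:", n50(contig_lengths))
    ```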

    Cyberinfrastructure resources enabling creation of the loblolly pine reference transcriptome

    This paper was presented at the XSEDE 15 conference. Today's genomics technologies generate more sequence data than ever before, and at substantially lower costs, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has held a central role in biological discovery in both model and non-model organisms, with and without whole-genome sequence references. A major limitation in effective building of transcriptome references is no longer the sequencing data generation itself, but the computing infrastructure and expertise needed to assemble, analyze, and manage the data. Here we describe a currently available resource dedicated to achieving such goals, and its use for extensive RNA assembly of up to 1.3 billion reads representing the massive transcriptome of loblolly pine, using four major assembly software installations. The Mason cluster, an XSEDE second-tier resource at Indiana University, provides the fast CPU cycles, large memory, and high I/O throughput necessary for conducting large-scale genomics research. The National Center for Genome Analysis Support, or NCGAS, provides technical support in using HPC systems, bioinformatic support for determining the appropriate method to analyze a given dataset, and practical assistance in running computations. We demonstrate that a sufficient supercomputing resource and good workflow design are essential to large eukaryotic genomics and transcriptomics projects such as the complex transcriptome of loblolly pine, whose gene expression data inform annotation and functional interpretation of the largest genome sequence reference to date. This work was supported in part by USDA NIFA grant 2011-67009-30030, PineRefSeq, led by the University of California, Davis, and by NCGAS, funded by NSF under award No. 1062432.