    QSRA – a quality-value guided de novo short read assembler

    Background: New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. Results: We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previous published algorithms, our assembler shows significant improvements not only in speed but also in output quality. Conclusion: QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.

    De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae

    We developed a novel approach for de novo genome assembly using only sequence data from high-throughput short read sequencing technologies. By combining data generated from 454 Life Sciences (Roche) and Illumina (formerly known as Solexa sequencing) sequencing platforms, we reliably assembled genomes into large scaffolds at a fraction of the traditional cost and without use of a reference sequence. We applied this method to two isolates of the phytopathogenic bacterium Pseudomonas syringae. Sequencing and reassembly of the well-studied tomato and Arabidopsis pathogen, PtoDC3000, facilitated development and testing of our method. Sequencing of a distantly related rice pathogen, Por1_6, demonstrated our method's efficacy for de novo assembly of novel genomes. Our assembly of Por1_6 yielded an N50 scaffold size of 531,821 bp with >75% of the predicted genome covered by scaffolds over 100,000 bp. One of the critical phenotypic differences between strains of P. syringae is the range of plant hosts they infect. This is largely determined by their complement of type III effector proteins. The genome of Por1_6 is the first sequenced for a P. syringae isolate that is a pathogen of monocots, and, as might be predicted, its complement of type III effectors differs substantially from the previously sequenced isolates of this species. The genome of Por1_6 helps to define an expansion of the P. syringae pan-genome, a corresponding contraction of the core genome, and a further diversification of the type III effector complement for this important plant pathogen species.

    Circular RNAs are abundant, conserved, and associated with ALU repeats

    Circular RNAs composed of exonic sequence have been described in a small number of genes. Thought to result from splicing errors, circular RNA species possess no known function. To delineate the universe of endogenous circular RNAs, we performed high-throughput sequencing (RNA-seq) of libraries prepared from ribosome-depleted RNA with or without digestion with the RNA exonuclease, RNase R. We identified >25,000 distinct RNA species in human fibroblasts that contained non-colinear exons (a “backsplice”) and were reproducibly enriched by exonuclease degradation of linear RNA. These RNAs were validated as circular RNA (ecircRNA), rather than linear RNA, and were more stable than associated linear mRNAs in vivo. In some cases, the abundance of circular molecules exceeded that of associated linear mRNA by >10-fold. By conservative estimate, we identified ecircRNAs from 14.4% of actively transcribed genes in human fibroblasts. Application of this method to murine testis RNA identified 69 ecircRNAs in precisely orthologous locations to human circular RNAs. Of note, paralogous kinases HIPK2 and HIPK3 produce abundant ecircRNA from their second exon in both humans and mice. Though HIPK3 circular RNAs contain an AUG translation start, it and other ecircRNAs were not bound to ribosomes. Circular RNAs could be degraded by siRNAs and, therefore, may act as competing endogenous RNAs. Bioinformatic analysis revealed shared features of circularized exons, including long bordering introns that contained complementary ALU repeats. These data show that ecircRNAs are abundant, stable, conserved and nonrandom products of RNA splicing that could be involved in control of gene expression

    Assembly complexity of prokaryotic genomes using short reads

    Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes. Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper-bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.
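As a rough illustration of the de Bruijn graph framework this abstract refers to, the sketch below builds such a graph from reads and lists the nodes where it branches, which is where a repeated sequence leaves more than one way to continue. It is not the paper's analysis; the function names `de_bruijn_graph` and `branching_nodes`, the toy read, and the value of `k` are assumptions chosen for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Every k-mer in a read contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor, sorted alphabetically."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical toy read in which "ACG" occurs twice with different continuations.
toy_graph = de_bruijn_graph(["ACGTACGA"], k=3)
print(branching_nodes(toy_graph))
```

In the toy read, the node `CG` is followed by both `GT` and `GA`, so a walk through the graph cannot tell which continuation comes first without longer reads or a larger `k`.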

    Bartter- and Gitelman-like syndromes: salt-losing tubulopathies with loop or DCT defects

    Salt-losing tubulopathies with secondary hyperaldosteronism (SLT) comprise a set of well-defined inherited tubular disorders. Two segments along the distal nephron are primarily involved in the pathogenesis of SLTs: the thick ascending limb of Henle’s loop, and the distal convoluted tubule (DCT). The functions of these pre- and postmacula densa segments are quite distinct, and this has a major impact on the clinical presentation of loop and DCT disorders – the Bartter- and Gitelman-like syndromes. Defects in the water-impermeable thick ascending limb, with its greater salt reabsorption capacity, lead to major salt and water losses similar to the effect of loop diuretics. In contrast, defects in the DCT, with its minor capacity of salt reabsorption and its crucial role in fine-tuning of urinary calcium and magnesium excretion, provoke more chronic solute imbalances similar to the effects of chronic treatment with thiazides. The most severe disorder is a combination of a loop and DCT disorder similar to the enhanced diuretic effect of a co-medication of loop diuretics with thiazides. Besides salt and water supplementation, prostaglandin E2-synthase inhibition is the most effective therapeutic option in polyuric loop disorders (e.g., pure furosemide and mixed furosemide–amiloride type), especially in preterm infants with severe volume depletion. In DCT disorders (e.g., pure thiazide and mixed thiazide–furosemide type), renin–angiotensin–aldosterone system (RAAS) blockers might be indicated after salt, potassium, and magnesium supplementation are deemed insufficient. It appears that in most patients with SLT, a combination of solute supplementation with some drug treatment (e.g., indomethacin) is needed for a lifetime

    BET Protein Inhibition Regulates Macrophage Chromatin Accessibility and Microbiota-Dependent Colitis

    Introduction: In colitis, macrophage functionality is altered compared to normal homeostatic conditions. Loss of IL-10 signaling results in an inappropriate chronic inflammatory response to bacterial stimulation. It remains unknown if inhibition of bromodomain and extra-terminal domain (BET) proteins alters usage of DNA regulatory elements responsible for driving inflammatory gene expression. We determined if the BET inhibitor, (+)-JQ1, could suppress inflammatory activation of macrophages in Il10-/- mice. Methods: We performed ATAC-seq and RNA-seq on Il10-/- bone marrow-derived macrophages (BMDMs) cultured in the presence and absence of lipopolysaccharide (LPS) with and without treatment with (+)-JQ1 and evaluated changes in chromatin accessibility and gene expression. Germ-free Il10-/- mice were treated with (+)-JQ1, colonized with fecal slurries and underwent histological and molecular evaluation 14 days post colonization. Results: Treatment with (+)-JQ1 suppressed LPS-induced changes in chromatin at distal regulatory elements associated with inflammatory genes, particularly in regions that contain motifs for AP-1 and IRF transcription factors. This resulted in attenuation of inflammatory gene expression. Treatment with (+)-JQ1 in vivo resulted in a mild reduction in colitis severity as compared with vehicle-treated mice. Conclusion: We identified the mechanism of action associated with a new class of compounds that may mitigate aberrant macrophage responses to bacteria in colitis.

    Comparing De Novo Genome Assembly: The Long and Short of It

    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers – both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies – are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium
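For readers unfamiliar with the size-based metrics named in these abstracts, here is a minimal sketch of how N50 (and, with a different fraction, N80) is commonly computed from a list of contig lengths. The function name `n50`, its `fraction` parameter, and the sample lengths are assumptions for illustration and are not taken from any of the listed papers or from the AMOS tool.

```python
def n50(lengths, fraction=0.5):
    """Return the contig length L such that contigs of length >= L
    together cover at least `fraction` of the total assembly length.

    fraction=0.5 gives N50; fraction=0.8 gives N80.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths, in base pairs.
contigs = [2, 3, 4, 5, 6]
print(n50(contigs))                # N50
print(n50(contigs, fraction=0.8))  # N80
```

Because the statistic depends only on lengths, a misassembled contig can raise N50 without improving accuracy, which is the limitation of size-only metrics that the abstract above points out.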

    Targeted next generation sequencing identifies clinically actionable mutations in patients with melanoma

    Somatic sequencing of cancers has produced new insight into tumorigenesis, tumor heterogeneity, and disease progression, but the vast majority of genetic events identified are of indeterminate clinical significance. Here we describe a NextGen sequencing approach to fully analyze 248 genes, including all those of known clinical significance in melanoma. This strategy features solution capture of DNA followed by multiplexed, high-throughput sequencing, and was evaluated in 31 melanoma cell lines and 18 tumor tissues from patients with metastatic melanoma. Mutations in melanoma cell lines correlated with their sensitivity to corresponding small molecule inhibitors, confirming, for example, lapatinib sensitivity in ERBB4 mutant lines and identifying a novel activating mutation of BRAF. The latter event would not have been identified by clinical sequencing and was associated with responsiveness to a BRAF kinase inhibitor. This approach identified focal copy number changes of PTEN not found by standard methods, such as comparative genomic hybridization (CGH). Actionable mutations were found in 89% of the tumor tissues analyzed, 56% of which would not be identified by standard-of-care approaches. This work shows that targeted sequencing is an attractive approach for clinical use in melanoma