
    QSRA – a quality-value guided de novo short read assembler

    Background: New rapid high-throughput sequencing technologies have sparked the creation of a new class of assemblers. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for these errors while utilizing all available data. Results: We have designed and implemented an assembler, the Quality-value guided Short Read Assembler (QSRA), which takes advantage of quality-value scores as an additional means of dealing with error. Compared with previously published algorithms, our assembler shows significant improvements in both speed and output quality. Conclusion: QSRA generally produced the highest genomic coverage while being faster than VCAKE. QSRA is highly competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA brings the goal of de novo assembly of complex genomes a step closer, improving on the original VCAKE algorithm by drastically reducing runtimes and by making the algorithm more viable through additional error-handling capabilities.
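
    N50 and N80, the contig-length statistics used above to compare assemblers, depend only on the contig lengths. The sketch below assumes the common definition relative to the total assembly length (the NG variants divide by an estimated genome size instead); the function name `nx` is hypothetical and the code is not taken from QSRA.

```python
from typing import Sequence


def nx(contig_lengths: Sequence[int], x: float) -> int:
    """Return the Nx statistic: the length L such that contigs of length >= L
    together contain at least x% of all assembled bases."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    if total == 0:
        raise ValueError("no assembled bases")
    threshold = total * x / 100
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    return lengths[-1]  # not reached for valid x; kept as a safe fallback


contigs = [100, 80, 60, 40, 20]  # toy contig lengths, 300 bp in total
print(nx(contigs, 50))  # 80
print(nx(contigs, 80))  # 60
```

    Sorting dominates the cost, so the calculation is O(n log n) for n contigs.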

    De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae

    We developed a novel approach for de novo genome assembly using only sequence data from high-throughput short-read sequencing technologies. By combining data generated on the 454 Life Sciences (Roche) and Illumina (formerly Solexa) sequencing platforms, we reliably assembled genomes into large scaffolds at a fraction of the traditional cost and without the use of a reference sequence. We applied this method to two isolates of the phytopathogenic bacterium Pseudomonas syringae. Sequencing and reassembly of the well-studied tomato and Arabidopsis pathogen PtoDC3000 facilitated development and testing of our method. Sequencing of a distantly related rice pathogen, Por1_6, demonstrated our method's efficacy for de novo assembly of novel genomes. Our assembly of Por1_6 yielded an N50 scaffold size of 531,821 bp, with >75% of the predicted genome covered by scaffolds over 100,000 bp. One of the critical phenotypic differences between strains of P. syringae is the range of plant hosts they infect, which is largely determined by their complement of type III effector proteins. The genome of Por1_6 is the first sequenced for a P. syringae isolate that is a pathogen of monocots and, as might be predicted, its complement of type III effectors differs substantially from those of previously sequenced isolates of this species. The genome of Por1_6 helps to define an expansion of the P. syringae pan-genome, a corresponding contraction of the core genome, and a further diversification of the type III effector complement for this important plant pathogen species.
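
    The second scaffold-level figure quoted above, the share of the predicted genome held in scaffolds over 100,000 bp, can be computed as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, "over" is read as strictly greater than the threshold, and scaffold lengths are taken as given (including any gap characters). The abstract does not describe the authors' exact procedure.

```python
from typing import Iterable


def fraction_in_large_scaffolds(scaffold_lengths: Iterable[int],
                                predicted_genome_size: int,
                                min_length: int = 100_000) -> float:
    """Fraction of the predicted genome size contained in scaffolds longer
    than min_length bp.

    Lengths are used as given, so the result can exceed 1.0 if scaffolds are
    redundant or the genome-size estimate is too low.
    """
    if predicted_genome_size <= 0:
        raise ValueError("predicted_genome_size must be positive")
    large = sum(length for length in scaffold_lengths if length > min_length)
    return large / predicted_genome_size


# Toy numbers, not data from the study: 450 of 600 bp lie in scaffolds over 100 bp.
print(fraction_in_large_scaffolds([300, 150, 90, 40], 600, min_length=100))  # 0.75
```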

    Circular RNAs are abundant, conserved, and associated with ALU repeats

    Circular RNAs composed of exonic sequence have been described in a small number of genes. Thought to result from splicing errors, circular RNA species possess no known function. To delineate the universe of endogenous circular RNAs, we performed high-throughput sequencing (RNA-seq) of libraries prepared from ribosome-depleted RNA with or without digestion with the RNA exonuclease RNase R. We identified >25,000 distinct RNA species in human fibroblasts that contained non-colinear exons (a "backsplice") and were reproducibly enriched by exonuclease degradation of linear RNA. These RNAs were validated as circular RNA (ecircRNA), rather than linear RNA, and were more stable than associated linear mRNAs in vivo. In some cases, the abundance of circular molecules exceeded that of associated linear mRNA by >10-fold. By conservative estimate, we identified ecircRNAs from 14.4% of actively transcribed genes in human fibroblasts. Application of this method to murine testis RNA identified 69 ecircRNAs in precisely orthologous locations to human circular RNAs. Of note, the paralogous kinases HIPK2 and HIPK3 produce abundant ecircRNA from their second exon in both humans and mice. Although HIPK3 circular RNAs contain an AUG translation start, these and other ecircRNAs were not bound to ribosomes. Circular RNAs could be degraded by siRNAs and, therefore, may act as competing endogenous RNAs. Bioinformatic analysis revealed shared features of circularized exons, including long bordering introns that contained complementary ALU repeats. These data show that ecircRNAs are abundant, stable, conserved and nonrandom products of RNA splicing that could be involved in control of gene expression.
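
    The two criteria described above, a non-colinear "backsplice" junction and enrichment after RNase R digestion, can be made concrete. The sketch below flags a split-read alignment as a backsplice when its second piece maps upstream of its first on the transcript strand, and computes a simple library-size-normalised enrichment ratio. It is a hypothetical illustration, not the authors' pipeline: the `Segment` type, the function names, the coordinate convention and the pseudocount are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (1-based, inclusive genomic coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # transcript strand: "+" or "-"


def is_backsplice(first: Segment, second: Segment) -> bool:
    """True if `second` (later in the read) maps upstream of `first` on the
    transcript strand, i.e. the two pieces are joined non-colinearly."""
    if first.strand not in ("+", "-") or second.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if first.chrom != second.chrom or first.strand != second.strand:
        return False  # other chimeras (fusions, trans-splicing) are out of scope here
    if first.strand == "+":
        # Transcript runs toward higher coordinates; upstream means lower.
        return second.start < first.start
    # Transcript runs toward lower coordinates; upstream means higher.
    return second.end > first.end


def rnase_r_enrichment(treated_reads: int, treated_library: int,
                       mock_reads: int, mock_library: int,
                       pseudocount: float = 1.0) -> float:
    """Normalised junction count in RNase R-treated vs mock-treated RNA.

    RNase R digests linear RNA, so junctions from circles should give ratios above 1.
    """
    treated = (treated_reads + pseudocount) / treated_library
    mock = (mock_reads + pseudocount) / mock_library
    return treated / mock


# Hypothetical + strand read: its first piece covers an exon's 3' end and its
# second piece that same exon's 5' start, as expected for a single-exon circle.
print(is_backsplice(Segment("chr1", 1_150, 1_200, "+"), Segment("chr1", 100, 160, "+")))  # True
print(rnase_r_enrichment(39, 1_000_000, 9, 1_000_000))  # about 4
```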

    Analysis of quality raw data of second generation sequencers with Quality Assessment Software

    Background: Second-generation technologies have advantages over Sanger sequencing; however, they have created new challenges for the genome construction process, chiefly because of the short length of the reads, despite the high degree of coverage. Whichever program is chosen for the construction process, DNA sequences are superimposed on the basis of identity to extend the reads and generate contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. Findings: We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values in the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and on the coverage estimated after applying the quality filter, while still providing acceptable sequence coverage for genome construction from short reads. Conclusions: Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments caused by measurement errors, which, given the short read length, can arise during the construction process and provoke misassemblies. Applying quality filters to sequence data with the Quality Assessment software, together with graphing analyses, provided greater precision in defining cutoff parameters, which increased the accuracy of genome construction.
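
    The filtering step described above rests on Phred quality values. The sketch below decodes FASTQ quality strings, drops reads whose mean Phred score falls below a cutoff, and re-estimates coverage from the bases that remain, so the trade-off between stringency and coverage can be checked. It assumes Phred+33 encoding and a mean-quality criterion. Both are illustrative choices, not necessarily those made by the Quality Assessment software, and all names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str]  # (sequence, FASTQ quality string)
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding; older Illumina output used 64


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def filter_reads(reads: Iterable[Read], min_mean_q: float) -> Iterator[Read]:
    """Yield reads whose mean Phred score is at least min_mean_q."""
    for seq, qual in reads:
        scores = phred_scores(qual)
        if scores and sum(scores) / len(scores) >= min_mean_q:
            yield seq, qual


def estimated_coverage(reads: Iterable[Read], genome_size: int) -> float:
    """Average depth: retained bases divided by the (estimated) genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(len(seq) for seq, _ in reads) / genome_size


reads = [("ACGTACGT", "IIIIIIII"), ("ACGTACGT", "########")]  # a Q40 read and a Q2 read
kept = list(filter_reads(reads, min_mean_q=20))
print(len(kept), estimated_coverage(kept, genome_size=16))  # 1 0.5
```

    A mean-quality cutoff is the simplest option; trimming low-quality read ends or requiring a minimum per-base score are common alternatives that trade retained coverage against accuracy differently.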

    Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of ever larger numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origins, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.
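
    One way to see how coverage, read length and error rate interact is through the k-mer coverage seen by a de Bruijn graph assembler such as Velvet. The sketch below uses the relation Ck = C * (L - k + 1) / L given in the Velvet manual and, as an added simplifying assumption, independent per-base errors at rate e, so that only a fraction (1 - e)^k of k-mers is error-free. It is an illustration, not the evaluation method of this study, and the function name is hypothetical.

```python
def expected_kmer_coverage(base_coverage: float, read_length: int, k: int,
                           error_rate: float = 0.0) -> float:
    """Expected depth of error-free k-mers at a genome position.

    base_coverage * (read_length - k + 1) / read_length counts the k-mers that
    fit inside reads; (1 - error_rate) ** k is the chance that a k-mer contains
    no sequencing error, assuming errors are independent across bases.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and read_length")
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    kmer_coverage = base_coverage * (read_length - k + 1) / read_length
    return kmer_coverage * (1.0 - error_rate) ** k


# Same 30x base coverage and k = 21: longer reads keep more usable k-mer depth,
# and a 1% per-base error rate removes roughly a fifth of it.
for length in (36, 100):
    print(length,
          round(expected_kmer_coverage(30, length, 21), 1),
          round(expected_kmer_coverage(30, length, 21, error_rate=0.01), 1))
```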

    Bartter- and Gitelman-like syndromes: salt-losing tubulopathies with loop or DCT defects

    Salt-losing tubulopathies with secondary hyperaldosteronism (SLT) comprise a set of well-defined inherited tubular disorders. Two segments along the distal nephron are primarily involved in the pathogenesis of SLTs: the thick ascending limb of Henle's loop, and the distal convoluted tubule (DCT). The functions of these pre- and postmacula densa segments are quite distinct, and this has a major impact on the clinical presentation of loop and DCT disorders: the Bartter- and Gitelman-like syndromes. Defects in the water-impermeable thick ascending limb, with its greater salt reabsorption capacity, lead to major salt and water losses similar to the effect of loop diuretics. In contrast, defects in the DCT, with its smaller capacity for salt reabsorption and its crucial role in fine-tuning urinary calcium and magnesium excretion, provoke more chronic solute imbalances similar to the effects of chronic treatment with thiazides. The most severe disorder combines a loop and a DCT defect, resembling the enhanced diuretic effect of co-medication with loop diuretics and thiazides. Besides salt and water supplementation, prostaglandin E2-synthase inhibition is the most effective therapeutic option in polyuric loop disorders (e.g., pure furosemide and mixed furosemide–amiloride type), especially in preterm infants with severe volume depletion. In DCT disorders (e.g., pure thiazide and mixed thiazide–furosemide type), renin–angiotensin–aldosterone system (RAAS) blockers might be indicated once salt, potassium, and magnesium supplementation are deemed insufficient. It appears that most patients with SLT need lifelong solute supplementation combined with some drug treatment (e.g., indomethacin).

    Mutations in isocitrate dehydrogenase 1 and 2 occur frequently in intrahepatic cholangiocarcinomas and share hypermethylation targets with glioblastomas

    Mutations in the genes encoding isocitrate dehydrogenase, IDH1 and IDH2, have been reported in gliomas, myeloid leukemias, chondrosarcomas, and thyroid cancer. We discovered IDH1 and IDH2 mutations in 34 of 326 (10%) intrahepatic cholangiocarcinomas. Tumors with mutations in IDH1 or IDH2 had lower 5-hydroxymethylcytosine (5hmC) and higher 5-methylcytosine (5mC) levels, as well as increased dimethylation of histone H3K79. Mutations in IDH1 or IDH2 were associated with longer overall survival (p = 0.028) and were independently associated with a longer time to tumor recurrence after intrahepatic cholangiocarcinoma resection in multivariate analysis (p = 0.021). IDH1 and IDH2 mutations were significantly associated with increased levels of p53 in intrahepatic cholangiocarcinomas, but no mutations in the p53 gene were found, suggesting that mutations in IDH1 and IDH2 may cause a stress that leads to p53 activation. We identified 2,309 genes that were significantly hypermethylated in 19 cholangiocarcinomas with mutations in IDH1 or IDH2, compared with cholangiocarcinomas without these mutations. Hypermethylated CpG sites were significantly enriched in CpG shores and upstream of transcription start sites, suggesting a global regulation of transcriptional potential. Half of the hypermethylated genes overlapped with DNA hypermethylation in IDH1-mutant glioblastomas, suggesting the existence of a common set of genes whose expression may be affected by mutations in IDH1 or IDH2 in different types of tumors.

    A Cytoplasmic Domain Mutation in ClC-Kb Affects Long-Distance Communication Across the Membrane

    BACKGROUND: ClC-Kb and ClC-Ka are homologous chloride channels that facilitate chloride homeostasis in the kidney and inner ear. Disruption of ClC-Kb leads to Bartter's Syndrome, a kidney disease. A point mutation in ClC-Kb, R538P, linked to Bartter's Syndrome and located in the C-terminal cytoplasmic domain, was hypothesized to alter electrophysiological properties owing to its proximity to an important membrane-embedded helix. METHODOLOGY/PRINCIPAL FINDINGS: Two-electrode voltage clamp experiments were used to examine the electrophysiological properties of the mutation R538P in both ClC-Kb and ClC-Ka. R538P selectively abolishes extracellular calcium activation of ClC-Kb but not ClC-Ka. In attempting to determine the reason for this specificity, we hypothesized that the ClC-Kb C-terminal domain had either a different oligomeric status or a different dimerization interface from that of ClC-Ka, for which a crystal structure has been published. We purified a recombinant protein corresponding to the ClC-Kb C-terminal domain and used multi-angle light scattering together with a cysteine-crosslinking approach to show that the dimerization interface is conserved between the ClC-Kb and ClC-Ka C-terminal domains, despite several differences in the amino acids that occur at this interface. CONCLUSIONS: The R538P mutation in ClC-Kb, which leads to Bartter's Syndrome, abolishes calcium activation of the channel. This suggests that a significant conformational change, extending from the cytoplasmic side of the protein to the extracellular side, is involved in the Ca(2+)-activation process for ClC-Kb, and shows that the cytoplasmic domain is important for the channel's electrophysiological properties. In the highly similar ClC-Ka (90% identical), the R538P mutation does not affect activation by extracellular Ca(2+). This selective outcome indicates that ClC-Ka and ClC-Kb differ in how conformational changes are translated to the extracellular domain, even though their cytoplasmic domains share the same quaternary structure.