96 research outputs found

    A DE NOVO ASSEMBLY METHOD FOR SHORT SEQUENCE OF SOLID-SAGE READS RESPONSIBLE FOR WHEAT (TRITICUM AESTIVUM L.) LEAF RUST

    Objective: Wheat leaf rust, caused by Puccinia triticina Eriks., is one of the most widespread rust diseases. The main objective of the present study was the de novo assembly of short sequence reads in order to understand the molecular basis of the wheat-leaf rust interaction and to assemble differentially expressed genes, resistance genes and genes encoding transcription factors that respond to Puccinia infection in wheat. Methods: SOLiD (sequencing by oligonucleotide ligation and detection) SAGE (serial analysis of gene expression) sequence reads were assembled de novo, in the absence of any reference sequence, using a multiple k-mer approach. The reads came from a pair of near-isogenic lines (NILs) of wheat cultivar HD2329, one carrying Lr28 (resistant) and one lacking Lr28 (susceptible), that were either infected with the most virulent pathogen Puccinia triticina or mock-inoculated. Combinations of software based on different algorithms were used to obtain the maximum number of differentially expressed transcripts. Results: De novo assembly at different k-mers produced a large number of contigs. Contig size increased further when different assembly software was used. Redundancy was removed at both the nucleotide and protein levels, which increased the quality of the assembly. Conclusion: For assembling short sequences from complex genomes such as those of polyploids, a combination of software gives longer and unique contigs. This approach may be used in understanding the molecular mechanism of plant-microbe interaction. Keywords: Wheat, Leaf rust, SOLiD, SAGE, De novo assembly, NILs

    High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because these plants float on water and their major surface is exposed continuously to sunlight. The subfamily Lemnoideae is such a collection of aquatic species that, because of photosynthesis, are among the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA, so sequences from the nucleus and mitochondria can easily be filtered computationally. The remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor, and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution and abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Notably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

    Comparative Phylogenomics of Pathogenic and Nonpathogenic Species.

    The Ascomycete Onygenales order embraces a diverse group of mammalian pathogens, including the yeast-forming dimorphic fungal pathogens Histoplasma capsulatum, Paracoccidioides spp. and Blastomyces dermatitidis, the dermatophytes Microsporum spp. and Trichophyton spp., the spherule-forming dimorphic fungal pathogens in the genus Coccidioides, and many nonpathogens. Although genomes for all of the aforementioned pathogenic species are available, only one nonpathogen had been sequenced. Here, we enhance comparative phylogenomics in Onygenales by adding genomes for Amauroascus mutatus, Amauroascus niger, Byssoonygena ceratinophila, and Chrysosporium queenslandicum--four nonpathogenic Onygenales species, all of which are more closely related to Coccidioides spp. than any other known Onygenales species. Phylogenomic detection of gene family expansion and contraction can provide clues to fungal function but is sensitive to taxon sampling. By adding additional nonpathogens, we show that LysM domain-containing proteins, previously thought to be expanding in some Onygenales, are contracting in the Coccidioides-Uncinocarpus clade, as are the self-nonself recognition Het loci. The denser genome sampling presented here highlights nearly 800 genes unique to Coccidioides, which have significantly fewer known protein domains and show increased expression in the endosporulating spherule, the parasitic phase unique to Coccidioides spp. These genomes provide insight into gene family expansion/contraction and patterns of individual gene gain/loss in this diverse order--both major drivers of evolutionary change. Our results suggest that gene family expansion/contraction can lead to adaptive radiations that create taxonomic orders, while individual gene gain/loss likely plays a more significant role in branch-specific phenotypic changes that lead to adaptation for species or genera.

    Diminishing Return for Increased Mappability with Longer Sequencing Reads: Implications of the k-mer Distributions in the Human Genome

    The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a greater length increases the chance of reads being uniquely mapped to the reference genome, a quantitative analysis of the influence of read length on mappability has been lacking. To address this question, we evaluate the k-mer distribution of the human reference genome. The k-mer frequency is determined for k ranging from 20 to 1000 basepairs. We use the proportion of non-singleton k-mers to evaluate the mappability of reads for a corresponding read length. We observe that the proportion of non-singletons decreases slowly with increasing k, and can be fitted by piecewise power-law functions with different exponents at different k ranges. A faster decay at smaller values of k indicates more limited gains for read lengths > 200 basepairs. The frequency distributions of k-mers exhibit long tails in a power-law-like trend, and rank frequency plots exhibit a concave Zipf's curve. The location of the most frequent 1000-mers comprises 172 kilobase-ranged regions, including four large stretches on chromosomes 1 and X, containing genes with biomedical implications. Even a read length of 1000 would be insufficient to reliably sequence these specific regions. Comment: 5 figures
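
    The mappability measure described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `non_singleton_fraction` is hypothetical, the sequence is a toy string rather than the human reference, only the forward strand is counted, and the proportion is taken over k-mer positions (an assumption, since the abstract does not say whether positions or distinct k-mers are counted).

```python
from collections import Counter


def non_singleton_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer positions in seq whose k-mer occurs more than once.

    Assumption: forward strand only, proportion taken over positions.
    """
    n_positions = len(seq) - k + 1
    if k <= 0 or n_positions <= 0:
        raise ValueError("k must be between 1 and len(seq)")
    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    # Positions covered by k-mers that are seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n_positions


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy sequence, not real genomic data
    for k in (2, 4, 8):
        print(k, non_singleton_fraction(toy, k))
```

    On a real genome the same count would be run for increasing k, and the resulting curve could then be inspected for the slow decay the abstract reports.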

    Genetic characterization of the rare Bruconha virus (Bunyavirales: Orthobunyavirus) isolated in Vale do Ribeira (Atlantic Forest biome), Southeastern Brazil

    Brazil is a great source of arbovirus diversity, mainly in the Amazon region. However, other biomes, especially the Atlantic Forest, may also be a hotspot for emerging viruses, including Bunyaviruses (Negarnaviricota: Bunyavirales). For instance, Vale do Ribeira, located in the Southeastern region, has been widely studied for virus surveillance, where Flavivirus, Alphavirus and Bunyaviruses were isolated during the last decades, including Bruconha virus (BRCV), a member of Orthobunyavirus genus Group C, in 1976. Recently, a new isolate of BRCV named Span321532 was obtained from an adult sentinel mouse placed in Iguape city in 2011, and a full-length genome was generated with nucleotide differences of 1.5%, 5.3% and 5% (L, M and S segments, respectively) from the prototype isolated 35 years earlier. In addition, each segment placed BRCV into different clusters, showing the high variety within Bunyavirales. Although no evidence for reassortants was detected, this finding reiterates the need for new surveillance and genomic studies in the area, considering the high mutation rates of arboviruses, and also the need to identify the hosts capable of supporting the continuous circulation of Orthobunyavirus.

    Fuzzy-based Spectral Alignment for Correcting DNA Sequence from Next Generation Sequencer

    Next generation sequencing technology is able to generate short reads in large numbers and in a relatively short time in a single run. Graph-based DNA sequence assembly is used to handle these large data sets in the assembly step. Graph-based DNA sequence assembly is very sensitive to DNA sequencing errors. This problem can be solved by performing an error correction step before the assembly process. This research proposes a spectral alignment method based on a fuzzy inference system (FIS) model that can detect and correct DNA sequencing errors. The spectral alignment technique was implemented as a pre-processing step before the DNA sequence assembly process. The evaluation was conducted using the Velvet assembler. The number of nodes yielded by the Velvet assembler served as the measure of the success of error correction. The results show that FIS model-based spectral alignment produced a small number of nodes and therefore successfully corrected the DNA reads.
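
    The general idea of spectral alignment can be sketched as follows. This is a simplified illustration and not the method of the paper: a fixed count threshold (`min_count`, an assumption) stands in for the fuzzy inference system, only single-base substitutions are tried, and all function names are hypothetical.

```python
from collections import Counter


def kmers(read: str, k: int) -> list[str]:
    """All overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_spectrum(reads: list[str], k: int, min_count: int) -> set[str]:
    """K-mers seen at least min_count times are treated as 'solid' (trusted)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def weak_count(read: str, k: int, solid: set[str]) -> int:
    """Number of k-mers in the read that are not in the solid spectrum."""
    return sum(km not in solid for km in kmers(read, k))


def correct_read(read: str, k: int, solid: set[str]) -> str:
    """Try every single-base substitution; keep the one leaving fewest weak k-mers."""
    best, best_weak = read, weak_count(read, k, solid)
    if best_weak == 0:
        return read  # already consistent with the spectrum
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            weak = weak_count(candidate, k, solid)
            if weak < best_weak:
                best, best_weak = candidate, weak
    return best


if __name__ == "__main__":
    # Toy reads: three error-free copies and one read with a substitution.
    reads = ["ACGTTGCA"] * 3 + ["ACGTAGCA"]
    solid = build_spectrum(reads, k=4, min_count=2)
    print([correct_read(r, 4, solid) for r in reads])
```

    Corrected reads would then be passed to the assembler; fewer spurious k-mers generally mean fewer graph nodes, which is the success measure the abstract describes.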

    Computational and Systems Biology Advances to Enable Bioagent-Agnostic Signatures

    Enumerated threat agent lists have long driven biodefense priorities. The global SARS-CoV-2 pandemic demonstrated the limitations of searching for known threat agents as compared to a more agnostic approach. Recent technological advances are enabling agent-agnostic biodefense, especially through the integration of multi-modal observations of host-pathogen interactions directed by a human immunological model. Although well-developed technical assays exist for many aspects of human-pathogen interaction, the analytic methods and pipelines to combine and holistically interpret the results of such assays are immature and require further investments to exploit new technologies. In this manuscript, we discuss potential immunologically based bioagent-agnostic approaches and the computational tool gaps the community should prioritize filling

    A new strategy for better genome assembly from very short reads

    Background: With the rapid development of the next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. Because of repetitive regions of genomes and some other factors, assembly of very short reads is still a challenging issue. Results: A novel strategy for improving genome assembly from very short reads is proposed. It can increase accuracies of assemblies by integrating de novo contigs, and produce comparative contigs by allowing multiple references without limiting to genomes of closely related strains. Comparative contigs are used to scaffold de novo contigs. Using simulated and real datasets, it is shown that our strategy can effectively improve qualities of assemblies of isolated microbial genomes and metagenomes. Conclusions: With more and more reference genomes available, our strategy will be useful to improve qualities of genome assemblies from very short reads. Some scripts are provided to make our strategy applicable at http://code.google.com/p/cd-hybrid/.

    FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads

    The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping and high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses hotspot score to prioritize the processing of highly possible matches and implements modified Smith–Waterman refinement with reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs which is important for quantitative processing of RNA-Seq datasets
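
    For readers unfamiliar with the refinement step named in this abstract, the sketch below computes a plain Smith–Waterman local alignment score. It is the textbook algorithm with a linear gap penalty, not FANSe's modified version with a reduced scoring matrix; the function name and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Only the score is returned; no traceback is performed, so two rows
    of the dynamic-programming matrix are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Toy read and reference fragment sharing the local match "ACGT".
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

    In a seed-based mapper, a score like this would typically be computed only around candidate locations proposed by the seeds, which keeps the quadratic cost confined to short windows.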