    Quantitative evaluation of bias in barcode markers derived from complex samples

    PCR products have become a major commodity used to identify organisms based on polymorphism at the DNA level. Unbiased identification of organisms rests on the working hypothesis that DNA extracted from a sample will yield a positive signal whenever universal primers are used and DNA quality is suitable for PCR. As this assumption is not always correct, we used a system in which large differences in PCR success have been described to identify where biases appear and to suggest possible solutions. Plants can be identified with at least seven independent plastid-located loci. These differ in their degree of PCR success and in how informative they are in terms of taxonomically useful sequence polymorphisms. Here we used six common plastid loci spanning 48 plant species and performed a quantitative analysis of bias at each step of the identification process. As expected, we found important differences in PCR efficiency within a single species, depending on the barcoding sequence being amplified. Quantitative PCR revealed that the Ct threshold for various plastid loci, even within a single species, could exhibit greater than 2000-fold differences in DNA quantity after amplification. We then performed Next-Generation Sequencing experiments in nine species using equal quantities of three plastid-based primers and equally mixed quantities of DNA from multiple species. The result was significantly biased towards certain species and specific loci even when using adaptor-specific primers. Our results caution that Next-Generation Sequencing projects may suffer dramatic bias, arising largely during DNA amplification steps. Moreover, amplification-based Next-Generation Sequencing technologies exhibit additional bias despite using adaptor-specific primers, indicating that amplification success depends on the DNA fragment. As such, while qualitative analyses of unknown samples are prone to false negative results if a combination of widely successful amplicons is not used, quantitative results should be considered highly suspect, even if all species in the starting sample are known. This work was funded by the Comunidad Autónoma de la Región de Murcia Project “Molecular markers in conservation and management of the flora of Murcia Region”.
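
To give a sense of scale for the "greater than 2000-fold" figure, the sketch below converts a difference in Ct values into a fold difference in template quantity. It assumes ideal amplification, where product doubles every cycle (efficiency of 2.0). The function names and the Ct values are illustrative only and are not taken from the study.

```python
import math


def fold_difference(ct_a, ct_b, efficiency=2.0):
    """Fold difference in template implied by two Ct values.

    Assumes the same amplification efficiency for both reactions
    (2.0 means perfect doubling per cycle), which is an idealisation.
    """
    return efficiency ** abs(ct_a - ct_b)


def delta_ct_for_fold(fold, efficiency=2.0):
    """Number of cycles separating two reactions for a given fold difference."""
    return math.log(fold) / math.log(efficiency)


# Hypothetical Ct values for two loci amplified from the same DNA extract.
print(fold_difference(18.0, 29.0))        # 2048.0
print(round(delta_ct_for_fold(2000), 2))  # 10.97
```

Under these assumptions, a 2000-fold difference corresponds to a gap of roughly 11 cycles between two loci.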

    Wheat EST resources for functional genomics of abiotic stress

    BACKGROUND: Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. RESULTS: We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. CONCLUSION: We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals

    De novo sequence assembly of Albugo candida reveals a small genome relative to other biotrophic oomycetes

    Background: Albugo candida is a biotrophic oomycete that parasitizes various species of Brassicaceae, causing a disease (white blister rust) with remarkable convergence in behaviour to unrelated rusts of basidiomycete fungi. Results: A recent genome analysis of the oomycete Hyaloperonospora arabidopsidis suggests that a reduction in the number of genes encoding secreted pathogenicity proteins and enzymes for assimilation of inorganic nitrogen and sulphur represents a genomic signature for the evolution of obligate biotrophy. Here, we report a draft reference genome of a major crop pathogen, Albugo candida (another obligate biotrophic oomycete), with an estimated genome of 45.3 Mb. This is very similar to the genome size of a necrotrophic oomycete, Pythium ultimum (43 Mb), but less than half that of H. arabidopsidis (99 Mb). Sequencing of A. candida transcripts from infected host tissue and zoosporangia combined with genome-wide annotation revealed 15,824 predicted genes. Most of the predicted genes lack significant similarity with sequences from other oomycetes. Most intriguingly, A. candida appears to have a much smaller repertoire of pathogenicity-related proteins than H. arabidopsidis, including genes that encode RXLR effector proteins, CRINKLER-like genes, and elicitins. Necrosis and Ethylene inducing Peptides were not detected in the genome of A. candida. Putative orthologs of tat-C, a component of the twin-arginine translocase system, were identified from multiple oomycete genera along with proteins containing putative tat secretion signal peptides. Conclusion: Albugo candida has a comparatively small genome amongst oomycetes, retains motility of sporangial inoculum, and harbours a much smaller repertoire of candidate effectors than was recently reported for H. arabidopsidis. This minimal gene repertoire could indicate a lack of expansion, rather than a reduction, in the number of genes that signify the evolution of biotrophy in oomycetes

    Genome-Wide Analysis of Ethylene-Responsive Element Binding Factor-Associated Amphiphilic Repression Motif-Containing Transcriptional Regulators in Arabidopsis

    The ethylene-responsive element binding factor-associated amphiphilic repression (EAR) motif is a transcriptional regulatory motif identified in members of the ethylene-responsive element binding factor, C2H2, and auxin/indole-3-acetic acid families of transcriptional regulators. Sequence comparison of the core EAR motif sites from these proteins revealed two distinct conservation patterns: LxLxL and DLNxxP. Proteins containing these motifs play key roles in diverse biological functions by negatively regulating genes involved in developmental, hormonal, and stress signaling pathways. Through a genome-wide bioinformatics analysis, we have identified the complete repertoire of the EAR repressome in Arabidopsis (Arabidopsis thaliana) comprising 219 proteins belonging to 21 different transcriptional regulator families. Approximately 72% of these proteins contain a LxLxL type of EAR motif, 22% contain a DLNxxP type of EAR motif, and the remaining 6% have a motif where LxLxL and DLNxxP are overlapping. Published in vitro and in planta investigations support approximately 40% of these proteins functioning as negative regulators of gene expression. Comparative sequence analysis of EAR motif sites and adjoining regions has identified additional preferred residues and potential posttranslational modification sites that may influence the functionality of the EAR motif. Homology searches against protein databases of poplar (Populus trichocarpa), grapevine (Vitis vinifera), rice (Oryza sativa), and sorghum (Sorghum bicolor) revealed that the EAR motif is conserved across these diverse plant species. This genome-wide analysis represents the most extensive survey of EAR motif-containing proteins in Arabidopsis to date and provides a resource enabling investigations into their biological roles and the mechanism of EAR motif-mediated transcriptional regulation
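
The two conservation patterns named in this abstract, LxLxL and DLNxxP (with x standing for any residue), can be expressed as simple regular expressions. The sketch below is only an illustration of that pattern matching. It is not the pipeline used in the study, and the function name and the toy protein sequence are invented for the example.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
# "x" in the motif notation is treated as any single uppercase letter.
LXLXL = re.compile(r"(?=(L[A-Z]L[A-Z]L))")
DLNXXP = re.compile(r"(?=(DLN[A-Z]{2}P))")


def find_ear_motifs(seq):
    """Return (motif_type, start_index, matched_text) tuples, sorted by start."""
    seq = seq.upper()
    hits = []
    for name, pattern in (("LxLxL", LXLXL), ("DLNxxP", DLNXXP)):
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Invented toy sequence containing one motif of each type.
toy_protein = "MSTDLNKAPGGLDLNLSQ"
for hit in find_ear_motifs(toy_protein):
    print(hit)
# ('DLNxxP', 3, 'DLNKAP')
# ('LxLxL', 11, 'LDLNL')
```

A plain pattern scan like this only flags candidate sites. Whether a site acts as a repression motif needs the experimental support the abstract refers to.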

    The chaperonin-60 universal target (cpn60 UT) is a barcode for Bacteria that enables de novo identification of operational taxonomic units of metagenomic sequence data

    Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While there have been criteria formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances (“barcode gap”) was found from cpn60. Distribution of sequence diversity along the ~555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies. Peer reviewed: Yes. NRC publication: Yes.
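
The "barcode gap" defined in this abstract, the difference between the median pairwise inter-specific and intra-specific distances, can be sketched in a few lines. The abstract does not say which distance measure was used, so the example assumes a plain p-distance (proportion of differing sites) between pre-aligned sequences of equal length. The function names and the toy records are hypothetical.

```python
from itertools import combinations
from statistics import median


def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """Median inter-specific distance minus median intra-specific distance.

    records: list of (species_label, aligned_sequence) tuples.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return median(inter) - median(intra)


# Invented 10-bp toy alignment: two sequences from each of two species.
toy_records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "TCGAACGGAC"),
    ("sp_B", "TCGAACGGAC"),
]
print(round(barcode_gap(toy_records), 3))  # 0.3
```

A larger positive gap means that sequences from different species are, on average, much further apart than sequences from the same species, which is the property the abstract reports for cpn60 relative to 16S rRNA.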

    Unraveling the Rhizosphere using the cpn 60 genomic marker and pyrosequencing

    The microbial communities of two distinct soils and of the rhizosphere, tuber-associated soil, and washed roots of potato plants grown in each soil were profiled by cpn60 gene-targeted metagenomics. DNA samples extracted from these sources were used as templates for PCR amplification of the cpn60 universal target regions present in each metagenomic sample. The cpn60 amplicons were analyzed by pyrosequencing. The 914,932 sequence reads obtained were aligned and assembled into unique cpn60 nucleotide sequences in an autonomous process that did not refer to a database of known cpn60 sequences. This process identified 27,222 unique nucleotide sequences, corresponding to 21,396 unique peptide sequences. The closest matches for each of these sequences in a database of cpn60 sequences, cpnDB, were determined. Bulk soil microbial richness [i.e., total number of unique operational taxonomic units (OTU)] was much greater than that of the plant-associated samples, as expected. The richness of the microbial communities associated with the plant samples ranged from 13% to 44% of that of the bulk soil in which the plants were grown. When only distinct peptide sequences (derived from the nucleotide sequence OTU) were included, the apparent richness was reduced for all samples. If only OTU with higher relative abundances in the plant-associated sample than those in the bulk soil were considered, the apparent richness of the plant-associated microbial communities was significantly reduced. Clustering analysis identified OTU with distributions among the samples that strongly suggested a functional relationship with the plant. Classification of the reads observed in each sample at the taxonomic level of bacterial Order revealed major differences between bulk soil and plant-associated communities. The distributions of a small number of OTU between samples suggest that these organisms had privileged relationships with the plant. Peer reviewed: Yes. NRC publication: Yes.

    Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea

    Background: Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. Results: We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. Conclusions: Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes

    CaptureSeq: Hybridization-Based Enrichment of cpn60 Gene Fragments Reveals the Community Structures of Synthetic and Natural Microbial Ecosystems

    Background. The molecular profiling of complex microbial communities has become the basis for examining the relationship between the microbiome composition, structure and metabolic functions of those communities. Microbial community structure can be partially assessed with “universal” PCR targeting taxonomic or functional gene markers. Increasingly, shotgun metagenomic DNA sequencing is providing more quantitative insight into microbiomes. However, both amplicon-based and shotgun sequencing approaches have shortcomings that limit the ability to study microbiome dynamics. Methods. We present a novel, amplicon-free, hybridization-based method (CaptureSeq) for profiling complex microbial communities using probes based on the chaperonin-60 gene. Molecular profiles of a commercially available synthetic microbial community standard were compared using CaptureSeq, whole metagenome sequencing, and 16S universal target amplification. Profiles were also generated for natural ecosystems including antibiotic-amended soils, manure storage tanks, and an agricultural reservoir. Results. The CaptureSeq method generated a microbial profile that encompassed all of the bacteria and eukaryotes in the panel with greater reproducibility and more accurate representation of high G/C content microorganisms compared to 16S amplification. In the natural ecosystems, CaptureSeq provided a much greater depth of coverage and sensitivity of detection compared to shotgun sequencing without prior selection. The resulting community profiles provided quantitatively reliable information about all three domains of life (Bacteria, Archaea, and Eukarya) in the different ecosystems. The applications of CaptureSeq will facilitate accurate studies of host-microbiome interactions for environmental, crop, animal and human health. Conclusions. cpn60-based hybridization enriched for taxonomically informative DNA sequences from complex mixtures. In synthetic and natural microbial ecosystems, CaptureSeq provided sequences from prokaryotes and eukaryotes simultaneously, with quantitatively reliable read abundances. CaptureSeq provides an alternative to PCR amplification of taxonomic markers with deep community coverage while minimizing amplification biases.

    Error trade-offs in OTU assembly optimization.

    A. Total error (left ordinate) for de novo assemblies of cpn60 UT sequence reads from a synthetic community of 20 cloned targets, using a minimum identity value of 92% and a range of minimum overlap lengths (50–400 nucleotides). Raw total error (blue line), as well as error remaining after post-assembly primer trimming and clustering (red line), and after chimera removal (green line). Light blue bars indicate the percent of sequence reads identified as singletons in each assembly (right ordinate). B. Number of OTU assembled at each minimum overlap length. Each coloured segment of the stacked bar indicates a different member of the panel of 20 community members. The total number of OTU assembled is indicated on the top of each stack.