16 research outputs found

    Fragmentation of Contaminant and Endogenous DNA in Ancient Samples Determined by Shotgun Sequencing; Prospects for Human Palaeogenomics

    Despite the successful retrieval of genomes from past remains, the prospects for human palaeogenomics remain unclear because of the difficulty of distinguishing contaminant from endogenous DNA sequences. Previous sequence data generated on high-throughput sequencing platforms indicate that fragmentation of ancient DNA sequences is a characteristic trait primarily arising due to depurination processes that create abasic sites leading to DNA breaks

    Consistency of metagenomic assignment programs in simulated and real data

    Background: Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequences to a particular taxon both a challenge and a crucial step. Recently, several methods have been developed to perform this task, based on different methodologies such as sequence composition or sequence similarity. The sequence composition methods are able to assign the whole dataset. However, their use in metagenomics and the study of their performance with real data are limited. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads. Results: Both in real and in simulated data, BLAST + Lowest Common Ancestor (BLAST+LCA), Phymm, and Naïve Bayesian Classifier consistently assign a larger number of reads at higher taxonomic levels than at lower levels. However, discrepancies increase at lower taxonomic levels. In simulated data, consistent assignments between all three methods showed greater precision than assignments based on Phymm or Bayesian Classifier alone, since the BLAST + LCA algorithm performed best. In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results. Conclusions: The use and combination of different approaches is advisable to assign metagenomic reads. Although the sensitivity could be reduced, the reliability can be increased by using the reads consistently assigned to the same taxa by at least two methods, and by training the programs using all available information. This work was financed by the MICINN (Spanish Ministry of Science and Innovation) grant SAF2010-16240.
MGG was supported by a predoctoral fellowship from MICIN
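The conclusion above (keep only reads that at least two methods assign to the same taxon) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `consensus_taxon`, the method keys, and the toy reads are hypothetical, and each program's output is assumed to have already been reduced to one taxon label per read at a single taxonomic level.

```python
from collections import Counter

def consensus_taxon(assignments, min_agree=2):
    """Return the taxon named by at least `min_agree` methods, else None.

    `assignments` maps a method name to its taxon label for one read
    (None when that method left the read unassigned).
    """
    # Count votes, ignoring methods that gave no assignment.
    votes = Counter(t for t in assignments.values() if t is not None)
    if not votes:
        return None
    taxon, count = votes.most_common(1)[0]
    return taxon if count >= min_agree else None

# Hypothetical toy input: three reads, three methods.
reads = {
    "read_1": {"blast_lca": "Bacteroides", "phymm": "Bacteroides", "nbc": "Prevotella"},
    "read_2": {"blast_lca": None, "phymm": "Escherichia", "nbc": "Salmonella"},
    "read_3": {"blast_lca": "Vibrio", "phymm": "Vibrio", "nbc": "Vibrio"},
}

consensus = {read_id: consensus_taxon(a) for read_id, a in reads.items()}
print(consensus)
```

Reads without a two-method agreement are dropped (returned as `None`), which mirrors the trade-off stated in the abstract: sensitivity may fall while reliability rises.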

    A new method for extracting skin microbes allows metagenomic analysis of whole-deep skin

    In the last decade, an extensive effort has been made to characterize the human microbiota, due to its clinical and economic interest. However, a metagenomic approach to the skin microbiota is hampered by the high proportion of host DNA that is recovered. In contrast with the burgeoning field of gut metagenomics, skin metagenomics has been hindered by the absence of an efficient method to avoid sequencing the host DNA. We present here a method for recovering microbial DNA from skin samples, based on a combination of molecular techniques. We have applied this method to mouse skin, and have validated it by standard and quantitative PCR and by amplicon sequencing of 16S rRNA. The taxonomic diversity recovered was not altered by this new method, as shown by comparing the phylogenetic structure revealed by 16S rRNA sequencing in untreated vs. treated samples. As proof of concept, we also present the first two mouse skin metagenomes, which allowed us to discover new taxa (not only prokaryotes but also viruses and eukaryotes) not reachable by 16S rRNA sequencing, and to characterize the functional landscape of the skin microbiome. Our method paves the way for the development of skin metagenomics, which will allow a much deeper knowledge of the skin microbiome and its relationship with the host, both in a healthy state and in relation to disease. This work was funded by the Spanish Ministry of Science and Innovation (MICINN) [grant numbers SAF2010-16240 and BFU2009-12895-CO2-01]. MGG was supported by a predoctoral fellowship from the Spanish Ministry of Science and Innovation [Grant number BES-2008-006029]

    Direct sequencing from the minimal number of DNA molecules needed to fill a 454 picotiterplate

    The large amount of DNA needed to prepare a library in next generation sequencing protocols hinders direct sequencing of small DNA samples. This limitation is usually overcome by the enrichment of such samples with whole genome amplification (WGA), mostly by multiple displacement amplification (MDA) based on φ29 polymerase. However, this technique can be biased by the GC content of the sample and is prone to the development of chimeras as well as contamination during enrichment, which contributes undesired noise to sequence data analysis and hampers proper functional and/or taxonomic assignment. An alternative to MDA is direct DNA sequencing (DS), which represents the theoretical gold standard in genome sequencing. In this work, we explore the possibility of sequencing the genome of Escherichia coli from the minimum number of DNA molecules required for pyrosequencing, according to the notion of one-bead-one-molecule. Using an optimized protocol for DS, we constructed a shotgun library containing the minimum number of DNA molecules needed to fill a selected region of a picotiterplate. We recovered most of the reference genome with uniform coverage. We compared the DS method with MDA applied to the same amount of starting DNA. As expected, MDA yielded a sparse and biased read distribution, with a very high amount of unassigned and nonspecific DNA amplifications. The optimized DS protocol allows unbiased sequencing to be performed from samples with a very small amount of DNA. This work was funded by grant CP09/00049 Miguel Servet, Instituto de Salud Carlos III, Spain to GD; by projects SAF2009-13032-C02-01 and SAF 2012-31187 (AM), BFU2009-12895-CO2-01 and SAF2010-16240 (FC) from the Spanish Ministry for Science and Innovation (MCINN), FU2008-04501-E from Spanish Ministry for Science and Innovation (MCINN) in the frame of ERA-Net PathoGenoMics and Prometeo/2009/092 from Conselleria D’Educació Generalitat Valenciana, Spain, to AM.
MD is recipient of a fellowship from Spanish Ministry of Education FPU2010. MGG was supported by a predoctoral fellowship from the Spanish Ministry of Science and Innovation (Grant number BES-2008-006029)

    Nucleotide base frequencies at the 5′ end of the Myotragus human contaminants, treated with a depurinating agent, bleach.

    The base composition is plotted as a function of distance from the 5′-end. Despite the small sample size (N = 337 reads), the pattern matches that previously described in ancient endogenous sequences, including Neandertals.

    Entropy at the 5′ end of the Myotragus, the human contaminants in the lynx, the Neolithic and the Neandertal reads, estimated using the Shannon equation and 100 bootstraps.

    It can be seen that in Myotragus and Neolithic the entropy drops at the breaking point, indicating that sequences are not randomly fragmented, while in lynx and Neandertal, the entropy is stable (in the latter this is due to the small sample size of the Neandertal reads available).
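As a rough illustration of the kind of calculation this caption describes, the sketch below computes the Shannon entropy, H = -Σ p·log2(p), of the base composition at each position from the 5′ end and averages it over bootstrap resamples of the reads. The caption does not say how the estimate was set up, so the per-position definition, the resampling of whole reads with replacement, the function names, and the toy reads are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def shannon_entropy(bases):
    """Shannon entropy in bits of the base frequencies in `bases`."""
    counts = Counter(bases)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def positional_entropy(reads, n_positions):
    """Entropy at each of the first `n_positions` bases from the 5' end."""
    return [
        shannon_entropy([r[i] for r in reads if len(r) > i])
        for i in range(n_positions)
    ]

def bootstrap_entropy(reads, n_positions, n_boot=100, seed=0):
    """Mean per-position entropy over `n_boot` resamples of the reads."""
    rng = random.Random(seed)
    replicates = [
        positional_entropy(rng.choices(reads, k=len(reads)), n_positions)
        for _ in range(n_boot)
    ]
    # zip(*replicates) groups the replicate values position by position.
    return [sum(column) / n_boot for column in zip(*replicates)]

# Hypothetical toy reads, all aligned at their 5' ends.
toy_reads = ["GATTACA", "AATTACA", "GCTTACA", "ACTTACA"]
observed = positional_entropy(toy_reads, 3)
boot_mean = bootstrap_entropy(toy_reads, 3)
print(observed, boot_mean)
```

With four equiprobable bases the entropy at a position reaches its maximum of 2 bits; a drop at a given position indicates that one or two bases dominate there.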

    Specimens subjected to 454-FLX pyrosequencing, number of reads and ratio of endogenous and contaminant sequences obtained.

    *: in the case of human samples, it is impossible to discern a priori which sequences are endogenous and which are human contaminants. However, we have mitochondrial DNA estimates of the maximum potential contamination <1% in the Neandertal sample and <5% in the Neolithic sample.

    Nucleotide base frequencies at the 5′ end of the human Neolithic sequences.

    The base composition is plotted as a function of distance from the 5′-end. The depurination-based pattern can be seen, despite the small sample size (N = 1,117 reads).