
    Special features of RAD Sequencing data: implications for genotyping

    Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases, and that most RAD loci will be accurately genotyped by existing tools.
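    The abstract notes that loci affected by PCR GC-content bias or restriction site heterozygosity can often simply be excluded. A minimal Python sketch of such a filter follows, assuming each locus is given as a consensus sequence plus a read depth. The GC window (0.3 to 0.7) and the relative-depth cutoff (0.6 of the median depth) are hypothetical illustration values, not thresholds from the paper; the low-depth flag reflects the idea that a heterozygous restriction site can make one allele drop out, leaving roughly half the expected coverage.

```python
from statistics import median

def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def flag_suspect_loci(loci, gc_range=(0.3, 0.7), min_rel_depth=0.6):
    """Flag loci with extreme GC content or unusually low read depth.

    loci: dict mapping locus ID -> (consensus sequence, read depth).
    gc_range, min_rel_depth: hypothetical cutoffs, not taken from the paper.
    Returns a dict mapping each flagged locus ID -> list of reasons.
    """
    median_depth = median(depth for _, depth in loci.values())
    flagged = {}
    for locus_id, (seq, depth) in loci.items():
        reasons = []
        gc = gc_fraction(seq)
        if not gc_range[0] <= gc <= gc_range[1]:
            reasons.append("gc")  # possible PCR GC-content bias
        if median_depth > 0 and depth / median_depth < min_rel_depth:
            reasons.append("low_depth")  # possible allele dropout at a heterozygous cut site
        if reasons:
            flagged[locus_id] = reasons
    return flagged

if __name__ == "__main__":
    loci = {
        "L1": ("ACGTACGTAC", 100),
        "L2": ("GGGCCCGGGC", 95),
        "L3": ("ATGCATGCAT", 40),
    }
    print(flag_suspect_loci(loci))  # → {'L2': ['gc'], 'L3': ['low_depth']}
```

    Here `L2` is flagged only for its GC content (1.0) and `L3` only because its depth is well below the median; in practice, cutoffs would be tuned to the data set.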

    An Intelligent System for Automated DNA Base Calling

    An investigation into improving the performance of DNA base-calling algorithms was conducted. The results show that the preprocessing steps the ABI sequencer applies to raw data adversely affect the accuracy of DNA sequencing. This adverse effect has been responsible for relatively high error rates, between 3.5% and 6%, in both the ABI and Phred base-calling software. Phred also uses the processed data generated by the ABI sequencer; only the base-calling algorithms differ. To remedy this effect, we have developed and implemented a new filtering technique that preserves the original information contained in the raw data, providing qualitatively superior input for the subsequent base-calling step. Our proposed filtering step provides mechanical shift compensation, cross-talk filtering, and baseline adjustment. Applying our filtering step to a limited number of DNA samples yielded sequences with lower error rates.
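    The cross-talk filtering and baseline adjustment steps can be illustrated with standard signal-processing operations. The Python sketch below is not the authors' filter: it assumes a known 4×4 dye cross-talk matrix (entry `[i][j]` is the fraction of dye j's signal that appears in channel i), recovers per-sample dye intensities by solving that linear system, and approximates baseline adjustment by subtracting a running minimum. Mechanical shift compensation is not shown, and the matrix values and window size are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("cross-talk matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def remove_crosstalk(traces, crosstalk):
    """Unmix each 4-channel sample [A, C, G, T] using the (assumed known) cross-talk matrix."""
    return [solve_linear(crosstalk, sample) for sample in traces]

def subtract_baseline(channel, window=5):
    """Subtract the minimum over a centered window of width 2*window + 1 from each point."""
    n = len(channel)
    return [v - min(channel[max(0, i - window): min(n, i + window + 1)])
            for i, v in enumerate(channel)]

if __name__ == "__main__":
    # Hypothetical cross-talk: each dye leaks into its neighbouring channels.
    crosstalk = [
        [1.0, 0.2, 0.0, 0.0],
        [0.1, 1.0, 0.2, 0.0],
        [0.0, 0.1, 1.0, 0.2],
        [0.0, 0.0, 0.1, 1.0],
    ]
    print(remove_crosstalk([[1.0, 0.1, 0.0, 0.0]], crosstalk))  # → [[1.0, 0.0, 0.0, 0.0]]
    print(subtract_baseline([5, 6, 5, 9, 5], window=1))  # → [0, 1, 0, 4, 0]
```

    Solving the linear system rather than simply thresholding each channel keeps the unmixing exact when the cross-talk matrix is known; a real pipeline would have to estimate that matrix from the instrument's data.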

    Accurate DNA Base Caller

    The major goal of this project is to develop a new base-calling technique that will improve the efficiency of the DNA sequencing process. This will be achieved by increasing the average length of error-free sequence and improving base identification at the beginning and end of sequences, which will increase sequencing throughput and reduce the cost of DNA sequencing. Previous work by the PI demonstrated that the error-free read can be extended by 30%. This was achieved through work on cross-talk filtering, baseline adjustment, base-spacing prediction and the development of a fuzzy base-calling algorithm. Further adaptive capabilities, as well as full development and implementation of the methodology, are planned. The software will be tested on a large number of DNA sequences, will remove specific hardware and operating-system requirements, and will be usable over the web. Accurate, inexpensive genomic DNA sequencing will be a cornerstone of 21st-century biology.
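    To make the 30% figure concrete, the sketch below measures error-free read length as the number of bases called before the first disagreement with a known reference, and expresses an improvement as a percentage. This definition of the metric is an assumption for illustration (the abstract does not define it), and reads are assumed to be already aligned to the start of the reference.

```python
from statistics import mean

def error_free_length(called: str, reference: str) -> int:
    """Count bases from the read start until the first mismatch with the reference."""
    length = 0
    for c, r in zip(called.upper(), reference.upper()):
        if c != r:
            break
        length += 1
    return length

def mean_error_free_length(reads: list[str], reference: str) -> float:
    """Average error-free length over reads assumed to start at reference position 0."""
    return mean(error_free_length(read, reference) for read in reads)

def percent_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

if __name__ == "__main__":
    ref = "ACGTACGTAC"
    print(mean_error_free_length(["ACGTTCGTAC", "ACGTACGAAC"], ref))  # → 5.5
    print(percent_gain(500, 650))  # → 30.0
```

    Under this definition, extending a mean error-free length from 500 to 650 bases corresponds to the 30% gain quoted above.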

    Adaptation to high ethanol reveals complex evolutionary pathways

    Get PDF
    Tolerance to high levels of ethanol is an ecologically and industrially relevant phenotype of microbes, but the molecular mechanisms underlying this complex trait remain largely unknown. Here, we use long-term experimental evolution of isogenic yeast populations of different initial ploidy to study adaptation to increasing levels of ethanol. Whole-genome sequencing of more than 30 evolved populations and over 100 adapted clones isolated throughout this two-year evolution experiment revealed how a complex interplay of de novo single nucleotide mutations, copy number variation, ploidy changes, mutator phenotypes, and clonal interference led to a significant increase in ethanol tolerance. Although the specific mutations differ between evolved lineages, application of a novel computational pipeline, PheNetic, revealed that many mutations target functional modules involved in stress response, cell cycle regulation, DNA repair and respiration. Measuring the fitness effects of selected mutations introduced into non-evolved, ethanol-sensitive cells revealed several adaptive mutations that had not previously been implicated in ethanol tolerance, including mutations in PRT1, VPS70 and MEX67. Interestingly, variation in VPS70 was recently identified as a QTL for ethanol tolerance in an industrial bio-ethanol strain. Taken together, our results show how, in contrast to adaptation to some other stresses, adaptation to a continuous, complex and severe stress involves an interplay of different evolutionary mechanisms. In addition, our study reveals functional modules involved in ethanol resistance and identifies several mutations that could help to improve the ethanol tolerance of industrial yeasts.

    Use of high throughput sequencing to observe genome dynamics at a single cell level

    With the development of high-throughput sequencing technology, it has become possible to analyze the distribution of mutations directly and genome-wide, dissociating mutation rate measurements from the traditional underlying assumptions. Here, we sequenced several Escherichia coli genomes from colonies obtained after chemical mutagenesis and observed a strikingly nonrandom distribution of the induced mutations. This includes long stretches of exclusively G-to-A or C-to-T transitions along the genome, and intra- and inter-genomic differences in mutation density spanning orders of magnitude. Whereas most of these observations can be explained by known features of enzymatic processes, others could reflect stochasticity in molecular processes at the single-cell level. Our results demonstrate how analysis of the molecular records left in the genomes of the descendants of an individual mutagenized cell allows genome-scale observation of the fixation and segregation of mutations, as well as of recombination events, in the single genome of their progenitor.
    Comment: 22 pages, 9 figures (including 5 supplementary), one table
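    Two simple checks for the kind of nonrandomness described here are sketched below in Python; they are generic illustrations, not the authors' analysis. `same_type_runs` groups position-sorted mutations into consecutive runs of the same substitution type (for example, a stretch of only G>A changes), and `dispersion_index` bins mutation positions into fixed windows and returns the variance-to-mean ratio of the per-window counts, which is close to 1 for a uniformly random (Poisson) process and well above 1 when mutations cluster. The input format (0-based position, reference base, alternative base) and the window size are assumptions.

```python
from itertools import groupby
from statistics import mean, pvariance

def same_type_runs(mutations):
    """Group mutations, sorted by position, into runs of identical substitution type.

    mutations: iterable of (position, ref_base, alt_base), positions 0-based.
    Returns a list of (type, first_position, last_position, count).
    """
    runs = []
    ordered = sorted(mutations)  # tuples sort by position first
    for mtype, group in groupby(ordered, key=lambda m: f"{m[1]}>{m[2]}"):
        members = list(group)
        runs.append((mtype, members[0][0], members[-1][0], len(members)))
    return runs

def dispersion_index(positions, genome_length, window):
    """Variance-to-mean ratio of mutation counts in consecutive fixed-size windows."""
    counts = [0] * (-(-genome_length // window))  # ceiling division: number of windows
    for p in positions:
        counts[p // window] += 1
    m = mean(counts)
    return pvariance(counts) / m if m else float("nan")

if __name__ == "__main__":
    muts = [(10, "G", "A"), (20, "G", "A"), (35, "C", "T"),
            (40, "C", "T"), (41, "C", "T"), (90, "G", "A")]
    for run in same_type_runs(muts):
        print(run)
    print(dispersion_index([0, 1, 2, 3], genome_length=40, window=10))  # → 3.0
```

    A dispersion index far above 1 across windows, or unusually long single-type runs, would both point to the nonrandom patterns the abstract reports; significance would still need a proper statistical test.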

    The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
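    The second-round reference is built by incorporating substitutions and short indels called in the first round. A minimal Python sketch of that reference-update step is shown below; it is an illustration, not the Drosophila Genome Nexus pipeline, and it assumes variants are given as non-overlapping (0-based position, reference allele, alternative allele) tuples. Edits are applied from right to left so that indels do not shift the coordinates of variants still waiting to be applied.

```python
def apply_variants(reference: str, variants) -> str:
    """Build an updated reference by applying SNPs and short indels.

    variants: iterable of (pos, ref_allele, alt_allele), 0-based and non-overlapping.
    Raises ValueError if a ref allele does not match or two variants overlap.
    """
    seq = reference
    last_start = len(reference)  # start of the most recently applied (rightmost) variant
    for pos, ref, alt in sorted(variants, reverse=True):  # right to left
        if pos + len(ref) > last_start:
            raise ValueError(f"overlapping variants at position {pos}")
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
        last_start = pos
    return seq

if __name__ == "__main__":
    # One SNP, one 2-base insertion and one 1-base deletion.
    variants = [(1, "C", "T"), (4, "A", "AGG"), (6, "GT", "G")]
    print(apply_variants("ACGTACGT", variants))  # → ATGTAGGCG
```

    Because indels change coordinates, variants called against the updated reference would need to be translated back to the original reference coordinates before data sets are compared; that lift-over step is not shown.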

    Low-frequency variant detection in viral populations using massively parallel sequencing data
