Special features of RAD Sequencing data: implications for genotyping
Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases, and that most RAD loci will be accurately genotyped by existing tools.
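The abstract notes that loci affected by these biases can often simply be excluded. A minimal sketch of one such exclusion step for PCR GC content bias, assuming each RAD locus is available as a consensus sequence and that a fixed GC window is an acceptable proxy for the bias. The function names and the 0.3/0.7 defaults are hypothetical and are not taken from the paper:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (Ns and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def filter_by_gc(loci: dict[str, str], low: float = 0.3, high: float = 0.7) -> dict[str, str]:
    """Keep loci whose GC fraction lies in [low, high]; thresholds are illustrative only."""
    return {name: seq for name, seq in loci.items() if low <= gc_fraction(seq) <= high}
```

For example, `filter_by_gc({"rad1": "ATGCATGCAT", "rad2": "GGGCCCGGCC", "rad3": "AATTAATTAA"})` keeps only `rad1` (GC fraction 0.4). In practice the window would be chosen from the observed relationship between GC content and read depth rather than fixed in advance.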
An Intelligent System for Automated DNA Base Calling
An investigation into improving the performance of DNA base-calling algorithms was conducted. The results show that the preprocessing steps performed by the ABI sequencer on raw data adversely affect the accuracy of DNA sequencing. This adverse effect has been responsible for relatively high error rates, between 3.5% and 6%, in both the ABI and Phred sequencing software. Note that Phred also uses the processed data generated by the ABI sequencer; only its base-calling algorithm is different. To remedy this effect, we have developed and implemented a new filtering technique that preserves the initial information contained in the raw data. This provides qualitatively superior data for the subsequent base-calling step. Our proposed filtering step provides mechanical shift compensation, cross-talk filtering, and baseline adjustment. These are briefly described below. Application of our filtering step to a limited number of DNA data sets has produced sequences with lower error rates.
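Two of the three filtering steps named above can be illustrated on four-channel trace data. This is a minimal sketch, not the authors' filter or ABI's preprocessing, and it rests on stated assumptions: the baseline is estimated as a rolling minimum over a hypothetical window, and cross-talk is modelled as a known 4×4 mixing matrix (`crosstalk`, hypothetical) so that each observed channel is a linear combination of the true dye signals:

```python
from typing import List

Trace = List[float]


def subtract_baseline(trace: Trace, half_window: int) -> Trace:
    """Subtract a rolling-minimum baseline computed over +/- half_window samples."""
    n = len(trace)
    return [
        trace[i] - min(trace[max(0, i - half_window): i + half_window + 1])
        for i in range(n)
    ]


def solve_linear(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def remove_crosstalk(channels: List[Trace], crosstalk: List[List[float]]) -> List[Trace]:
    """Undo dye cross-talk, assuming observed[c][t] = sum_k crosstalk[c][k] * true[k][t]."""
    n_samples = len(channels[0])
    corrected = [[0.0] * n_samples for _ in channels]
    for t in range(n_samples):
        observed = [ch[t] for ch in channels]
        for c, value in enumerate(solve_linear(crosstalk, observed)):
            corrected[c][t] = value
    return corrected
```

A typical use would apply `subtract_baseline` to each channel and then pass the four corrected channels to `remove_crosstalk`; that ordering, like the linear mixing model, is an assumption of this sketch. Mechanical shift compensation (re-aligning channels that are offset in time) is omitted here.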
Accurate DNA Base Caller
The major goal of this project is to develop a new base-calling technique that will improve the efficiency of the DNA sequencing process. This will be achieved by increasing the average length of error-free sequence reads and by enhancing base identification at the beginning and end of sequences, which will increase sequencing throughput and reduce the cost of DNA sequencing. Previous work by the PI has demonstrated the ability to extend the error-free read length by 30%. This was achieved through work on cross-talk filtering, baseline adjustment, base-spacing prediction, and the development of a fuzzy base-calling algorithm. Further adaptive capabilities, as well as full development and implementation of the methodology, are planned. The software will be tested on a large number of DNA sequences, will not depend on specific hardware or operating systems, and will be accessible over the web. Accurate, inexpensive genomic DNA sequencing will be a cornerstone of 21st-century biology.
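The role of base-spacing prediction can be illustrated with a deliberately simple caller. The sketch below is hypothetical and is not the PI's fuzzy base-calling algorithm: it calls one base per local maximum of the summed four-channel signal, reports a crude "membership" score (the winning channel's share of the peak height), estimates the typical peak spacing as the median gap, and flags gaps much wider than that spacing as places where a base may have been missed. The A/C/G/T channel order and the 1.5× tolerance are assumptions:

```python
from statistics import median
from typing import List, Optional, Tuple

BASES = "ACGT"  # channel order assumed to be A, C, G, T


def find_peaks(signal: List[float]) -> List[int]:
    """Indices of simple local maxima with a positive height."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > 0 and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
    ]


def call_bases(channels: List[List[float]]) -> Tuple[List[Tuple[int, str, float]], Optional[float]]:
    """Call one base per peak of the summed signal and estimate the typical peak spacing.

    Each call is (position, base, membership), where membership is the winning
    channel's share of the total height at that peak: 1.0 means unambiguous.
    """
    total = [sum(values) for values in zip(*channels)]
    peaks = find_peaks(total)
    calls = []
    for p in peaks:
        heights = [ch[p] for ch in channels]
        best = max(range(len(channels)), key=lambda c: heights[c])
        calls.append((p, BASES[best], heights[best] / sum(heights)))
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    spacing = median(gaps) if gaps else None
    return calls, spacing


def likely_missed_bases(peaks: List[int], spacing: float, tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Pairs of neighbouring peaks that are further apart than tolerance * spacing."""
    return [(a, b) for a, b in zip(peaks, peaks[1:]) if b - a > tolerance * spacing]
```

A real caller would let the expected spacing drift along the read, since peak spacing in capillary traces is not constant; the single median used here is the simplest possible stand-in.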
Adaptation to high ethanol reveals complex evolutionary pathways
Tolerance to high levels of ethanol is an ecologically and industrially relevant phenotype of microbes, but the molecular mechanisms underlying this complex trait remain largely unknown. Here, we use long-term experimental evolution of isogenic yeast populations of different initial ploidy to study adaptation to increasing levels of ethanol. Whole-genome sequencing of more than 30 evolved populations and over 100 adapted clones isolated throughout this two-year evolution experiment revealed how a complex interplay of de novo single nucleotide mutations, copy number variation, ploidy changes, mutator phenotypes, and clonal interference led to a significant increase in ethanol tolerance. Although the specific mutations differ between evolved lineages, application of a novel computational pipeline, PheNetic, revealed that many mutations target functional modules involved in stress response, cell cycle regulation, DNA repair and respiration. Measuring the fitness effects of selected mutations introduced into non-evolved, ethanol-sensitive cells revealed several adaptive mutations that had not previously been implicated in ethanol tolerance, including mutations in PRT1, VPS70 and MEX67. Interestingly, variation in VPS70 was recently identified as a QTL for ethanol tolerance in an industrial bio-ethanol strain. Taken together, our results show how, in contrast to adaptation to some other stresses, adaptation to a continuous, complex and severe stress involves an interplay of different evolutionary mechanisms. In addition, our study reveals functional modules involved in ethanol resistance and identifies several mutations that could help to improve the ethanol tolerance of industrial yeasts.
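The abstract does not say how the fitness effects of the introduced mutations were quantified. One common approach is a head-to-head competition against a reference strain, scored as a per-generation selection coefficient. Assuming that design (the function name and arguments are hypothetical), the calculation is:

```python
import math


def selection_coefficient(mutant_start: float, reference_start: float,
                          mutant_end: float, reference_end: float,
                          generations: float) -> float:
    """Per-generation selection coefficient of a mutant relative to a reference strain.

    s = [ln(M_t / R_t) - ln(M_0 / R_0)] / g, where M and R are mutant and reference
    counts (or frequencies) at the start (0) and after g generations (t).
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return (math.log(mutant_end / reference_end)
            - math.log(mutant_start / reference_start)) / generations
```

For example, a mutant that starts at a 1:1 ratio with the reference and reaches 2:1 after 10 generations gives `selection_coefficient(1000, 1000, 2000, 1000, 10)`, about 0.069 per generation; a positive value indicates an advantage under the assay conditions, and an unchanged ratio gives 0.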
Use of high throughput sequencing to observe genome dynamics at a single cell level
With the development of high-throughput sequencing technology, it has become possible to analyze mutation distribution directly in a genome-wide fashion, dissociating mutation rate measurements from the traditional underlying assumptions. Here, we sequenced several genomes of Escherichia coli from colonies obtained after chemical mutagenesis and observed a strikingly nonrandom distribution of the induced mutations. These include long stretches of exclusively G to A or C to T transitions along the genome and intra- and inter-genomic differences in mutation density spanning orders of magnitude. Whereas most of these observations can be explained by the known features of enzymatic processes, others could reflect stochasticity in molecular processes at the single-cell level. Our results demonstrate how analysis of the molecular records left in the genomes of the descendants of an individual mutagenized cell allows for genome-scale observations of fixation and segregation of mutations, as well as recombination events, in the single genome of their progenitor.
Comment: 22 pages, 9 figures (including 5 supplementary), one table
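The "long stretches of exclusively G to A or C to T transitions" can be located mechanically once mutations have been called. A minimal sketch, assuming mutations are reported on the reference strand as hypothetical `(position, ref, alt)` tuples; the `min_length` cutoff of 3 is an illustrative choice, not a value from the paper:

```python
from itertools import groupby
from typing import List, Tuple

Mutation = Tuple[int, str, str]  # (position, reference base, alternative base), reference strand


def mutation_class(ref: str, alt: str) -> str:
    """Label a substitution as 'G>A', 'C>T', or 'other'."""
    change = f"{ref}>{alt}"
    return change if change in ("G>A", "C>T") else "other"


def transition_runs(mutations: List[Mutation], min_length: int = 3) -> List[Tuple[str, int, int, int]]:
    """Find maximal runs of consecutive mutations that are all G>A or all C>T.

    Returns (class, first position, last position, number of mutations) for each
    run containing at least min_length mutations.
    """
    ordered = sorted(mutations)
    runs = []
    for label, group in groupby(ordered, key=lambda m: mutation_class(m[1], m[2])):
        members = list(group)
        if label != "other" and len(members) >= min_length:
            runs.append((label, members[0][0], members[-1][0], len(members)))
    return runs
```

`itertools.groupby` only groups adjacent items with the same key, which is exactly the "consecutive along the genome" condition once the mutations are sorted by position. Whether a given run is statistically surprising would still need to be assessed against the overall G>A/C>T proportion, which this sketch does not do.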
The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.
Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
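The key step between the two mapping rounds is building an updated reference that incorporates first-round substitutions and short indels. A minimal sketch of that step only, not the authors' pipeline (which also involves two aligners and quality filtering), assuming non-overlapping variants given as hypothetical `(position, ref, alt)` tuples with 0-based positions and VCF-style anchored indel alleles:

```python
from typing import Iterable, Tuple

Variant = Tuple[int, str, str]  # (0-based position, reference allele, alternative allele)


def apply_variants(reference: str, variants: Iterable[Variant]) -> str:
    """Build an updated reference by applying SNVs and short indels.

    Variants are applied from right to left so that the coordinates of variants
    further left are not shifted by indels already applied.
    """
    seq = reference
    next_start = len(reference)  # start of the variant applied previously (to the right)
    for pos, ref, alt in sorted(variants, reverse=True):
        if pos + len(ref) > next_start:
            raise ValueError(f"variant at {pos} overlaps another variant or runs past the end")
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
        next_start = pos
    return seq
```

For example, applying a C→T substitution at position 1, a GG insertion after position 4, and a one-base deletion at position 8 to `"ACGTACGTAC"` yields `"ATGTAGGCGTC"`. Because indels change lengths, coordinates from the second mapping round are relative to the updated reference and must be translated back before variants can be compared across samples.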
Microarray image processing: A novel neural network framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Due to the vast success of bioengineering techniques, a series of large-scale analysis tools has been developed to discover the functional organization of cells. Among them, cDNA microarray technology has emerged as a powerful tool that enables biologists to study thousands of genes simultaneously within an entire organism, and thus to obtain a better understanding of the gene interaction and regulation mechanisms involved. Although microarray technology has been developed to offer high tolerances, there is considerable signal irregularity across the surface of the microarray image. Imperfections in the microarray image generation process introduce many types of noise that contaminate the resulting image. These errors and noise propagate through, and can significantly affect, all subsequent processing and analysis. Therefore, to realize the potential of this technology, it is crucial to obtain high-quality image data that faithfully reflect the underlying biology of the samples. One of the key steps in extracting information from a microarray image is segmentation: identifying which pixels within an image represent which gene. This area of spotted microarray image analysis has received relatively little attention compared with the advances in subsequent analysis stages. However, the lack of advanced image analysis, including segmentation, results in sub-optimal data being used in all downstream analysis methods.
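To make "identifying which pixels represent which gene" concrete at the level of a single spot, a conventional baseline is to cluster pixel intensities into foreground and background. The sketch below is that generic baseline, two-class k-means on intensity alone, and not the Cellular Neural Network method the thesis develops; it assumes one grayscale patch per spot has already been cut out by gridding:

```python
from typing import List

Patch = List[List[float]]


def segment_spot(patch: Patch, iterations: int = 20) -> List[List[int]]:
    """Label each pixel of a spot patch as foreground (1) or background (0).

    Two-class k-means on intensity alone: in one dimension, assigning each pixel
    to its nearest centre is the same as thresholding at the centres' midpoint.
    """
    pixels = [float(v) for row in patch for v in row]
    low, high = min(pixels), max(pixels)
    if low == high:  # flat patch: nothing to separate
        return [[0] * len(row) for row in patch]
    for _ in range(iterations):
        threshold = (low + high) / 2
        background = [p for p in pixels if p <= threshold]
        foreground = [p for p in pixels if p > threshold]
        new_low = sum(background) / len(background)
        new_high = sum(foreground) / len(foreground)
        if (new_low, new_high) == (low, high):  # centres stopped moving
            break
        low, high = new_low, new_high
    threshold = (low + high) / 2
    return [[1 if v > threshold else 0 for v in row] for row in patch]
```

Because this baseline ignores pixel position, isolated bright noise pixels are labelled foreground just like genuine spot pixels; handling such spatially structured noise is one motivation for neighbourhood-based approaches.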
Although much recent research on microarray image analysis has produced many methods, some methods perform better than others. In general, the most effective approaches require considerable processing power to process an entire image. Furthermore, there has been little progress in developing sufficiently fast, efficient and effective algorithms for segmenting microarray images using a highly sophisticated framework such as Cellular Neural Networks (CNNs). The aim of this thesis is therefore to investigate and develop novel methods for processing microarray images. The goal is to produce results that outperform the currently available approaches in terms of PSNR, k-means and ICC measurements.
Aleppo University, Syria
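Of the evaluation measures listed, PSNR (peak signal-to-noise ratio) has a standard closed form, $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MSE is the mean squared pixel difference between a reference image and a processed image. A minimal sketch, assuming both images are given as equally sized lists of rows; the `max_value` default of 255 suits 8-bit images, and 16-bit scans would use 65535:

```python
import math
from typing import List

Image = List[List[float]]


def psnr(reference: Image, processed: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    ref = [float(v) for row in reference for v in row]
    out = [float(v) for row in processed for v in row]
    if len(ref) != len(out):
        raise ValueError("images must contain the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values mean the processed image is closer to the reference. PSNR needs a reference image, so for denoising it is usually computed on synthetic or artificially corrupted microarray images where the clean version is known.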