
    Mycoplasma Contamination in The 1000 Genomes Project

    Background: In silico biology is increasingly important and is often based on public datasets. While the problem of contamination is well recognised in microbiology labs, the corresponding problem of database corruption has received less attention. Results: Mapping 50 billion next generation DNA sequences from The Thousand Genome Project against published genomes reveals many that match one or more Mycoplasma but are not included in the reference human genome GRCh37.p5. Many of these are of low quality, but NCBI BLAST searches confirm that some high quality, high entropy sequences match Mycoplasma but no human sequences. Conclusions: It appears at least 7 percent of 1000G samples are contaminated.

    Bacterial Contamination in Public ATAC-Seq Data and Alignment-Free Detection Methods

    ATAC-seq is a new high-throughput sequencing technology for measuring chromatin accessibility within genomic samples. It can be used to discover new information about open regions, nucleosome positions, transcription factor binding sites, and DNA methylation. It is especially useful when combined with other next-generation sequencing techniques, such as RNA-seq. Unlike previous technologies, however, ATAC-seq is more sensitive to bacterial contamination, which is a well-known problem in cell cultures that can lead to incorrect experimental results. Previous studies have measured the contamination in public RNA-seq data and found that 5%-10% of samples were contaminated. In this report, we investigate the prevalence of contamination in ATAC-seq samples, rather than RNA-seq data, uploaded to the Sequence Read Archive using two popular alignment-based tools: Bowtie 2 and Kraken 2. We then develop an alignment-free method of detection using machine learning and a novel method of estimating DNA fragment lengths from paired-end ATAC-seq data. Our results show that around 5% of ATAC-seq samples are contaminated and that our machine learning method correctly classifies 97% of samples as contaminated or not while using fewer computational resources than the alignment-based tools. Thus, our method shows promise as a preliminary rapid screening tool for contamination in labs with limited access to huge computational resources.

    Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks.

    Genetic studies are increasingly based on short, noisy reads from next-generation scanners. Typically, complete DNA sequences are assembled by matching short NextGen sequences against reference genomes. Despite considerable algorithmic gains since the turn of the millennium, matching both single-ended and paired-end strings to a reference remains computationally demanding. Further, tailoring bioinformatics tools to each new task or scanner remains highly skilled and labour intensive. With this in mind, we recently demonstrated a genetic programming based automated technique which generated a version of the state-of-the-art alignment tool Bowtie2 which was considerably faster on short sequences produced by a scanner at the Broad Institute and released as part of The Thousand Genome Project.

    Formation and Repair of Environmentally-induced damage to Mitochondrial and Nuclear Genomes

    All living organisms are continually exposed to various environmental stressors, be they anthropogenic or natural in origin. Many stressors share a common toxic mechanism, generating highly reactive chemical species that can have detrimental effects. A proportion of these chemical species evade the cell’s defenses and damage cellular components, including DNA. Measurement of global genome levels of DNA and cellular damage has implicated environmental stressors in major human health issues (e.g., cancer, aging, cardiovascular, and neurodegenerative diseases). Current associations between DNA damage and disease are based upon crude assessments of global genome damage, which provide limited mechanistic information on how damage leads to disease. Furthermore, DNA damage is not uniformly distributed across the genome; accumulation, or persistence, of damage in regions of the genome vital to the functioning of the cell will have downstream consequences. Thus, we proposed that the role of DNA damage in disease can only be understood by examining damage in the context of its location and the damage response during repair. Currently we lack information concerning how the cell maintains baseline levels of DNA damage and its spatio-temporal distribution across the genome. This is fundamental to our understanding of how the cell responds to damage and whether regions are targeted for prioritized repair. In this study, we have refined a popular method of assessing global DNA damage to increase the efficiency of the assay, mapped UV-induced T<>T across both nuclear and mitochondrial genomes, and evaluated the changes in the cellular DNA damage response during bacterial infection. The aim of these studies was to evaluate the importance of cellular response to exposure to environmental toxicants.

    Genome-resolved metagenomics suggests a mutualistic relationship between Mycoplasma and salmonid hosts

    Salmonids are important sources of protein for a large proportion of the human population. Mycoplasma species are a major constituent of the gut microbiota of salmonids, often representing the majority of microbiota. Despite the frequently reported dominance of salmonid-related Mycoplasma species, little is known about their phylogenomic placement, functions and potential evolutionary relationships with their salmonid hosts. In this study, we utilise 2.9 billion metagenomic reads generated from 12 samples from three different salmonid host species to I) characterise and curate the first metagenome-assembled genomes (MAGs) of Mycoplasma dominating the intestines of three different salmonid species, II) establish the phylogeny of these salmonid candidate Mycoplasma species, III) perform a comprehensive pangenomic analysis of Mycoplasma, and IV) decipher the putative functionalities of the salmonid MAGs and reveal specific functions expected to benefit the host. Our data provide a basis for future studies examining the composition and function of the salmonid microbiota.

    Synthetic biology: Novel approaches for microbiology

    In the past twenty years, molecular genetics has created powerful tools for genetic manipulation of living organisms. Whole genome sequencing has provided the information necessary to assess knowledge of gene function and protein networks. In addition, new tools make it possible to modify organisms to perform desired tasks. Gene function analysis is sped up by novel approaches that couple high throughput data generation with data mining. Synthetic biology is an emerging field that uses tools for generating novel gene networks, whole genome synthesis and engineering. New applications in biotechnological, pharmaceutical and biomedical research are envisioned for synthetic biology. In recent years these new strategies have opened up possibilities for studying gene and genome editing and for creating novel tools for functional studies in viruses, parasites and pathogenic bacteria. There is also the possibility of re-designing organisms to generate vaccine subunits or produce new pharmaceuticals to combat multi-drug resistant pathogens. In this review we provide our opinion on the applicability of synthetic biology strategies for functional studies of pathogenic organisms and on some applications, such as genome editing and gene network studies, to further comprehend virulence factors and determinants in pathogenic organisms. We also discuss what we consider important ethical issues for this field of molecular biology, especially the potential misuse of the new technologies. [Int Microbiol 2015; 18(2):71-84] Keywords: synthetic biology · genetic engineering · genomics · pathogenesis · bioethics · artificial cells · astrobiolog

    Doctor of Philosophy

    Advances in technology have produced efficient and powerful scientific instruments for measuring biological phenomena. In particular, modern microscopes and next-generation sequencing machines produce data at such a rate that manual analysis is no longer practical or feasible for meaningful scientific inquiries. Thus, there is a great need for computational strategies to organize and analyze huge amounts of data produced by biological experiments. My work presents computational strategies and software solutions for application in image analysis, human variant prioritization, and metagenomics. The information content of images can be leveraged to answer an extremely broad spectrum of questions ranging from inquiries about basic biological processes to highly specific, application-driven inquiries like the efficacy of a pharmaceutical drug. Modern microscopes can produce images at a rate at which rigorous manual analysis is impossible. I have created software pipelines that automate image analysis in two specific application domains. In addition, I discuss general image analysis strategies that can be applied to a wide variety of problems. There are tens of millions of known human genetic variants. Prioritizing human variants based on how likely they are to cause disease is of huge importance because of the potential impact on human health. Current variant prioritization methods are limited by their scope, efficiency, and accuracy. I present a variant prioritization method, the VAAST variant prioritizer, which is superior in its scope, efficiency, and accuracy to existing variant prioritization methods. The rise of next-generation sequencing enables huge quantities of sequence to be generated in a short period of time. No field of study has been affected by rapid sequencing more than metagenomics. Metagenomics, the genomic analysis of a population of microorganisms, has important implications for pathogen detection because metagenomics enables the culture-free detection of microorganisms. I have created Taxonomer, a comprehensive metagenomics pipeline that enables the real-time analysis of read datasets derived from environmental samples.