    An improved filtering algorithm for big read datasets and its application to single-cell assembly

    Background: For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage to ensure that all parts of the sample DNA are covered by the reads produced. This leads to huge datasets with a great deal of redundant data, so filtering this data prior to assembly is advisable. Brown et al. (2012) presented the algorithm Diginorm for this purpose, which filters reads based on the abundance of their k-mers. Methods: We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new algorithmic feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep. Results: We qualify and recommend parameters for our new read filtering algorithm. Guided by these parameters, we remove a median of 97.15% of the reads while keeping the mean phred score of the filtered dataset high. Using the SPAdes assembler, we produce high-quality assemblies from these filtered datasets in a fraction of the time needed to assemble the datasets filtered with Diginorm. Conclusions: We conclude that read filtering is a practical and efficient method for reducing read data and for speeding up the assembly process. This applies not only to single-cell assembly, as shown in this paper, but also to other projects with high-mean-coverage datasets, such as metagenomic sequencing projects. Our Bignorm algorithm produces assemblies of competitive quality compared with Diginorm while being much faster. Bignorm is available for download at https://git.informatik.uni-kiel.de/axw/Bignorm
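    The core idea of k-mer-abundance filtering can be illustrated with a minimal Python sketch of Diginorm-style digital normalization: a read is kept only if the median count of its k-mers, tallied over the reads already kept, is below a coverage cutoff. The quality rule here (only k-mers whose bases all reach a hypothetical `min_qual` phred score are counted) is an assumption added to show where phred scores could enter the decision; it is not Bignorm's actual rule, and the function names and default values are illustrative only.

```python
from statistics import median


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(c) - offset for c in qual]


def trusted_kmers(seq: str, qual: str, k: int, min_qual: int) -> list[str]:
    """Return the k-mers whose bases all reach min_qual (hypothetical quality rule).

    Reverse complements are ignored for brevity.
    """
    scores = phred_scores(qual)
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if min(scores[i:i + k]) >= min_qual
    ]


def normalize(reads, k=20, cutoff=20, min_qual=20):
    """Keep a read only if the median count of its trusted k-mers,
    counted over previously kept reads, is below `cutoff`."""
    counts: dict[str, int] = {}
    kept = []
    for seq, qual in reads:
        kmers = trusted_kmers(seq, qual, k, min_qual)
        if not kmers:
            continue  # assumption: a read with no trusted k-mer is dropped
        if median([counts.get(km, 0) for km in kmers]) < cutoff:
            kept.append(seq)
            # Only kept reads contribute to the k-mer counts.
            for km in kmers:
                counts[km] = counts.get(km, 0) + 1
    return kept


read = "ACGTACGGTCA"
print(len(normalize([(read, "I" * 11)] * 10, k=5, cutoff=3)))  # 3
```

    With ten identical high-quality copies of one read and a cutoff of 3, the first three copies are kept and the rest are discarded as redundant, which is how such filters remove most reads from very-high-coverage data.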

    Long-term mitigation of drought changes the functional potential and life-strategies of the forest soil microbiome involved in organic matter decomposition

    Climate change can alter the flow of nutrients and energy through terrestrial ecosystems. Using an inverse climate change field experiment in the central European Alps, we explored how long-term irrigation of a naturally drought-stressed pine forest altered the metabolic potential of the soil microbiome and its ability to decompose lignocellulolytic compounds, a critical ecosystem function. Drought mitigation by a decade of irrigation stimulated profound changes in the functional capacity encoded in the soil microbiome, revealing alterations in carbon and nitrogen metabolism as well as in regulatory processes that protect microorganisms from starvation and desiccation. Irrigation caused structural and functional shifts from oligotrophic to copiotrophic microbial lifestyles, and a time-series stable-isotope probing incubation experiment with ¹³C-labeled substrates showed that different microbial taxa were involved in the degradation of cellulose and lignin. Despite these differences, degradation rates of these compounds were not affected by water availability. These findings provide new insights into the impact of precipitation changes on the soil microbiome and associated ecosystem functioning in a drought-prone pine forest and will help to improve our understanding of alterations in biogeochemical cycling under a changing climate.

    Novel graph based algorithms for transcriptome sequence analysis

    RNA sequencing (RNA-seq) is one of the most widely used techniques in molecular biology. A key bioinformatics task in any RNA-seq workflow is assembling the reads. As the size of transcriptomics datasets is constantly increasing, scalable and accurate assembly approaches have to be developed. Here, we propose several approaches to improve the assembly of RNA-seq data generated by second-generation sequencing technologies. We demonstrate that the systematic removal of irrelevant reads from a high-coverage dataset prior to assembly reduces runtime and improves the quality of the assembly. Further, we propose a novel RNA-seq assembly workflow comprising read error correction, normalization, assembly with informed parameter selection, and transcript-level expression computation. In recent years, the popularity of third-generation sequencing technologies has increased, as long reads allow for accurate isoform quantification and gene-fusion detection, which are essential for biomedical research. We present a sequence-to-graph alignment method to detect and quantify transcripts in third-generation sequencing data. We also propose the first gene-fusion prediction tool specifically tailored towards long-read data, which hence achieves accurate expression estimation even on complex datasets. Moreover, our method predicted experimentally verified fusion events along with some novel events, which may be validated in the future.
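    The abstract does not say which measure the workflow uses for transcript-level expression. One widely used measure is transcripts per million (TPM): each transcript's read count is divided by its length, and the resulting rates are rescaled so they sum to one million. The sketch below assumes TPM purely for illustration and, as a simplification, uses raw transcript length where real quantifiers use an effective length; the transcript IDs are hypothetical.

```python
def tpm(counts: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize counts, then scale to sum to 1e6.

    Uses raw transcript length in bases (a simplification of effective length).
    """
    rates = {t: counts[t] / lengths[t] for t in counts}  # reads per base
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {t: r / total * 1e6 for t, r in rates.items()}


# Hypothetical transcripts: tx3 is as short as tx1 but has 3x the reads.
example = tpm({"tx1": 10, "tx2": 20, "tx3": 30}, {"tx1": 1000, "tx2": 2000, "tx3": 1000})
print(example)
```

    Normalizing by length first means a long transcript is not reported as more highly expressed simply because it yields more reads.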

    Integrated Analysis of the Gut Microbiota and Their Fermentation Products in Mice Treated with the Longevity Enhancing Drug Acarbose

    During the last two decades, the predominant view of the microbial inhabitants of the mammalian digestive system has evolved from passive commensals to important drivers of health and disease. Processes now known to be affected by the gut microbiome include digestion, immune development and regulation, drug metabolism, pathogen resistance, and many more. Discoveries like these have been driven by revolutionary new methods for the untargeted, high-throughput characterization of the genetic and metabolic composition of microbial communities. However, going from these high-dimensional observations to mechanistic understanding is not trivial and is limited by the experimental challenges of studying complex communities in realistic environments. The gut microbiome is particularly difficult to study, given its taxonomic diversity, physical inaccessibility, and intimate interface with host physiology. In this dissertation, I describe several contributions to our understanding of this important ecological system, with a particular focus on analyzing bacteria and their metabolic roles in situ by integrating diverse data. The drug acarbose inhibits the breakdown and absorption of starch in the upper digestive system, increasing the availability of this polysaccharide in the lower gut. Interestingly, acarbose has been shown to substantially increase lifespan in mice. This work explores how experimental treatment with acarbose affects the composition and function of the gut microbiome in mice. The resulting dramatic increases in the abundance of members of the largely uncultivated bacterial family Muribaculaceae are linked to higher fecal concentrations of several short-chain fatty acids, in particular propionate, and these metabolic products of bacterial fermentation are in turn found to be associated with increased mouse lifespan.

    Furthermore, based on the culture-free reconstruction of bacterial genomes, we propose a metabolic role for Muribaculaceae in the breakdown of starch. Genetic features homologous to the starch utilization system of Bacteroides are identified in specific members of this family, possibly explaining their increased abundance in acarbose-treated mice. In addition, two distinct genomic variants are found for one taxon, predicting physiological differences that could explain the variable response to acarbose across replications of the experiment at multiple study sites. Finally, I develop experimental and analysis methods for measuring absolute abundance in microbial communities using a recently proposed spike-in quantification approach. A novel model-based inference procedure that harnesses these data is found to outperform other methods in identifying changes in bacterial abundance. This dissertation presents a comprehensive exploration of the dynamics and importance of the gut microbiome in an experimental model with implications for human health. Simultaneously, we develop and refine methods that can be applied to a variety of systems to derive new understanding about complex microbial communities.

    PhD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/149818/1/bjsm_1.pd
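    The spike-in quantification approach mentioned in this abstract rests on a simple ratio: a known amount of a reference organism or DNA is added to each sample, and sequencing counts are scaled by how many reads that spike-in received. The sketch below shows only this basic conversion, assuming equal extraction and amplification efficiency for spike-in and native taxa; it is not the dissertation's model-based inference procedure, and all names and numbers are hypothetical.

```python
def absolute_abundance(
    taxon_reads: dict[str, int],
    spike_reads: int,
    spike_added: float,
    sample_mass: float,
) -> dict[str, float]:
    """Convert read counts to absolute abundance using a spike-in standard.

    spike_added: known amount of spike-in added (e.g., cells or gene copies).
    sample_mass: mass of the sample (e.g., grams of feces).
    Assumes spike-in and native taxa are recovered with equal efficiency.
    """
    if spike_reads <= 0 or sample_mass <= 0:
        raise ValueError("spike_reads and sample_mass must be positive")
    # Units of spike-in represented by one read, normalized per unit of sample.
    factor = spike_added / spike_reads / sample_mass
    return {taxon: n * factor for taxon, n in taxon_reads.items()}


# Hypothetical sample: 1e6 spike-in units added to 0.5 g, spike-in got 1000 reads.
print(absolute_abundance({"taxon_a": 5000, "taxon_b": 2500}, 1000, 1e6, 0.5))
```

    Because each sample is scaled by its own spike-in recovery, an increase in one taxon's absolute abundance can be distinguished from a mere shift in relative proportions.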