9 research outputs found

    Spherical: an iterative workflow for assembling metagenomic datasets

    BACKGROUND: The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or in some cases even the majority, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. RESULTS: We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. CONCLUSIONS: We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the number of reads utilised in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in significant (P < 0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation are hosted publicly at https://github.com/thh32/Spherical.
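The iterative idea described in this abstract can be illustrated with a minimal Python sketch. It is not the Spherical code: `iterative_assembly`, `assemble`, `is_used`, `subset_fraction` and the toy helpers are hypothetical names introduced here, and a real workflow would wrap an external assembler and a read mapper where this sketch uses stand-in callables.

```python
import random
from collections import Counter


def iterative_assembly(reads, assemble, is_used, max_rounds=5,
                       subset_fraction=1.0, seed=0):
    """Run successive assembly rounds on reads not yet incorporated.

    assemble(reads) -> list of contigs   (hypothetical wrapper for an assembler)
    is_used(read, contigs) -> bool       (hypothetical wrapper for a read mapper)
    """
    rng = random.Random(seed)
    remaining = list(reads)
    all_contigs = []
    for _ in range(max_rounds):
        if not remaining:
            break
        # Optionally assemble only a random subset to limit resource use.
        k = max(1, int(len(remaining) * subset_fraction))
        subset = rng.sample(remaining, k)
        contigs = assemble(subset)
        if not contigs:
            break
        all_contigs.extend(contigs)
        # Keep only the reads that the new contigs do not account for.
        remaining = [r for r in remaining if not is_used(r, contigs)]
    return all_contigs, remaining


# Toy stand-ins: the "assembler" returns the most frequent read as one contig,
# and a read counts as used when it is a substring of any contig.
def toy_assemble(reads):
    return [Counter(reads).most_common(1)[0][0]] if reads else []


def toy_is_used(read, contigs):
    return any(read in contig for contig in contigs)


contigs, unused = iterative_assembly(["AAA", "AAA", "CCC"],
                                     toy_assemble, toy_is_used)
```

With the toy helpers, the first round accounts for the dominant read and the second round picks up the minority read that the first pass left behind, which mirrors the recovery of minor taxa described in the abstract.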

    Evaluation of the impact of Illumina error correction tools on de novo genome assembly

    BACKGROUND: Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods. RESULTS: For twelve recent Illumina error correction tools (EC tools) we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy. CONCLUSIONS: We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools suffer from poor performance in certain sequence contexts such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often ill-corrected in an inconsistent manner, leading to breakpoints in the resulting assemblies that are not present in assemblies obtained from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve the applicability of such tools. Additional file 1: Supplementary Data. The Research Foundation - Flanders (FWO) (G0C3914N). http://www.biomedcentral.com/bmcbioinformatics

    Investigating Lipid and Secondary Metabolisms in Plants by Next-generation Sequencing

    Ph.D. (Doctor of Philosophy)

    Error correction of Illumina sequencing data

    Novel computational techniques for mapping and classifying Next-Generation Sequencing data

    Since their emergence around 2006, Next-Generation Sequencing technologies have been revolutionizing biological and medical research. Quickly obtaining an extensive amount of short or long reads of DNA sequence from almost any biological sample enables detecting genomic variants, revealing the composition of species in a metagenome, deciphering cancer biology, decoding the evolution of living or extinct species, or understanding human migration patterns and human history in general. The pace at which the throughput of sequencing technologies is increasing surpasses the growth of storage and computer capacities, which creates new computational challenges in NGS data processing. In this thesis, we present novel computational techniques for read mapping and taxonomic classification. With more than a hundred published mappers, read mapping might be considered fully solved. However, the vast majority of mappers follow the same paradigm, and little attention has been paid to non-standard mapping approaches. Here, we propound so-called dynamic mapping, which we show to significantly improve the resulting alignments compared to traditional mapping approaches. Dynamic mapping is based on exploiting the information from previously computed alignments, helping to improve the mapping of subsequent reads. We provide the first comprehensive overview of this method and demonstrate its qualities using Dynamic Mapping Simulator, a pipeline that compares various dynamic mapping scenarios to static mapping and iterative referencing. An important component of a dynamic mapper is an online consensus caller, i.e., a program that collects alignment statistics and guides updates of the reference in an online fashion. We provide Ococo, the first online consensus caller, which implements smart statistics for individual genomic positions using compact bit counters. Beyond its application to dynamic mapping, Ococo can be employed as an online SNP caller in various analysis pipelines, enabling SNP calling from a stream without saving the alignments on disk. Metagenomic classification of NGS reads is another major topic studied in the thesis. Given a database with thousands of reference genomes placed on a taxonomic tree, the task is to rapidly assign a huge number of NGS reads to tree nodes, and possibly estimate the relative abundance of the species involved. In this thesis, we propose improved computational techniques for this task. In a series of experiments, we show that spaced seeds consistently improve the classification accuracy. We provide Seed-Kraken, a spaced seed extension of Kraken, the most popular classifier at present. Furthermore, we suggest ProPhyle, a new indexing strategy based on a BWT-index, obtaining a much smaller and more informative index compared to Kraken. We provide a modified version of BWA that improves the BWT-index for a quick k-mer look-up
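As an illustration of the spaced seeds mentioned in this abstract, the following is a minimal Python sketch, not code from Seed-Kraken. The function name `spaced_kmers` and the mask `"1101"` are assumptions chosen for the example: a `1` in the mask keeps the base at that offset of the window and a `0` skips it, so two sequences that differ only at a skipped offset yield the same seed.

```python
def spaced_kmers(sequence, mask):
    """Return the spaced k-mers of `sequence` under a 0/1 `mask`.

    Each window has the length of the mask; only bases at offsets where
    the mask holds '1' are kept.
    """
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of '0' and '1'")
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    span = len(mask)
    seeds = []
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        seeds.append("".join(window[i] for i in keep))
    return seeds


# The two reads differ only at the third base, which the mask ignores,
# so they produce the same seed.
seeds_a = spaced_kmers("ACGT", "1101")
seeds_b = spaced_kmers("ACTT", "1101")
```

A contiguous k-mer of length 4 would fail to match between these two reads, whereas the spaced seed still matches. That tolerance to a mismatch at a skipped position is the general property of spaced seeds.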

    Going viral : an integrated view on virological data analysis from basic research to clinical applications

    Viruses are of considerable interest for several fields of life science research. The genomic richness of these entities, their environmental abundance, as well as their high adaptability and, potentially, pathogenicity make treatment of viral diseases challenging. This thesis proposes three novel contributions to antiviral research that each concern analysis procedures of high-throughput experimental genomics data. First, a sensitive approach for detecting viral genomes and transcripts in sequencing data of human cancers is presented that improves upon prior approaches by allowing detection of viral nucleotide sequences that consist of human-viral homologs or are diverged from known reference sequences. Second, a computational method for inferring physical protein contacts from experimental protein complex purification assays is put forward that allows statistically meaningful integration of multiple data sets and is able to infer protein contacts of transiently binding protein classes such as kinases and molecular chaperones. Third, an investigation of minute changes in viral genomic populations upon treatment of patients with the mutagen ribavirin is presented that characterizes, for the first time, the mutagenic effect of this drug on the hepatitis C virus based on deep sequencing data.

    History of Construction Cultures Volume 1

    History of Construction Cultures Volume 1 contains papers presented at the 7ICCH – Seventh International Congress on Construction History, held at the Lisbon School of Architecture, Portugal, from 12 to 16 July, 2021. The conference was organized by the Lisbon School of Architecture (FAUL), NOVA School of Social Sciences and Humanities, the Portuguese Society for Construction History Studies and the University of the Azores. The contributions cover the wide interdisciplinary spectrum of Construction History and consist of the most recent advances in theory and practical case study analysis, following themes such as: epistemological issues; building actors; building materials; building machines, tools and equipment; construction processes; building services and techniques; structural theory and analysis; political, social and economic aspects; knowledge transfer and cultural translation of construction cultures. Furthermore, papers presented at thematic sessions aim at covering important problematics, historical periods and different regions of the globe, opening new directions for Construction History research. We are what we build and how we build; thus, the study of Construction History is now more than ever at the centre of current debates as to the shape of a sustainable future for humankind. Therefore, History of Construction Cultures is a critical and indispensable work to expand our understanding of the ways in which everyday building activities have been perceived and experienced in different cultures, from ancient times to our century and all over the world.