    Scrible: Ultra-Accurate Error-Correction of Pooled Sequenced Reads

    Abstract. We recently proposed a novel clone-by-clone protocol for de novo genome sequencing that leverages combinatorial pooling design to overcome the limitations of DNA barcoding when multiplexing a large number of samples on second-generation sequencing instruments. Here we address the problem of correcting the short reads obtained from our sequencing protocol. We introduce a novel algorithm called Scrible that exploits properties of the pooling design to accurately identify/correct sequencing errors and minimize the chance of “over-correcting”. Experimental results on synthetic data on the rice genome demonstrate that our method has much higher accuracy in correcting short reads compared to state-of-the-art error-correcting methods. On real data on the barley genome we show that Scrible significantly improves the decoding accuracy of short reads to individual BACs.

    Studies of genetic mosaicism in rare diseases

    Mosaicism in human genetics refers to an individual harboring two or more genetic compositions, all derived from the same fertilized egg. Common signs of genetic mosaicism are asymmetric growth, skin aberrations or vascular malformations. Each clinical picture is in itself rare, but together mosaic disorders form a growing group of identifiable characteristic abnormalities. Interestingly, several pharmacological treatment possibilities for these conditions have evolved in the last couple of years. In study I, we found mosaic hotspot PIK3CA variants in two patients with ectopic muscles and muscular overgrowth, by performing whole genome sequencing and digital PCR. This adds information about the timing of PIK3CA mutagenesis during embryogenesis in relation to phenotype and confirms the diagnostic entity PIK3CA-related muscular overgrowth with ectopic muscles. In study II, we describe a genetic mechanism in DICER1-related overgrowth. We show that a constitutional DICER1 variant encoding the RNase IIIa domain causes a severe subtype of DICER1 syndrome with intellectual disability, macrocephaly, extensive bilateral lung cysts, early onset of Wilms tumor, and well-differentiated fetal lung adenocarcinoma. This phenotype is similar to, but distinct from, the phenotype reported in two patients with GLOW syndrome caused by mosaic hotspot variants encoding the RNase IIIb domain. In study III, we add knowledge of genotype-phenotype correlations in male focal dermal hypoplasia patients by describing a previously unknown disease-causing variant in a male patient, and by highlighting that focal dermal hypoplasia can be suspected in patients with characteristic limb malformations, such as ectrodactyly, or ocular manifestations, even in the absence of typical skin findings. In study IV, we used droplet digital PCR to analyze blood- and sperm-derived DNA from 87 parents of children with intellectual disability syndromes caused by de novo variants.
We found germline mosaicism in two fathers and showed that analysis of blood alone may underestimate germline mosaicism. Taken together, these studies have improved our understanding of methodological approaches in mosaicism diagnostics. In addition, these studies contribute to our understanding of the phenotypic and/or genetic spectrum of PIK3CA-related overgrowth, DICER1-related overgrowth, focal dermal hypoplasia and germline mosaicism in rare diseases.

    Annotation of marine eukaryotic genomes

    Validation and development of sequence-based tools to analyse the human gut virome

    The gut microbiome is a complex community of microorganisms that interacts closely with the human host and is believed to play an important role in the maintenance of human health. The viral component of this community is referred to as the human gut virome and is dominated by bacteriophage. Bacteriophage are central to microbial ecosystems by facilitating nutrient turnover, horizontal gene transfer and driving bacterial diversity. In this way the gut virome is believed to closely interact with the human host by shaping the composition and function of the gut microbiome. However, the gut virome also represents one of the biggest gaps in our understanding of the microbiome as it is dominated by unknown bacteriophage targeting unknown bacterial hosts and with uncharacterised downstream functions. These challenges mean that virome research relies heavily on sequence-based approaches and metagenomics to identify compositional patterns and targets for future characterisation. A typical virome study involves physical and chemical separation of individual virions from the cellular components of the microbiome and the contents of the faecal, luminal or mucosal sample from which it came. A viral metagenome is then generated by extracting virome DNA and/or RNA for sequencing on a given platform. These sequencing reads are then quality filtered and assembled to reconstruct the viral genomes in the original sample. The abundance of these assemblies is then estimated by aligning the sequencing reads and performing statistical analysis. However, each step in a virome analysis pipeline has the potential to distort the final viral community and given the unknown nature of the virome, this distortion is difficult to identify and characterise. As a result, conclusions are often drawn from virome studies without fully appreciating the impact of the analysis methods on the findings. 
This thesis examines the major steps in sequence-based virome analysis pipelines, highlighting how choices made at each step of an analysis protocol can impact the final conclusions drawn from a study. In doing so, we have changed our perspective of the human gut virome and challenged previous assumptions. Chapter One discusses the current understanding of the virome field, giving particular attention to how the analysis methods and challenges affect our view of the virome. In Chapter Two, we focus on the assembly step of virome analysis pipelines. This step is of particular importance to virome studies, as an assembler’s ability to recover viral sequences can ultimately determine the amount of sequence information used in that study. We compared all short-read assembly programs used in virome studies to date, across mock communities, simulated and real datasets. We found that not all assemblers are equal, and choice of assembler can drastically affect the conclusions that can be drawn from a virome study. These findings call the comparability of different virome studies into question and suggest that previous virome studies would benefit from reanalysis using improved assembly methods and re-examination of the conclusions drawn. As discussed, the human gut virome is dominated by “viral dark matter”: those sequences which do not share homology to reference databases. However, the majority of what is currently known about the virome in human health and disease is based on the minor fraction of viral sequences collated in these databases. This presents a serious gap in our understanding and was the primary focus of Chapter Three. We reanalysed a keystone inflammatory bowel disease (IBD) dataset, which had formed the foundation of much of what we knew about the virome in IBD. We developed a new approach to analysing the virome beyond the identifiable minority and by doing so, changed our understanding of the virome in IBD significantly.
In the final chapter, we directed our attention to possibly the most important aspect of a sequence-based study, the sequencing approach itself. This step bridges the gap between the biological information in a virome and the digital information that is analysed. As with all steps in a virome analysis pipeline, this has serious implications for the final conclusions of the study. We described the use of long-read sequencing in the human gut virome and the benefits and challenges associated with this technology. We also found that the ability of amplified short-read sequencing libraries to represent the gut virome was limited, but that alternative library preparation methods and long-read sequencing platforms may be able to address these limitations. These findings imply that much of what we know about the human gut virome may be linked to sequencing performance, rather than the biology of the community itself. These three major aspects of virome analysis pipelines highlight the importance of considering the impact of the analysis approach when interpreting the results of virome data and complex biological systems in general.
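The abundance step described in this abstract (aligning reads back to assembled contigs and estimating how abundant each assembly is) can be illustrated with a minimal sketch. It assumes per-contig aligned read counts and contig lengths are already available, and it uses length-normalised relative abundance, which is one common normalisation and not necessarily the one used in the thesis. The function name `relative_abundance` and the contig names are hypothetical.

```python
def relative_abundance(read_counts, contig_lengths):
    """Return length-normalised relative abundance for each contig.

    read_counts: mapping of contig name -> number of aligned reads
    contig_lengths: mapping of contig name -> contig length in bases
    (Both inputs are assumed to come from an earlier alignment step.)
    """
    # Reads per base, so long contigs are not favoured just for being long.
    density = {
        name: read_counts[name] / contig_lengths[name]
        for name in read_counts
    }
    total = sum(density.values())
    if total == 0:
        # No reads aligned anywhere: report zero abundance for every contig.
        return {name: 0.0 for name in density}
    # Scale so that the abundances of all contigs sum to 1.
    return {name: value / total for name, value in density.items()}


# Hypothetical example: equal read counts, but contig_b is four times longer.
example = relative_abundance(
    {"contig_a": 100, "contig_b": 100},
    {"contig_a": 1000, "contig_b": 4000},
)
```

In this toy example the shorter contig ends up with roughly four fifths of the total abundance, which shows how the choice of normalisation alone can change the apparent community composition.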

    Directed evolution of wine-related lactic acid bacteria and characterisation of evolved strains

    Thesis (PhDAgric)--Stellenbosch University, 2020. ENGLISH ABSTRACT: Microorganisms form part of complex ecological networks, governed by either metabolic, physical or molecular processes that have positive, neutral or negative effects on microbial interactions. Understanding microbial interactions provides the opportunity to control and manipulate microbes for different biotechnological and industrial applications. For example, the production of beverages such as wine shows how microbial interactions can be controlled and manipulated to achieve desired outcomes. One example is the deliberate inoculation of lactic acid bacteria (LAB) such as Oenococcus oeni or Lactobacillus plantarum to inhibit the growth of spoilage bacteria by depleting available carbon sources such as L-malic acid in a process known as malolactic fermentation (MLF). Indeed, wine provides a good model to study microbial interactions because grape must is inhabited by multiple species of filamentous fungi, yeast, acetic acid bacteria (AAB) and LAB in an anthropogenic and relatively controlled environment. In this study, I investigated the impact of the interaction between the wine yeast Saccharomyces cerevisiae and the LAB L. plantarum. Briefly, the impact of the yeast on the evolution of the bacteria was evaluated after 50 and 100 generations, first phenotypically and then by a genome-wide analysis to identify genetic targets of evolution. A serial transfer method was used for the directed evolution (DE) experiments, introducing bottlenecks and fluctuation between nutrient rich and poor environments after each transfer. This strategy produces a ‘feast-and-famine’ regime with conflicting selective pressures, resembling what normally occurs in dynamic natural environments, which was important here to generate robust and resilient bacteria.
Additionally, two yeast strains were used to investigate whether microbial interactions result in yeast-specific adaptations or generic adaptations. Therefore, the yeast strains were kept constant by discarding the yeast at the end of each DE cycle and re-inoculating the mother culture at the start of each DE cycle. The data show yeast strain-specific phenotypes for isolates evolved for 50 generations. Genome-wide analysis showed that the broadly targeted pathways are peptidoglycan biosynthesis and degradation, nucleic acid processing, and carbohydrate transport and metabolism in isolates evolved for 50 and 100 generations. These data show that yeast-driven DE results in yeast-specific phenotypic variations and high genetic diversity, but also in convergent evolution over time. The results obtained in this study suggest that yeast drive the evolution of bacteria by dominating the metabolic landscape, showing that strong competitive interactions promote positive selection in mixed species communities, and weak competitive interactions result in no adaptation. This work enriches our understanding of yeast-bacteria interactions over time. Moreover, an isolate that is superior to the parent strain in terms of growth and MLF was obtained, showing potential as a starter culture for winemaking.

    De novo meta-assembly of ultra-deep sequencing data.

    We introduce a new divide-and-conquer approach to deal with the problem of de novo genome assembly in the presence of ultra-deep sequencing data (i.e. coverage of 1000x or higher). Our proposed meta-assembler Slicembler partitions the input data into optimal-sized 'slices' and uses a standard assembly tool (e.g. Velvet, SPAdes, IDBA_UD and Ray) to assemble each slice individually. Slicembler uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly. To improve its efficiency, Slicembler uses a generalized suffix tree to identify these frequent contigs (or fractions thereof). Extensive experimental results on real ultra-deep sequencing data (8000x coverage) and simulated data show that Slicembler significantly improves the quality of the assembly compared with the performance of the base assembler. In fact, most of the time, Slicembler generates error-free assemblies. We also show that Slicembler is much more robust to high sequencing error rates than the base assembler. Availability and implementation: Slicembler can be accessed at http://slicembler.cs.ucr.edu/
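The slice-and-vote idea in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Slicembler's implementation: the function names `partition_reads` and `majority_contigs` are hypothetical, slices are taken as fixed-size consecutive chunks rather than "optimal-sized" ones, the base assembler is left out entirely (its per-slice contigs are passed in directly), and a naive substring check stands in for the generalized suffix tree mentioned above.

```python
def partition_reads(reads, slice_size):
    """Split the reads into consecutive slices of at most slice_size reads."""
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]


def majority_contigs(assemblies, min_len=1):
    """Keep contigs supported by a strict majority of the slice assemblies.

    assemblies: list of assemblies, one per slice, each a list of contig strings
    A contig counts as supported by an assembly if it occurs as a substring of
    any contig in that assembly (naive check, quadratic in the worst case).
    """
    n = len(assemblies)
    # Every sufficiently long contig from any slice is a voting candidate.
    candidates = {c for asm in assemblies for c in asm if len(c) >= min_len}
    kept = []
    for cand in candidates:
        support = sum(
            any(cand in contig for contig in asm) for asm in assemblies
        )
        if support * 2 > n:  # strict majority of slices agree
            kept.append(cand)
    # Longest contigs first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda c: (-len(c), c))


# Hypothetical per-slice assemblies, as if produced by some base assembler.
slices = partition_reads(["r1", "r2", "r3", "r4", "r5"], 2)
consensus = majority_contigs(
    [["ACGTACGT", "TTTT"], ["ACGTACGT"], ["GGACGTACGTCC", "AAAA"]],
    min_len=4,
)
```

Here only `ACGTACGT` is found in more than half of the three toy assemblies, so contigs that appear in a single slice (which may reflect slice-specific errors) are left out of the consensus.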

    Life in the nucleus, the genomic basis of energy exploitation by intranuclear microsporidia

    The Microsporidia are obligate intracellular parasites that jettisoned their oxidative phosphorylation capabilities early in their evolutionary history and so rely on ATP import from their host and glycolysis for their energy needs. Some species form tight associations with the host’s mitochondria and this is thought to facilitate ATP sequestration by the developing intracellular microsporidian. The human parasite Enterocytozoon bieneusi has, however, lost glycolytic capabilities and may rely entirely on ATP import from its host for energy. E. bieneusi belongs to the Enterocytozoonidae microsporidian family and recent rDNA-based phylogenetic studies have suggested it has close evolutionary ties with Enterospora canceri, a crab-infecting intranuclear parasite. Such a close evolutionary relationship implied that glycolysis might also be absent in the intranuclear parasite, raising questions as to how this parasite obtains energy from its unusual niche that is physically walled off from the host mitochondria, the main source of ATP in the host cell. In this study, draft genomes of four species of the Enterocytozoonidae, namely Ent. canceri, E. hepatopenaei, Hepatospora eriocheir and Hepatospora eriocheir canceri, and one non-Enterocytozoonidae species, Thelohania sp., were assembled and annotated (the genome assembly of Hepatospora eriocheir was provided by Dr. Bryony Williams). Phylogenomics performed with this and publicly available genomic data confirmed the close evolutionary ties between Ent. canceri and E. bieneusi. Comparative genomic analyses also revealed that glycolysis is indeed lost in all members of the Enterocytozoonidae family sequenced in this study, hinting at the relaxation of evolutionary pressures to maintain this pathway at the base of this microsporidian family. Despite this absence, the hexokinase gene was retained in all aglycolytic genomes analysed, and that of Ent. canceri was fused to a PTPA gene.
Functional assays and yeast complementation assays suggest that this chimera is able to recognise glucose as a substrate but the heterologously expressed homolog of H. eriocheir cannot. Finally, phylogenomics have been used here to demonstrate that despite the morphological differences between three Hepatospora-like organisms parasitizing different crab hosts, they are the same species. This finding adds more weight to current evidence suggesting that morphology is not an ideal marker for taxonomical classification in the Microsporidia. University of Exeter Centre for Environment Fisheries and Aquaculture Scienc