
    Computational Analyses of Metagenomic Data

    Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions about the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods for data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic. In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments by their genome of origin. Mapbin improves binning results by building a multilayer network that combines the initial binning, the assembly graph, and read-pairing information from paired-end sequencing data. The network is then partitioned with the community-detection algorithm Infomap to yield a new binning result. Mapbin was tested on multiple simulated and real datasets, and the results showed an overall improvement in common binning quality metrics. The second and third projects both derive from ImMiGeNe, a collaborative, multidisciplinary study of the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses of the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling analyses. The results revealed that samples from SCT recipients yielded significantly fewer reads, with heavy contamination by host DNA, and that their microbiomes displayed evident signs of dysbiosis.
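The abstract does not give Mapbin's implementation, so the following is only a minimal Python sketch of the idea under stated assumptions: contigs become nodes, the three evidence layers (initial bins, assembly-graph adjacency, read pairs spanning two contigs) are collapsed into one weighted graph, and that graph is re-partitioned. The function names, the layer weights `w_bin`, `w_graph` and `w_pair`, and the toy data are hypothetical. To stay dependency-free, the sketch swaps Infomap for plain weighted label propagation and flattens the multilayer network into a single layer, which Mapbin does not do.

```python
from collections import defaultdict


def build_network(initial_bins, assembly_edges, read_pair_links,
                  w_bin=1.0, w_graph=1.0, w_pair=1.0):
    """Collapse three evidence layers into one weighted, undirected contig graph.

    initial_bins: {contig: bin_id}; assembly_edges: [(contig, contig)];
    read_pair_links: {(contig, contig): number_of_spanning_pairs}.
    The layer weights w_* are hypothetical tuning knobs.
    """
    weights = defaultdict(float)

    def add(u, v, w):
        if u != v:
            weights[(min(u, v), max(u, v))] += w

    # Layer 1: contigs placed in the same initial bin are linked pairwise.
    members = defaultdict(list)
    for contig, bin_id in initial_bins.items():
        members[bin_id].append(contig)
    for contigs in members.values():
        for i, u in enumerate(contigs):
            for v in contigs[i + 1:]:
                add(u, v, w_bin)

    # Layer 2: adjacency in the assembly graph.
    for u, v in assembly_edges:
        add(u, v, w_graph)

    # Layer 3: read pairs whose mates map to two different contigs.
    for (u, v), n_pairs in read_pair_links.items():
        add(u, v, w_pair * n_pairs)

    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    return adj


def label_propagation(nodes, adj, max_iter=100):
    """Weighted label propagation, used here as a simple stand-in for Infomap."""
    labels = {n: n for n in nodes}
    for _ in range(max_iter):
        changed = False
        for node in sorted(nodes):
            scores = defaultdict(float)
            for nbr, w in adj.get(node, {}).items():
                scores[labels[nbr]] += w
            if not scores:
                continue  # an isolated contig keeps its own label
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy example: contig c3 was initially binned with c4/c5, but assembly-graph
# and read-pair evidence tie it to c1/c2.
initial_bins = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "c5": "B"}
assembly_edges = [("c1", "c2"), ("c2", "c3")]
read_pair_links = {("c1", "c3"): 5, ("c4", "c5"): 5}

adj = build_network(initial_bins, assembly_edges, read_pair_links)
refined = label_propagation(list(initial_bins), adj)
print(refined)  # c1, c2, c3 share one label; c4, c5 share another
```

With these toy inputs, the read-pair and assembly-graph evidence pull c3 out of its initial bin and into the community formed by c1 and c2. Collapsing the layers keeps the example short, at the cost of discarding the layer structure that a multilayer method such as Infomap can exploit.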
Finally, I discuss several inherent challenges posed by the extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified through bioinformatics approaches alone. The primary goal of the third project was to design a set of primers covering the bacterial flagellin genes present in the human gut microbiota. Given the notable diversity of flagellins, I combined a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer design methods to generate a degenerate primer set, and a selection method to filter out genes unlikely to occur in the human gut microbiome. As a result, I curated a reduced yet representative primer set that would be practical for experimental implementation.
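As an illustration of degenerate primer generation in general, not of the thesis's specific heuristic, the sketch below collapses aligned target sites into IUPAC ambiguity codes, reports the resulting degeneracy (the number of distinct oligonucleotides the primer encodes), and greedily splits the targets into several primers when a single one would exceed a degeneracy cap. The function names, the cap, and the example sites are assumptions; a real design would also check melting temperature, GC content and 3′-end stability.

```python
from math import prod

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
BASES_FOR_CODE = {code: bases for bases, code in IUPAC.items()}


def degenerate_primer(sites):
    """Collapse aligned, gap-free, uppercase ACGT target sites into one degenerate primer.

    Returns the IUPAC primer and its degeneracy (number of distinct oligos it encodes).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all target sites must have the same length")
    columns = [frozenset(s[i] for s in sites) for i in range(length)]
    primer = "".join(IUPAC[col] for col in columns)
    return primer, prod(len(col) for col in columns)


def greedy_primer_set(sites, max_degeneracy=64):
    """Greedily group target sites so that each group's primer stays under a degeneracy cap."""
    groups = []
    for site in sites:
        for group in groups:
            if degenerate_primer(group + [site])[1] <= max_degeneracy:
                group.append(site)
                break
        else:  # no existing primer can absorb this site without exceeding the cap
            groups.append([site])
    return [degenerate_primer(g)[0] for g in groups]


def covers(primer, site):
    """True if every base of `site` is allowed by the IUPAC code at that position."""
    return len(primer) == len(site) and all(
        base in BASES_FOR_CODE[code] for code, base in zip(primer, site)
    )


# Hypothetical 6-nt binding sites from three gene variants.
sites = ["ATGCAA", "ATGTAA", "ACGCAG"]
print(degenerate_primer(sites))                    # ('AYGYAR', 8)
print(greedy_primer_set(sites, max_degeneracy=4))  # ['ATGYAA', 'ACGCAG']
```

The cap trades coverage against specificity: one highly degenerate primer covers every variant but dilutes each exact-match oligo in the mix, whereas several less degenerate primers keep each one more specific.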

    From Photosynthesis to Detoxification: Microbial Metabolisms Shape Earth’s Surface Chemistry

    Earth’s chemistry, through geologic time and in the present, is inextricably linked with biologically mediated reactions. All major elemental cycles on Earth’s surface have arisen from two competing processes: life shaping its chemical environment through the evolution of key biochemical pathways, and the environment constraining metabolism by dictating which reactions can occur. Understanding this complicated interplay motivates the research presented in this thesis, which examines it in two major elemental cycles: the modern nitrogen (N) cycle and the ancient carbon (C) cycle. Chapters One and Two focus on the evolution of ribulose-1,5-bisphosphate carboxylase/oxygenase (rubisco), the enzyme that catalyzes the key carbon-fixation step in modern oxygenic photosynthesis. This reaction also imparts a large kinetic isotope effect (KIE) that leaves the fixed carbon depleted in natural-abundance ¹³C relative to its substrate; this isotopic fingerprint can be seen both in the modern C cycle and in the rock record of the ancient C cycle. This KIE has therefore been used in vitro (outside the cell) to inform biochemical models of rubisco’s reaction mechanism, and in vivo (in the cell) as a proxy for past and present environmental CO₂ concentrations. However, both the in vitro and in vivo measurements are calibrated using modern organisms, even though rubisco and oxygenic photosynthesis have evolved profoundly over geologic time. We therefore measured the KIE in vitro and in vivo of a reconstructed ancestral Form IB rubisco dating to ≫1 Ga, and the KIE in vitro of a recently discovered Form I’ rubisco that serves as a modern analogue of ancestral Form I rubiscos prior to the evolution of the small subunit.
Overall, we find that the KIEs of both rubiscos are smaller than those of their modern counterparts, which is surprising given that the rock record indicates that overall in vivo carbon isotope fractionations were larger in the past. In addition, we find that models based strictly on modern organisms may not apply to the past, calling into question the basic assumption that uniformitarianism can be readily applied to biological processes. However, these models can be rescued by accounting for other aspects of cell physiology. Chapter Three focuses on disentangling the sources of key metabolites such as nitrous oxide (N₂O) in the modern N cycle. As with the rubisco KIE in Chapters One and Two, the relevant isotopic fingerprint, which measures the ‘preference’ of ¹⁵N for the central or outer nitrogen site in N₂O (“Site Preference” or “SP”), has primarily been calibrated using dissimilatory, or energy-generating, nitric oxide (NO) reductases (NORs). However, there exists a much larger and phylogenetically widespread class of NO-detoxifying enzymes; in particular, flavohemoglobin proteins (Fhp/Hmp) produce N₂O as a strategy to neutralize damaging NO radicals under anoxic conditions. Because this enzyme generates N₂O under non-growing and anoxic conditions, it may be more relevant to natural environments where N₂O production has been detected. Surprisingly, we found that Fhp imparts a distinct SP on N₂O that differs from those of both bacterial and eukaryotic NORs, and that this value better matches existing in situ measurements of N₂O from soils. In addition, we find that in strains with both Fhp and NOR, the Fhp signal dominates when growing cells are first exposed to high concentrations of NO under oxic conditions and then shifted to an anoxic, non-growing state. Therefore, in addition to telling us ‘Who’s there,’ the SP fingerprint may also reveal something about cell physiology in vivo. We propose a new framework for interpreting the source of N₂O based on SP values.
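The two isotopic quantities in this abstract are simple to compute once the measured values are in hand. Under the conventional definitions, a KIE given as ¹²k/¹³k is expressed as ε = (¹²k/¹³k − 1) × 1000 ‰, and site preference is SP = δ¹⁵Nα − δ¹⁵Nβ, where α is the central and β the terminal nitrogen of N₂O. The sketch assumes these conventions; the function names and input values are hypothetical, not measurements from the thesis.

```python
# Commonly cited 15N/14N ratio of atmospheric N2, the usual δ15N reference.
R_AIR_N2 = 0.0036765


def epsilon_permil(kie):
    """Express a kinetic isotope effect (KIE = k_light / k_heavy) as ε in per mil."""
    return (kie - 1.0) * 1000.0


def delta_permil(ratio, reference=R_AIR_N2):
    """δ value (‰) of an isotope ratio relative to a reference ratio."""
    return (ratio / reference - 1.0) * 1000.0


def site_preference(delta15n_alpha, delta15n_beta):
    """SP (‰): δ15N at the central (α) N minus δ15N at the terminal (β) N of N2O."""
    return delta15n_alpha - delta15n_beta


# Hypothetical inputs, for illustration only.
print(epsilon_permil(1.025))         # approximately 25 ‰
print(site_preference(20.0, -10.0))  # 30.0 ‰
```

Because SP is a difference between two δ values measured on the same molecule, it does not depend on the bulk δ¹⁵N of the N₂O, which is why it can act as a source fingerprint.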

    Single-cell time-series analysis of metabolic rhythms in yeast

    The yeast metabolic cycle (YMC) is a biological rhythm in budding yeast (Saccharomyces cerevisiae). It entails oscillations in the concentrations and redox states of intracellular metabolites, oscillations in transcript levels, temporal partitioning of biosynthesis, and, in chemostats, oscillations in oxygen consumption. Most studies of the YMC have been based on chemostat experiments, and it is unclear whether YMCs arise from interactions between cells or are generated independently by each cell. This thesis aims to characterise the YMC in single cells and its response to nutrient and genetic perturbations. Specifically, I use microfluidics to trap and separate yeast cells, then record the time-dependent intensity of flavin autofluorescence, which is a component of the YMC. Single-cell microfluidics produces large amounts of time-series data, and the noisy, short time series produced by biological experiments restrict which computational tools are useful for analysis. I developed a method to filter time series, a machine learning model to classify whether time series are oscillatory, and an autocorrelation method to examine the periodicity of time-series data. My experimental results show that yeast cells exhibit oscillations in flavin fluorescence. Specifically, I show that in high-glucose conditions, cells generate flavin oscillations asynchronously within a population, and that these flavin oscillations are coupled to the cell division cycle. I show that cells can individually reset the phase of their flavin oscillations in response to abrupt nutrient changes, independently of the cell division cycle. I also show that the flavin oscillations of deletion strains behave differently from the dissolved-oxygen oscillations reported under chemostat conditions.
Finally, I use flux balance analysis to address whether proteomic constraints on cellular metabolism make temporal partitioning of biosynthesis advantageous for the yeast cell, and whether such partitioning explains the timing of the metabolic cycle. My results show that under proteomic constraints, it is advantageous for the cell to synthesise biomass components sequentially, because doing so shortens the timescale of biomass synthesis. However, the advantage of sequential over parallel biosynthesis is smaller when both carbon and nitrogen sources are limiting. This thesis thus confirms the autonomous generation of flavin oscillations and suggests a model in which the YMC responds to nutrient conditions and subsequently entrains the cell division cycle. It also highlights the possibility that subpopulations in the culture explain chemostat-based observations of the YMC. Furthermore, this thesis paves the way for using computational methods to analyse large datasets of oscillatory time series, which is useful in many fields beyond the YMC.
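One common way to examine periodicity via autocorrelation, sketched here under the assumption of an evenly sampled, already filtered series, is to take the lag of the first positive peak of the autocorrelation function after it has dropped below zero. This is a generic illustration, not the thesis's method; the function names and the synthetic sine-wave input are hypothetical.

```python
import math


def autocorrelation(series):
    """Normalised (biased) autocorrelation r(k) for lags k = 0 .. n-1."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    var = sum(c * c for c in centred)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(centred[i] * centred[i + k] for i in range(n - k)) / var
            for k in range(n)]


def estimate_period(series):
    """Period (in samples) = lag of the first positive ACF peak after the ACF turns negative.

    Returns None when no such peak exists, e.g. for non-oscillatory series.
    """
    acf = autocorrelation(series)
    k = 1
    while k < len(acf) and acf[k] > 0:  # skip the initial decay from r(0) = 1
        k += 1
    for lag in range(k + 1, len(acf) - 1):
        if acf[lag] > 0 and acf[lag - 1] < acf[lag] >= acf[lag + 1]:
            return lag
    return None


# Synthetic, evenly sampled signal with a period of 10 samples.
signal = [math.sin(2 * math.pi * t / 10) for t in range(60)]
print(estimate_period(signal))  # 10
```

The biased estimator divides every lag by the same total variance, so peaks at long lags shrink; this keeps the estimate stable for short series but makes long periods harder to detect. Multiplying the returned lag by the sampling interval gives the period in time units.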

    Recombinant spidroins from infinite circRNA translation

    Spidroins are a diverse family of peptides and the main components of spider silk. They can be used to produce sustainable, lightweight and durable materials for a wide variety of medical and engineering applications. Spiders’ territorial behaviour and cannibalism preclude farming them for silk, so recombinant protein synthesis is the most promising way of producing these peptides. However, many approaches have failed to obtain high titres of recombinant spidroins, or spidroins of sufficient molecular weight. The work described here focuses on expressing high-molecular-weight spidroins from short circular RNA molecules. Mammalian host cells were transfected with designed circular-RNA-producing plasmid vectors, and a backsplicing approach successfully circularised RNA in a variety of mammalian cell types. However, none of several qualitative protein assays detected recombinant spidroin expression with this approach, and further experiments investigated the reasons behind this. Additionally, because spidroins are diverse across a large number of spider lineages, many spidroin sequences potentially remain to be discovered. A bioinformatic pipeline was developed that accepts RNA-sequencing transcriptome datasets and uses tandem repeat detection and profile HMM annotation to identify novel sequences; it was specifically designed to identify repeat domains in expressed sequences. Twenty-one transcriptomes from 17 species, encompassing a wide selection of basal and derived spider lineages, were investigated using this pipeline, and six previously undescribed spidroin sequences were discovered. The pipeline was also tested on the suckerin protein family. These proteins have recently been investigated for potentially useful properties in medicine and engineering, including adhesion in wet environments. The pipeline doubled the number of suckerins known to date.
Further phylogenetic analysis was performed to expand knowledge of suckerins. This pipeline enables the identification of transcripts that may be overlooked by more mainstream analysis methods such as pairwise homology searches. The spidroins and suckerins discovered by this pipeline may add to the large repertoire of potentially useful properties characteristic of these diverse peptide families.
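The pipeline's tandem-repeat detector is not specified in the abstract. As a minimal, hypothetical sketch of the underlying idea, a repeat's period can be estimated by comparing a sequence with shifted copies of itself and taking the shortest shift at which most residues match. Unlike dedicated tandem-repeat finders, this gapless comparison tolerates substitutions but not insertions or deletions; the identity threshold, function names and example fragment are assumptions.

```python
def period_identity(seq, max_period=None):
    """For each candidate period p, the fraction of positions i with seq[i] == seq[i + p]."""
    n = len(seq)
    if max_period is None:
        max_period = n // 2
    scores = {}
    for p in range(1, min(max_period, n - 1) + 1):
        pairs = n - p
        same = sum(1 for i in range(pairs) if seq[i] == seq[i + p])
        scores[p] = same / pairs
    return scores


def shortest_repeat_unit(seq, min_identity=0.8, max_period=None):
    """Shortest period whose self-match identity reaches min_identity, or None.

    Returns (period, first repeat unit, identity).
    """
    for p, identity in sorted(period_identity(seq, max_period).items()):
        if identity >= min_identity:
            return p, seq[:p], identity
    return None


# Hypothetical repetitive protein fragment built from a GGAGQ-like motif.
fragment = "GGAGQ" * 4
print(shortest_repeat_unit(fragment))  # (5, 'GGAGQ', 1.0)
```

A repeat unit found this way could then be scored against profile HMMs of known repeat domains, which is the kind of annotation step the abstract describes.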

    The brown algal genus Fucus: A unique insight into reproduction and the evolution of sex-biased genes

    Doctoral thesis (PhD) - Nord University, 2023

    Tracing Evolution of Gene Transfer Agents Using Comparative Genomics

    Accumulating evidence suggests that viruses and their components can be domesticated by their hosts, equipping them with convenient molecular toolkits for various functions. One such domesticated system is the Gene Transfer Agent (GTA), produced by some bacteria and archaea. GTAs morphologically resemble small phage-like particles and contain random fragments of their host genome. They are produced by only a small fraction of the microbial population and are released through lysis of the host cell. Bioinformatic analyses suggest that GTAs are especially abundant in the class Alphaproteobacteria, where they are vertically inherited and evolve as part of their host genomes. In this work, we extensively analyze the evolutionary patterns of alphaproteobacterial GTAs using comparative genomics, phylogenomics and machine learning methods. We first develop an algorithm that validates the wide presence of GTA elements in alphaproteobacterial genomes, where they are often mistaken for prophages because of their homology. Furthermore, we demonstrate that GTAs evolve under selection that reduces the energetic cost of their production, indicating their importance under nutrient depletion. Genome-wide screens for signatures of translational selection and coevolution highlight the significance of GTAs as a stress-response adaptation for horizontal gene transfer and reveal a set of previously unknown genes that could play a role in the GTA cycle. Because GTA production leads to host death, their maintenance is likely to be under kin or group selection. By combining our findings with the accumulated body of knowledge, this work proposes a conceptual model illustrating the role of GTAs in bacterial populations and their persistence over hundreds of millions of years of evolution.
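Translational selection is often quantified with codon-usage indices. One widely used index, sketched below as an illustration rather than as the method used in this work, is the Codon Adaptation Index (CAI): each codon's relative adaptiveness w is its count in a reference set of highly expressed genes divided by the count of its most frequent synonym, and a gene's CAI is the geometric mean of w over its codons. The sketch covers only a handful of synonymous families and skips codons absent from the reference set, whereas full implementations cover the whole genetic code and often assign unseen codons a small pseudo-count; the function names and the reference sequence are hypothetical.

```python
import math
from collections import Counter

# Partial codon table: only a few two-codon families, to keep the sketch short.
SYNONYMS = {
    "Lys": ("AAA", "AAG"),
    "Glu": ("GAA", "GAG"),
    "Phe": ("TTT", "TTC"),
    "Gln": ("CAA", "CAG"),
}
SCORED_CODONS = {c for family in SYNONYMS.values() for c in family}


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """w(codon) = its count / count of the most frequent synonym in the reference genes."""
    counts = Counter(c for gene in reference_genes for c in codons(gene) if c in SCORED_CODONS)
    w = {}
    for family in SYNONYMS.values():
        top = max(counts[c] for c in family)
        for c in family:
            if counts[c] > 0:  # unseen codons are skipped here instead of pseudo-counted
                w[c] = counts[c] / top
    return w


def cai(gene, w):
    """Codon Adaptation Index: geometric mean of w over the gene's scorable codons."""
    values = [w[c] for c in codons(gene) if c in w]
    if not values:
        return None
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical reference set standing in for highly expressed genes.
w = relative_adaptiveness(["AAAAAAAAG"])  # AAA used twice, AAG once
print(w)                 # {'AAA': 1.0, 'AAG': 0.5}
print(cai("AAAAAG", w))  # sqrt(1.0 * 0.5), about 0.707
```

Genes whose CAI approaches 1 use the same preferred codons as the reference set, which is one signal that their expression level has been shaped by translational selection.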

    Highly contiguous genomes of human clinical isolates of Giardia duodenalis reveal assemblage- and sub-assemblage-specific presence–absence variation in protein-coding genes

    Giardia duodenalis (syn. G. intestinalis, G. lamblia) is a widespread gastrointestinal protozoan parasite with debated taxonomic status. Currently, eight distinct genetic sub-groups, termed assemblages A–H, are defined based on a few genetic markers. Assemblages A and B may represent distinct species and are both relevant to human public health. Genomic studies are scarce, and the few available reference genomes, in particular for assemblage B, are insufficient for adequate comparative genomics. Here, by combining long- and short-read sequences generated by PacBio and Illumina sequencing technologies, we provide nine annotated reference genome sequences from new clinical isolates (four assemblage A and five assemblage B parasite isolates). The isolates chosen represent the currently accepted sub-assemblages AI, AII, BIII and BIV. Synteny over the whole genome was generally high, but we report chromosome-level translocations as a feature that distinguishes assemblage A from B parasites. Orthologue gene group analysis was used to define gene content differences between assemblages A and B and to contribute a gene-set-based operational definition of the respective taxonomic units. Giardia is tetraploid, and so far higher allelic sequence heterogeneity (ASH) has been observed for assemblage B than for assemblage A. Notably, here we report an extremely low ASH (0.002%) for one of the assemblage B isolates, a value even lower than that of the reference assemblage A isolate WB-C6. This challenges the view that low ASH is a notable feature distinguishing assemblage A from B parasites, and the low ASH allowed assembly of the most contiguous assemblage B genome currently available for reference. In conclusion, the description of nine highly contiguous genome assemblies of new isolates of G. duodenalis assemblages A and B adds to our understanding of the genomics and the species’ population structure of this widespread zoonotic parasite.
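Assuming ASH is expressed, as is common, as the percentage of callable genome positions that are heterozygous, the reported 0.002% corresponds to roughly 2 heterozygous positions per 100,000. The minimal sketch below computes this from VCF-style genotype strings, where a tetraploid site such as `0/0/1/1` counts as heterozygous whenever it carries more than one allele; the function names and the call set are hypothetical.

```python
def is_heterozygous(genotype):
    """True if a VCF-style genotype (e.g. '0/0/1/1' for a tetraploid) carries >1 allele."""
    alleles = [a for a in genotype.replace("|", "/").split("/") if a != "."]
    return len(set(alleles)) > 1


def ash_percent(variant_genotypes, callable_sites):
    """ASH as the percentage of callable genome positions that are heterozygous."""
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    n_het = sum(1 for g in variant_genotypes if is_heterozygous(g))
    return 100.0 * n_het / callable_sites


# Hypothetical calls: two heterozygous sites among 100,000 callable positions.
calls = ["0/0/1/1", "0/1/1/1", "1/1/1/1", "0/0/0/0"]
print(ash_percent(calls, 100_000))  # 0.002
```

The denominator matters as much as the numerator: counting only positions with adequate coverage (callable sites) avoids inflating or deflating ASH in poorly assembled regions.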

    Comparative genomics of recent adaptation in Candida pathogens

    Fungal infections pose a serious health threat, affecting more than 1 billion people and causing ~1.5 million deaths each year. The problem is growing due to insufficient diagnostic and therapeutic options, an increasing number of susceptible patients, the expansion of pathogens partly linked to climate change, and the rise of antifungal drug resistance. Among fungal pathogens, Candida species are a major cause of severe hospital-acquired infections, with high mortality in immunocompromised patients. Various Candida pathogens constitute a public health issue that requires further efforts to develop new drugs, optimize currently available treatments and improve diagnostics. Given the high dynamism of Candida genomes, a promising strategy to improve current therapies and diagnostics is to understand the evolutionary mechanisms of adaptation to antifungal drugs and to the human host. Previous work using in vitro evolution, population genomics, selection inference and genome-wide association studies (GWAS) has partially clarified such recent adaptation, but various open questions remain. In the three research articles that make up this PhD thesis, we addressed some of these gaps from the perspective of comparative genomics. First, we addressed methodological issues in the analysis of Candida genomes. Studying recent adaptation in these pathogens requires adequate bioinformatic tools for variant calling, filtering and functional annotation. Among other shortcomings, current methods have limited accuracy in identifying structural variants from short-read sequencing data. In addition, there is a need for easy-to-use, reproducible variant-calling pipelines. To address these gaps, we developed the “personalized Structural Variation detection” pipeline (perSVade), a framework to call, filter and annotate several variant types, including structural variants, directly from reads.
PerSVade enables accurate identification of structural variants in any species of interest, such as Candida pathogens. In addition, our tool automatically predicts the structural variant calling accuracy on simulated genomes, which indicates how reliable the calling process is. Furthermore, perSVade can be used to analyze single nucleotide polymorphisms and copy-number variants, facilitating multi-variant, reproducible genomic studies. This tool will likely boost variant analyses in Candida pathogens and beyond. Second, we addressed open questions about recent adaptation in Candida, using perSVade for variant identification. On the one hand, we investigated the evolutionary mechanisms of drug resistance in Candida glabrata. For this, we used a large-scale in vitro evolution experiment to study adaptation to two commonly used antifungals: fluconazole and anidulafungin. Our results show rapid adaptation to one or both drugs, with moderate fitness costs and through few mutations in a narrow set of genes. In addition, we characterize a novel role of ERG3 mutations in cross-resistance to fluconazole in anidulafungin-adapted strains. These findings illuminate the mutational paths leading to drug resistance and cross-resistance in Candida pathogens. On the other hand, we reanalyzed ~2,000 public genomes and phenotypes to understand the signs of recent selection and drug resistance in six major Candida species: C. auris, C. glabrata, C. albicans, C. tropicalis, C. parapsilosis and C. orthopsilosis. We found hundreds of genes under recent selection, suggesting that clinical adaptation is diverse and complex. These genes involve species-specific as well as convergently affected processes, such as cell adhesion, which could underlie conserved adaptive mechanisms. In addition, using GWAS, we predicted known drivers of antifungal resistance alongside potentially novel players.
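Predicting calling accuracy on simulated genomes implies comparing calls against variants whose positions are known. The sketch below shows that kind of benchmark in minimal form: precision (fraction of calls matching a simulated SV), recall (fraction of simulated SVs recovered) and their harmonic mean, F1. The breakpoint tolerance, the matching rule and all names are assumptions for illustration, not perSVade's actual criteria or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def same_event(call, truth, tolerance=50):
    """Hypothetical matching rule: same type and chromosome, breakpoints within `tolerance` bp."""
    return (call.chrom == truth.chrom and call.svtype == truth.svtype
            and abs(call.start - truth.start) <= tolerance
            and abs(call.end - truth.end) <= tolerance)


def benchmark(calls, truths, tolerance=50):
    """Precision, recall and F1 of SV calls against a simulated truth set."""
    true_calls = sum(any(same_event(c, t, tolerance) for t in truths) for c in calls)
    recovered = sum(any(same_event(c, t, tolerance) for c in calls) for t in truths)
    precision = true_calls / len(calls) if calls else 0.0
    recall = recovered / len(truths) if truths else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Simulated truth (SVs planted into a genome) vs. hypothetical calls from a caller.
truths = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 5600, "INV")]
calls = [SV("chr1", 1020, 1990, "DEL"), SV("chr2", 100, 400, "DUP")]
print(benchmark(calls, truths))  # (0.5, 0.5, 0.5)
```

Precision and recall are reported separately because they fail differently: a permissive caller gains recall at the cost of false positives, while strict filtering does the reverse.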
Furthermore, our analyses reveal an important role for generally overlooked structural variants and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Taken together, our findings provide novel insights into how Candida pathogens adapt to human-related environments and suggest candidate genes that deserve future attention. In summary, the results of this thesis improve our knowledge of the mechanisms of recent adaptation in Candida pathogens, which may enable improved therapeutic and diagnostic applications.