18 research outputs found

    cazy_webscraper : for creating a local CAZy database

    Carbohydrate Active enZymes (CAZymes) are pivotal in pathogen recognition, signalling, structure and energy metabolism. CAZy (www.cazy.org) is the most comprehensive CAZyme database, but it does not provide methods for automating data retrieval or submitting sequences for annotation. cazy_webscraper retrieves user-specified datasets from CAZy, producing a local SQL database enabling thorough interrogation of the data. cazy_webscraper can also retrieve protein sequences from GenBank and download structure files from RCSB PDB.

    Genome-level analyses resolve an ancient lineage of symbiotic ascomycetes

    Ascomycota account for about two-thirds of named fungal species [1]. Over 98% of known Ascomycota belong to the Pezizomycotina, including many economically important species as well as diverse pathogens, decomposers, and mutualistic symbionts [2]. Our understanding of Pezizomycotina evolution has until now been based on sampling traditionally well-defined taxonomic classes [3,4,5]. However, considerable diversity exists in undersampled and uncultured, putatively early-diverging lineages, and the effect of these on evolutionary models has seldom been tested. We obtained genomes from 30 putative early-diverging lineages not included in recent phylogenomic analyses and analyzed these together with 451 genomes covering all available ascomycete genera. We show that 22 of these lineages, collectively representing over 600 species, trace back to a single origin that diverged from the common ancestor of Eurotiomycetes and Lecanoromycetes over 300 million years BP. The new clade, which we recognize as a more broadly defined Lichinomycetes, includes lichen and insect symbionts, endophytes, and putative mycorrhizae and encompasses a range of morphologies so disparate that they have recently been placed in six different taxonomic classes. To test for shared hidden features within this group, we analyzed genome content and compared gene repertoires to related groups in Ascomycota. Regardless of their lifestyle, Lichinomycetes have smaller genomes than most filamentous Ascomycota, with reduced arsenals of carbohydrate-degrading enzymes and secondary metabolite gene clusters. Our expanded genome sample resolves the relationships of numerous “orphan” ascomycetes and establishes the independent evolutionary origins of multiple mutualistic lifestyles within a single, morphologically hyperdiverse clade of fungi.

    Polysaccharide utilization loci and associated genes in marine Bacteroidetes - compositional diversity and ecological relevance

    The synthesis of marine organic carbon compounds by photosynthetic macroalgae, microalgae (phytoplankton) and bacteria provides a basis for life in the ocean. In marine surface waters this primary production is largely dominated by microalgae and is especially pronounced during spring phytoplankton blooms. During and after these often diatom-dominated blooms, increased amounts of organic matter are released into the surrounding waters. Here, the organic matter, rich in polysaccharides, can trigger blooms of heterotrophic bacteria. Marine members of the Bacteroidetes are consistently found related to such bloom events. These bacteria are regularly detected as the first responders to thrive after phytoplankton spring blooms in temperate coastal regions and are often equipped with a variety of polysaccharide utilization gene clusters. These gene clusters, termed polysaccharide utilization loci (PULs), encode enzymes for the extracellular hydrolysis of polysaccharides and the subsequent uptake of oligosaccharides into the periplasm, where they are shielded from competing bacteria. This mechanism allows for rapid uptake and substrate hoarding, and thus could be one reason why Bacteroidetes are often seen as the first responders of the bacterioplankton community. The investigation of the so far largely unknown diversity and the ecological relevance of PULs in marine Bacteroidetes was the major goal of the work presented here. We could show that genomes of Bacteroidetes isolates from the North Sea, with lifestyles ranging from free-living to micro- and macroalgae-associated, harboured a variety of these loci predicted to target in total 18 different substrate classes. Overall, PUL repertoires of these isolates showed considerable intra-genus and inter-genus variations, suggesting that Bacteroidetes species harbour distinct glycan niches, independent of their phylogenetic relationships.
By investigating the PUL repertoires of uncultured free-living Bacteroidetes during three consecutive years of spring phytoplankton blooms at the North Sea island of Helgoland, I could further reveal that the set of targeted substrates during these bloom events was dominated by only five of the substrate classes targeted by the isolates. These were the diatom storage polysaccharide laminarin, alpha-glucans, alginates, as well as substrates rich in alpha-mannans and sulfated xylans. In addition to this constrained set of substrate classes targeted by the free-living Bacteroidetes community, I could show that the species diversity during these blooms was limited and dominated by only 27 abundant and recurrent species that carried a limited number of abundant PULs. The majority of these PULs were targeting laminarin and alpha-glucan substrates, which were likely targeted during the entire time of the blooms. The less frequent PULs, targeting alpha-mannans and sulfated xylans, were predominantly detected during mid- and late-bloom phases, suggesting a relevance of these two substrate classes in the later phases of phytoplankton blooms. Overall, these findings highlight the recurrence of a few specialized Bacteroidetes species and the environmental relevance of specific polysaccharide substrate classes during spring phytoplankton blooms. However, for some of these substrate classes the origin, structural details and their abundance during blooms are as yet largely unknown. To further shed light on the polysaccharide niches of abundant key players, these findings can serve as a guide for future laboratory studies.

    Human-microbiota interactions in health and disease: bioinformatics analyses of gut microbiome datasets

    EngD Thesis. The human gut harbours a vast diversity of microbial cells, collectively known as the gut microbiota, that are crucial for human health and dysfunctional in many of the most prevalent chronic diseases. Until recently, culture-dependent methods limited our ability to study the microbiota in depth, including the collective genomes of the microbiota, the microbiome. Advances in culture-independent metagenomic sequencing technologies have since provided new insights into the microbiome and led to a rapid expansion of data-rich resources for microbiome research. These high-throughput sequencing methods and large datasets provide new opportunities for research with an emphasis on bioinformatics analyses and a novel field for drug discovery through data mining. In this thesis I explore a range of metagenomics analyses to extract insights from metagenomics data and inform drug discovery in the microbiota. Firstly, I survey the existing technologies and data sources available for data mining therapeutic targets. Then I analyse 16S metagenomics data combined with metabolite data from mice to investigate the treatment model of a proposed antibiotic treatment targeting the microbiota. Then I investigate the occurrence frequency and diversity of proteases in metagenomics data in order to inform understanding of host-microbiota-diet interactions through protein and peptide associated glycan degradation by the gut microbiota. Finally, I develop a system to facilitate the process of integrating metagenomics data for gene annotations. One of the main challenges in leveraging the scale of data availability in microbiome research is managing the data resources from microbiome studies. Through a series of analytical studies I used metagenomics data to identify community trends, to demonstrate therapeutic interventions and to do a wide-scale screen for proteases that are central to human-microbiota interactions.
These studies articulated the requirement for a computational framework to integrate and access metagenomics data in a reproducible way using a scalable data store. The thesis concludes by explaining how data integration in microbiome research is needed to provide the insights into metagenomics data that are required for drug discovery.

    Needles in a haystack of protein diversity: Interrogation of complex biological samples through specialized strategies in bottom-up proteomics uncover peptides of interest for diverse applications

    Peptide identification is at the core of bottom-up proteomics measurements. However, even with state-of-the-art mass spectrometric instrumentation, peptide-level information is still lost or missing in these types of experiments. Reasons behind missing peptide identifications in bottom-up proteomics include variable peptide ionization efficiencies, ion suppression effects, as well as the occurrence of chimeric spectra that can lower the efficacy of database search strategies. Peptides derived from naturally abundant proteins in a biological system also have better chances of being identified in comparison to the ones produced from less abundant proteins, at least in regular discovery-based proteomics experiments. This dissertation focused on the recovery of the “missing or hidden proteome” information in complex biological matrices by approaching this challenge under a peptide-centric view and implementing different liquid chromatography tandem mass spectrometry (LC-MS/MS) experimental workflows. In particular, the projects presented here covered: (1) the feasibility of applying a liquid chromatography-multiple reaction monitoring MS methodology for the targeted identification of peptides serving as surrogates of protein biomarkers in environmental matrices with unknown microbial diversities; (2) the evaluation of selecting unique tryptic peptides in silico that can distinguish groups of proteins, instead of individual proteins, for targeted proteomics workflows; (3) maximizing peptide identification in spectral data collected from different LC-MS/MS setups by applying a multi-peptide-spectrum-match algorithm; and (4) showing that LC-MS/MS combined with de novo assisted database searches is a feasible strategy for the comprehensive identification of peptides derived from native proteolytic mechanisms in biological systems.
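
Project (2) above mentions selecting tryptic peptides in silico that distinguish groups of proteins. The following is a minimal sketch of that idea, not the dissertation's method: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and toy sequences. The names `tryptic_peptides`, `group_specific_peptides` and `TOY_GROUPS` are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_length=6):
    """Digest a protein in silico: cleave after K or R unless followed by P.

    Assumes complete digestion (no missed cleavages). Peptides shorter than
    min_length are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def group_specific_peptides(groups, min_length=6):
    """Return, per group, the peptides that occur in that group only.

    groups maps a group name to a dict of {protein_id: sequence}.
    """
    owners = defaultdict(set)  # peptide -> names of groups containing it
    for group, proteins in groups.items():
        for sequence in proteins.values():
            for peptide in tryptic_peptides(sequence, min_length):
                owners[peptide].add(group)

    result = {group: set() for group in groups}
    for peptide, group_names in owners.items():
        if len(group_names) == 1:
            result[next(iter(group_names))].add(peptide)
    return result


# Toy sequences, invented purely for illustration.
TOY_GROUPS = {
    "groupA": {"a1": "AAAAKCCCCRDDDDK", "a2": "AAAAKEEEER"},
    "groupB": {"b1": "DDDDKFFFFR", "b2": "GGGGRPGGK"},
}

if __name__ == "__main__":
    for name, peptides in group_specific_peptides(TOY_GROUPS, 4).items():
        print(name, sorted(peptides))
```

In this toy case the peptide DDDDK occurs in both groups and is therefore discarded, while AAAAK is kept for groupA even though two of its proteins share it; that is the difference between group-level and protein-level uniqueness.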

    Investigating The Grey Field Slug

    High-throughput sequencing was used to analyse cDNA generated from tissues of the grey field slug, Deroceras reticulatum, a significant invertebrate pest of agricultural and horticultural crops. Almost no sequence data are available for this organism. In this project, we performed de novo transcriptome sequencing to produce a sequence dataset for Deroceras reticulatum. A total of 132,597 and 161,419 sequencing reads of 50-600 bp were obtained from the digestive gland and neural tissue, respectively, through Roche 454 pyrosequencing. These reads were assembled into contiguous sequences and annotated using sequence homology search tools. Multiple sequence assemblies and annotation data were amalgamated into a biological database using BioSQL. The dataset was analysed and probable protein functions were predicted from the annotation data. InterPro (IPR) terms generated with InterProScan software were mapped to read counts and used to identify more frequently sequenced gene families. Digestive hydrolases were major transcripts in the digestive gland, with cysteine proteinases and cellulases being the most abundant functional classes. A Cathepsin L homologue is likely to be responsible for the proteinase activity of the digestive gland which was previously detected by biochemical analysis. Cathepsin L and several other predicted proteins were used to design RNAi experiments to assess their potential for crop pest defence strategies. Further work on protein expression of a native tumour necrosis factor (TNF) ligand homologue was also conducted as an exemplar study.

    Proteome characterizations of microbial systems using MS-based experimental and informatics approaches to examine key metabolic pathways, proteins of unknown function, and phenotypic adaptation

    Microbes express complex phenotypes and coordinate activities to build microbial communities. Recent work has focused on understanding the ability of microbial systems to efficiently utilize cellulosic biomass to produce bioenergy-related products. In order to maximize the yield of these bioenergy-related products from a microbial system, it is necessary to understand the molecular mechanisms. The ability of mass spectrometry to precisely identify thousands of proteins from a bacterial source has established mass spectrometry-based proteomics as an indispensable tool for various biological disciplines. This dissertation developed and optimized various proteomics experimental and informatics protocols, and integrated the resulting data with metabolomics, transcriptomics, and genomics in order to understand the systems biology of bioenergy-relevant organisms. Integration of these various omics technologies led to an improved understanding of microbial cell-to-cell communication in response to external stimuli, microbial adaptation during deconstruction of lignocellulosic biomass and proteome diversity when an organism is subjected to different growth conditions. Integrated omics revealed that Clostridium thermocellum accumulates long-chain, branched fatty acids over time in response to cytotoxic inhibitors released during the deconstruction and utilization of switchgrass. A striking feature implies a restructuring of C. thermocellum's cellular membrane as the culture progresses. The membrane remodulation was further examined in a study involving the swarming and swimming phenotypes of Paenibacillus polymyxa. The possible roles of phospholipids, hydrolytic enzymes, surfactin, flagellar assembly, chemotaxis and glycerol metabolism in swarming motility were investigated by integrating lipidomics with proteomics. Extracellular proteome analysis of Caldicellulosiruptor bescii revealed secretome plasticity based on the complexity (mono-/disaccharides vs. polysaccharides) and type of carbon (C5 vs. C6) available to the microorganism. This study further opened the avenue for research to characterize proteins of unknown function (PUFs) specific to growth conditions. To gain a better understanding of the possible functions of PUFs in C. thermocellum, a time course analysis of C. thermocellum was conducted. Based on the concept of guilt-by-association, protein intensities and their co-expressions were used to tease out the functional aspect of PUFs. Clustering trends and network analysis were used to infer potential functions of PUFs. Selected PUFs were further interrogated by the use of phylogeny and structural modeling.
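
The guilt-by-association idea mentioned above can be illustrated with a small sketch. It is an assumption-laden stand-in, not the dissertation's pipeline: it ranks annotated proteins by the Pearson correlation of their time-course intensities with that of a protein of unknown function, whereas the abstract describes clustering and network analysis. All names and intensity values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def closest_annotated(puf_profile, annotated, top=3):
    """Rank annotated proteins by co-expression with one PUF profile.

    annotated maps a protein name to its intensity time course.
    Returns up to `top` (name, correlation) pairs, best first.
    """
    scored = [(pearson(puf_profile, profile), name)
              for name, profile in annotated.items()]
    scored.sort(reverse=True)
    return [(name, r) for r, name in scored[:top]]


# Invented intensities at four time points, for illustration only.
PUF_PROFILE = [1.0, 2.0, 3.0, 4.0]
ANNOTATED = {
    "cellulase_X": [2.0, 4.0, 6.0, 8.0],
    "ribosomal_Y": [4.0, 3.0, 2.0, 1.0],
    "flat_Z": [5.0, 5.0, 5.0, 5.0],
}

if __name__ == "__main__":
    for name, r in closest_annotated(PUF_PROFILE, ANNOTATED):
        print(f"{name}: r = {r:.2f}")
```

A strongly co-expressed annotated neighbour only suggests a candidate function; as the abstract notes, such candidates were followed up with phylogeny and structural modeling.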

    Comparative genomic analysis of Pleurotus species reveals insights into the evolution and coniferous utilization of Pleurotus placentodes

    Pleurotus placentodes (PPL) and Pleurotus cystidiosus (PCY) are economically valuable species. PPL grows on conifers, while PCY grows on broad-leaved trees. To reveal the genetic mechanism behind PPL’s adaptability to conifers, we performed de novo genome sequencing and comparative analysis of PPL and PCY. We determined the size of the genomes for PPL and PCY to be 36.12 and 42.74 Mb, respectively, and found that they contain 10,851 and 15,673 protein-coding genes, accounting for 59.34% and 53.70% of their respective genome sizes. Evolutionary analysis showed that PPL was closely related to P. ostreatus, with a divergence time of 62.7 MYA, while PCY was distantly related to other Pleurotus species, with a divergence time of 111.7 MYA. Comparative analysis of carbohydrate-active enzymes (CAZymes) in PPL and PCY showed that the increased number of CAZymes related to pectin and cellulose degradation (e.g., AA9, PL1) in PPL may be important for the degradation and colonization of conifers. In addition, the geraniol degradation and peroxisome pathways identified by comparative genomics may be further factors in PPL’s tolerance to conifer substrate. Our research provides valuable genomes for Pleurotus species and sheds light on the genetic mechanism of PPL’s conifer adaptability, which could aid in breeding new Pleurotus varieties for coniferous utilization.

    Contributions to glycobioinformatics: development of software tools for glycobiology

    This thesis covers the development of algorithms and strategies for analysing mass spectra of glycans and carbohydrates, as well as strategies for the fully and semi-automatic updating and annotation of an existing database, the Sweet-DB. Until now, glycomics has lacked algorithms that, like those used for peptide sequencing in proteomics, help the user analyse N-glycans, O-glycans and lipopolysaccharides. Yet the composition of these compounds is essential for understanding cellular metabolic physiology. As part of the development of algorithms for interpreting mass spectra, three programs were developed that allow researchers to evaluate the large numbers of spectra generated in proteomics and glycomics. The resulting programs, findYSeries, Glyco-Fragment and peakAssign, allow faster evaluation of mass spectra in glycobiology. With these programs, the glycosylated amino acid, the composition, the number of antennae of a glycan or even the sequence of a carbohydrate can be determined. As the demand for programs to evaluate measurement data grows, so does the amount of knowledge and information gained from them. These data must be made available to users in suitable databases. Unfortunately, experience has shown that the costs associated with this process can bring a project to an end. This thesis presents various strategies, some of which allow automatic annotation of the data. Their implementation produced two extensions of the Sweet-DB. The algorithms of the programs Glyco-Search-Ms and Glycan-Profiling allow a fast search in a collection of theoretical reference spectra.
For the management of the data holdings, the main component is the working environment for managing NMR and mass spectra. A decentralised solution was created that lets users manage their locally measured spectra in this database. Once they have published their results, the spectra can be published immediately in the Sweet-DB via the interfaces described. This approach has the advantage that the data can be transferred into the database without being entered again. In an initial test, two assistants entered 347 spectra through the working environment without major problems, and these spectra are now available in the Sweet-DB. With the help of the programs autoReference and Reference, the literature could be updated at least semi-automatically. Starting from a list of trivial names, PubMed can be searched at regular intervals. These raw data are held in a temporary database and, after being checked by an expert, are entered into the Sweet-DB.