505 research outputs found

    High throughput prediction of inter-protein coevolution

    Inter-protein co-evolution analysis can reveal direct or indirect functional or physical protein interactions. It compares the correlation of evolutionary changes between residues in aligned orthologous sequences. Meanwhile, modern experimental screens for protein-protein interactions in cell biology, often based on mass spectrometry, frequently identify large numbers of candidate interacting proteins. If automated, inter-protein co-evolution analysis can serve as a valuable step in refining these results, which typically contain hundreds of hits, before further experiments. Manually retrieving tens of orthologous sequences and preparing alignments and phylogenetic trees is impractical for such amounts of data. The aim of this thesis is to create a set of scripts that automate high-throughput inter-protein co-evolution analysis. The scripts are written in Python and use an API client to access online sequence databases with the input protein identifiers. Through the matched identifiers, over 85 representative orthologous sequences from vertebrate species are retrieved from the OrthoDB orthologue database. The scripts align these sequences with the PRANK MSA algorithm and create the corresponding phylogenetic tree. All protein pairs are structured for multicore computation with the CAPS programme on a CSC supercomputer. The multiple CAPS outputs are condensed into a comprehensive form for comparing relative co-adaptive co-evolution between the proposed protein pairs. In this work, I developed this automation for a protein-interactome screen performed by proximity labelling of B cell receptor and plasma membrane associated proteins under activating or non-activating conditions. Applying high-throughput co-evolutionary analysis to these data provides a completely new approach to identifying new players in B cell activation, which is critical for autoimmunity, hypo-immunity or cancer. The results showed unsatisfactory performance of CAPS; an explanation and alternatives are given.
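
The step of structuring all protein pairs for multicore computation can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names `all_pairs` and `split_into_batches` and the example identifiers are hypothetical, and the sketch only enumerates pairs and distributes them over workers. It does not invoke CAPS, whose command-line interface is not described in the abstract.

```python
from itertools import combinations


def all_pairs(protein_ids):
    """Return every unordered pair of distinct protein identifiers.

    Duplicated identifiers are collapsed first, so each pair appears once.
    """
    unique_ids = sorted(set(protein_ids))
    return list(combinations(unique_ids, 2))


def split_into_batches(pairs, n_batches):
    """Distribute pairs round-robin over n_batches lists (one per worker)."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    batches = [[] for _ in range(n_batches)]
    for index, pair in enumerate(pairs):
        batches[index % n_batches].append(pair)
    return batches


if __name__ == "__main__":
    # Hypothetical identifiers standing in for hits from an interactome screen.
    hits = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]
    pairs = all_pairs(hits)
    for worker, batch in enumerate(split_into_batches(pairs, 3)):
        print(worker, batch)
```

The number of pairs grows quadratically with the number of hits (n(n-1)/2), which is why a screen with hundreds of hits calls for batching across many cores.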

    Mapping the proteome with data-driven methods: A cycle of measurement, modeling, hypothesis generation, and engineering

    The living cell exhibits emergence of complex behavior and its modeling requires a systemic, integrative approach if we are to thoroughly understand and harness it. The work in this thesis has had the more narrow aim of quantitatively characterizing and mapping the proteome using data-driven methods, as proteins perform most functional and structural roles within the cell. Covered are the different parts of the cycle from improving quantification methods, to deriving protein features relying on their primary structure, predicting the protein content solely from sequence data, and, finally, to developing theoretical protein engineering tools, leading back to experiment. High-throughput mass spectrometry platforms provide detailed snapshots of a cell's protein content, which can be mined towards understanding how the phenotype arises from genotype and the interplay between the various properties of the constituent proteins. However, these large and dense data present an increased analysis challenge and current methods capture only a small fraction of signal. The first part of my work has involved tackling these issues with the implementation of a GPU-accelerated and distributed signal decomposition pipeline, making factorization of large proteomics scans feasible and efficient. The pipeline yields individual analyte signals spanning the majority of acquired signal, enabling high precision quantification and further analytical tasks. Having such detailed snapshots of the proteome enables a multitude of undertakings. One application has been to use a deep neural network model to learn the amino acid sequence determinants of temperature adaptation, in the form of reusable deep model features. More generally, systemic quantities may be predicted from the information encoded in sequence by evolutionary pressure. Two studies taking inspiration from natural language processing have sought to learn the grammars behind the languages of expression, in one case predicting mRNA levels from DNA sequence, and in the other protein abundance from amino acid sequence. These two models helped build a quantitative understanding of the central dogma and, furthermore, in combination yielded an improved predictor of protein amount. Finally, a mathematical framework relying on the embedded space of a deep model has been constructed to assist guided mutation of proteins towards optimizing their abundance.

    High-throughput dereplication and identification of bacterial and yeast communities involved in lambic beer fermentation processes


    Confident metabolite structure annotation with COSMIC

    Small molecules are key to biomarker discovery, drug development, toxicity screening of ecosystems such as rivers and lakes, and many other important research areas across the life sciences. Elucidating the exact structure of these metabolites is often crucial in determining their function; however, confident annotation of these structures remains a major challenge. Mass spectrometry is currently the predominant technique for analysing samples of naturally occurring small molecules. While mass spectrometry measures the mass of a compound, tandem mass spectrometry additionally measures the masses of its fragments. The resulting spectral data, however, are highly non-trivial to interpret. This bottleneck has spurred the development of computational tools that annotate metabolite structures from mass spectrometry data, enabling rapid, large-scale structure annotation independent of spectral libraries. These tools return some proportion of incorrect annotations, which can vastly outnumber the correct ones. Scientists using these tools need to be able to distinguish correct from incorrect annotations. We develop an E-value computation based on proxy decoys drawn from the PubChem database and show that this E-value score outperforms the current CSI:FingerID hit score at separating correct from incorrect annotations. To improve on this further, we develop a Percolator-inspired machine learning approach in which we train linear support vector machines for this separation task. The resulting confidence score outperforms the original CSI:FingerID hit score, the E-value score and all other tools that participated in the CASMI 2016 contest by a wide margin. Arguably, our confidence score enables confident structure annotation for a relevant portion of a dataset for the first time. We then show the power of this COSMIC workflow by annotating novel, previously unreported bile acid conjugate structures in a mouse fecal dataset.
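
The general idea of scoring a hit against a decoy distribution can be sketched as follows. This is a generic empirical E-value, assumed here purely for illustration; it is not the estimator used in COSMIC, whose details are not given in the abstract. The function name `empirical_e_value`, its parameters, and the example numbers are all hypothetical.

```python
from bisect import bisect_left


def empirical_e_value(hit_score, decoy_scores, n_candidates):
    """Estimate how many of n_candidates would reach hit_score by chance.

    The chance probability is taken as the fraction of decoy scores that are
    at least as high as the hit score, with add-one smoothing so that the
    estimate never becomes exactly zero.
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    ranked = sorted(decoy_scores)
    # Count decoys whose score is greater than or equal to the hit score.
    n_at_least = len(ranked) - bisect_left(ranked, hit_score)
    p_value = (n_at_least + 1) / (len(ranked) + 1)
    return p_value * n_candidates


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any dataset.
    decoys = [1.0, 2.0, 3.0, 4.0]
    print(empirical_e_value(3.5, decoys, n_candidates=10))
```

A lower E-value means the hit score is harder to explain by chance; a threshold on this value is one simple way to separate likely correct annotations from likely incorrect ones.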

    Fourth Symposium on Chemical Evolution and the Origin and Evolution of Life

    This symposium was held at the NASA Ames Research Center, Moffett Field, California, July 24-27, 1990. The NASA exobiology investigators reported their recent research findings. Scientific papers were presented in the following areas: cosmic evolution of biogenic compounds, prebiotic evolution (planetary and molecular), early evolution of life (biological and geochemical), evolution of advanced life, solar system exploration, and the Search for Extraterrestrial Intelligence (SETI)

    The gut microbiota of bumblebees : a treasure chest of biodiversity and functionality


    Linking genomics and metabolomics to chart specialized metabolic diversity

    Microbial and plant specialized metabolites constitute an immense chemical diversity, and play key roles in mediating ecological interactions between organisms. Also referred to as natural products, they have been widely applied in medicine, agriculture, cosmetic and food industries. Traditionally, the main discovery strategies have centered around the use of activity-guided fractionation of metabolite extracts. Increasingly, omics data is being used to complement this, as it has the potential to reduce rediscovery rates, guide experimental work towards the most promising metabolites, and identify enzymatic pathways that enable their biosynthetic production. In recent years, genomic and metabolomic analyses of specialized metabolic diversity have been scaled up to study thousands of samples simultaneously. Here, we survey data analysis technologies that facilitate the effective exploration of large genomic and metabolomic datasets, and discuss various emerging strategies to integrate these two types of omics data in order to further accelerate discovery

    Identifizierung eisenreduzierender Mikroorganismen in anoxischem Reisfeldboden mit Hilfe stabiler Isotope (Identification of iron-reducing microorganisms in anoxic rice paddy soil using stable isotopes)

    In recent decades, iron reduction has gained increasing importance as a major microbial process, as have the microorganisms involved in it. In many anoxic habitats, Fe(III) is the most abundant natural electron acceptor. This is also the case in flooded rice paddy soil, where dissimilatory iron reduction is, alongside methanogenesis, the most important anaerobic respiratory process. Little is known so far about the diversity of the microbial populations that couple their energy conservation to the reduction of Fe(III). In this work, microbial iron reduction in flooded rice paddy soil was investigated for the first time using biogeochemical and molecular biological methods in the presence of different iron(III) oxides (ferrihydrite, lepidocrocite, goethite and hematite) and different carbon sources (13C2-acetate and 13C3-lactate). The application of stable isotope probing made it possible to link function with the phylogenetic placement of the metabolically active microorganisms and allowed known and novel iron-reducing microorganisms to be identified as a functional guild in rice paddy soil without prior cultivation. Under iron-reducing conditions, the addition of acetate, a central intermediate in anaerobic metabolism, led to the detection of bacterial populations that could be assigned to known Geobacter and Anaeromyxobacter spp. (delta-Proteobacteria). In the presence of goethite, these two clusters even showed comparable abundance, which points to the formation of distinct ecological niches within the ecosystem. The novel bacterial clusters detected are phylogenetically placed in the families Clostridiaceae and Paenibacillaceae (Firmicutes), but had not previously been associated with dissimilatory iron reduction. Whether these populations metabolize lactate or its degradation products acetate and propionate cannot be determined from the results. Incorporation of the stable carbon isotope (13C) was also detected in the accompanying controls (without addition of iron(III) oxides). Analysis of the 16S rRNA showed a strong activation of members of the Rhodocyclaceae (beta-Proteobacteria). The observed activity of this population in the incubations with goethite and hematite supports the hypothesis put forward in this work that these as yet uncultivated Rhodocyclaceae may be involved in the reduction of iron(III) oxides with low bioavailability. In addition to identifying the dissimilatory iron-reducing microorganisms under various physiological conditions, the influence of characteristic properties of the iron(III) oxides (e.g. crystallinity, surface area and particle size) on microbial reducibility was investigated. The addition of the different iron(III) oxides had a selective effect on the microbial community of iron reducers in flooded rice paddy soil. Not only were iron(III) oxide-specific and phylogenetically diverse populations enriched, but some species of the same genus (Geobacter) even showed specialization on particular iron(III) oxides with different redox potentials. This observation indicates high functional diversity within this group of microorganisms, which is phylogenetically very similar at the level of the 16S rRNA genes, and thus offers an interesting starting point for comparative genome-based analyses. The present work shows that the diversity of iron-reducing microorganisms in the natural habitat is far greater than previously assumed. Besides the available electron donor, the iron(III) oxide phase has a decisive influence on the metabolically active microbial community. The identification of the iron-reducing microorganisms as a functional guild also makes a substantial contribution to understanding iron reduction and the processes coupled to it in the rice paddy soil model system.