
    Computational prediction of protein aggregation: advances in proteomics, conformation-specific algorithms and biotechnological applications

    Protein aggregation is a widespread phenomenon that stems from the establishment of non-native intermolecular contacts, resulting in protein precipitation. Despite its deleterious impact on fitness, protein aggregation is a generic property of polypeptide chains, inseparable from protein structure and function. Protein aggregation underlies the onset of neurodegenerative disorders and is one of the serious obstacles in the production of protein-based therapeutics. The development of computational tools opened a new avenue to rationalize this phenomenon, enabling prediction of the aggregation propensity of individual proteins as well as proteome-wide analyses. These studies identified aggregation as a major force driving protein evolution. Current algorithms work on both protein sequences and structures, and some of them also account for conformational fluctuations around the native state and for the protein microenvironment. This toolbox makes it possible to delineate conformation-specific routines that assist in identifying aggregation-prone regions and guide the optimization of more soluble and stable biotherapeutics. Here we review how the advent of predictive tools has changed the way we think about and address protein aggregation.
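As a rough illustration of sequence-based scanning for aggregation-prone regions, the sketch below slides a window along a sequence and flags stretches with high mean hydrophobicity. It is a minimal sketch under stated assumptions and not one of the reviewed algorithms: the Kyte-Doolittle hydropathy scale stands in for a dedicated aggregation propensity scale, the window size and threshold are arbitrary illustrative values, and `window_scores` and `hot_spots` are hypothetical helper names.

```python
"""Sliding-window hydropathy scan as a crude proxy for aggregation-prone regions.

Assumption: Kyte-Doolittle hydropathy stands in for a real aggregation
propensity scale; window size and threshold are illustrative choices only.
"""

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def window_scores(seq: str, window: int = 5) -> list[float]:
    """Mean hydropathy of every length-`window` stretch of `seq`."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def hot_spots(seq: str, window: int = 5, threshold: float = 2.0) -> list[tuple[int, int]]:
    """Merge windows whose mean exceeds `threshold` into (start, end) spans, end exclusive."""
    spans: list[tuple[int, int]] = []
    for i, score in enumerate(window_scores(seq, window)):
        if score > threshold:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # touches/overlaps previous span: extend it
            else:
                spans.append((start, end))
    return spans
```

For example, `hot_spots("KDEKRVIVIVLDKEN")` returns `[(4, 12)]`, a single span around the hydrophobic VIVIVL core. Because merged windows extend to full window width, span edges can include flanking charged residues; real predictors refine boundaries with more elaborate models.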

    Deriving Protein Structures Efficiently by Integrating Experimental Data into Biomolecular Simulations

    Proteins are molecular nanomachines in biological cells. They are essential building blocks of all known forms of life, from unicellular organisms to humans, and fulfil diverse functions, such as transporting oxygen in the blood or serving as a component of hair. Disruptions of their physiological function, however, can cause severe degenerative diseases such as Alzheimer's and Parkinson's. Developing effective therapies for such protein misfolding diseases requires a thorough understanding of the molecular structure and dynamics of proteins. Because proteins are too small to be resolved by light microscopy, they can only be observed indirectly, and experimental structural data are therefore usually ambiguous. This problem can be addressed in silico by physically modelling biomolecular dynamics. In this field, data-assisted molecular dynamics simulations have become established as a new paradigm for assembling individual pieces of data into a coherent overall picture of the encoded protein structure. The structural data are incorporated as an integral part of a physics-based model. In this thesis, I investigate how so-called structure-based models can be used to complement ambiguous structural data and to extract the information they contain. These models provide an efficient description of the dynamics that result from a protein's evolutionarily optimized native structure. My systematic simulation method XSBM allows biological small-angle X-ray scattering data to be interpreted as physical protein structures at minimal computational cost. The effectiveness of such data-assisted methods depends strongly on the simulation parameters used. A major challenge lies in weighting experimental information and theoretical knowledge appropriately relative to each other.
In this thesis, I show how the corresponding simulation parameter spaces can be explored efficiently with computational intelligence methods and how functional parameters can be selected to optimize the performance of complex physics-based simulation techniques. I present FLAPS, a data-driven metaheuristic optimization method for fully automated, reproducible parameter search in biomolecular simulations. FLAPS is an adaptive particle-swarm-based algorithm, inspired by the behavior of natural flocks of birds and schools of fish, that can solve the general problem of weighting different criteria relative to each other in multivariate optimization. Alongside massive advances in the use of artificial intelligence for protein structure prediction, performance-optimized data-assisted simulations enable detailed insights into the complex relationship between biomolecular structure, dynamics, and function. Such computational methods can connect the individual puzzle pieces of experimental structural information and thus deepen our understanding of proteins as the building blocks of life.
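The particle-swarm idea behind FLAPS can be illustrated with a textbook global-best particle swarm optimizer. This is a minimal sketch, not FLAPS itself: the adaptive rules and data-driven scoring of FLAPS are not reproduced, the coefficients `w`, `c1`, and `c2` are common default values assumed here, and `loss` is a hypothetical stand-in for a simulation-based objective (for instance, one in which the relative weight of experimental and physics-based terms is itself a searched parameter).

```python
"""Minimal global-best particle swarm optimization (PSO).

Textbook PSO, not FLAPS: FLAPS is described as an adaptive variant whose
rules are not reproduced here. `loss` is a stand-in objective to minimize.
"""
import random
from typing import Callable, Sequence


def pso(loss: Callable[[Sequence[float]], float],
        bounds: Sequence[tuple[float, float]],
        n_particles: int = 30, n_iter: int = 200,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> tuple[list[float], float]:
    """Return (best position, best loss) found by the swarm within `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best position so far
    pbest_f = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]  # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            f = loss(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

For example, `pso(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [(-5, 5), (-5, 5)])` should return a point close to (1, -2). No gradients are needed, which is what makes swarm methods attractive when each loss evaluation is a full simulation.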

    Biological Systems Workbook: Data modelling and simulations at molecular level

    Nowadays, huge quantities of data from experiments and theoretical simulations surround the different fields of biology, and the results are often stored in biological databases that grow at a vertiginous rate every year. There is therefore increasing research interest in applying mathematical and physical models that can produce reliable predictions and explanations to understand and rationalize this information. All these investigations help to answer biological questions and to push forward solutions to problems faced by our society. In this Biological Systems Workbook, we aim to introduce the basic pieces that allow life to take place, from the 3D structural point of view. We will start by learning how to look at the 3D structure of molecules, studying small organic molecules used as drugs. Meanwhile, we will learn some methods that help us generate models of these structures. Then we will move to more complex natural organic molecules, such as lipids or carbohydrates, learning how to estimate and reproduce their dynamics. Later, we will review the structure of more complex macromolecules, such as proteins or DNA. Along the way, we will refer to different computational tools and databases that help us search, analyze, and model the molecular systems studied in this course.

    Evolutionary systems biology of bacterial metabolic adaptation

    Automatically exploiting genomic and metabolic contexts to aid the functional annotation of prokaryote genomes

    The subject of this thesis is the development of bioinformatic strategies that exploit genomic and metabolic contextual information to generate functional annotations for prokaryote genes. The work comprises two main projects. The first focuses on sequence-orphan enzymatic activities.
Today, roughly 27% of the activities defined by the International Union of Biochemistry and Molecular Biology are sequence-orphans. For these, traditional bioinformatic approaches cannot propose candidate genes, so alternative, context-based approaches are essential. The CanOE strategy (fishing Candidate genes for Orphan Enzymes) was developed and added to the MicroScope bioinformatics platform for this purpose. It integrates genomic and metabolic information across thousands of prokaryote genomes to locate promising candidate genes for orphan activities. The mirror project focuses on protein families of unknown function. A collaborative project was set up at the Genoscope to formalize strategies for the functional exploration of prokaryote protein families. A pilot version was carried out on the DUF849 Pfam family, for which a single enzymatic activity had recently been elucidated. Strategies for proposing alternative functions and activities and for defining isofunctional subfamilies were developed to guide bench experiments and to analyze their results.

    MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY

    The world is presently faced with a sustainability crisis: it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released by the continuous combustion of fossil fuels accelerate climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, in which energy and materials are renewably derived from waste rather than by consuming limited resources. Deconstructing the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, because the deconstructed monomers can be used to manufacture new products. In Nature, organisms use enzymes for the efficient depolymerization and conversion of macromolecules. Consequently, by employing enzymes industrially, biotechnology holds great promise for the energy- and cost-efficient conversion of materials in a circular economy. However, a better molecular-level understanding of enzymes is needed to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze important reactions for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme used by the bacterium Ideonella sakaiensis to degrade and assimilate polyethylene terephthalate (PET).
Because enzyme function fundamentally depends on the primary amino-acid sequence, we hypothesize that machine-learning algorithms trained on an ensemble of functionally related enzymes can reveal functional patterns in the enzyme family and map primary sequence to function, so that functional properties of a new enzyme sequence can be predicted with significant accuracy. We find that supervised machine learning identifies residues important for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that expanded enzyme specificity and improved activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context and identified residues crucial for MHET-hydrolase activity in a tannase-family enzyme (MHETase). Lastly, we developed machine-learning models to predict enzyme thermostability, enabling high-throughput screening for enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational, data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications toward a more sustainable world.
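To make the idea of mapping a sequence to a functional label concrete, here is a deliberately simple supervised classifier: each sequence is represented by its amino-acid composition, and a new sequence receives the label of the nearest class centroid. This is a minimal sketch, not the models used in this work; the composition features, the nearest-centroid method, the helper names `composition`, `fit`, and `predict`, and the toy "thermo"/"meso" labels are all illustrative assumptions.

```python
"""Nearest-centroid classifier on amino-acid composition.

A deliberately simple stand-in for supervised sequence-to-function models;
features, labels, and any training sequences used with it are hypothetical.
"""
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in `seq`."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def fit(seqs: list[str], labels: list[str]) -> dict[str, list[float]]:
    """Average composition vector (centroid) for each class label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for s, y in zip(seqs, labels):
        acc = sums.setdefault(y, [0.0] * len(AMINO_ACIDS))
        for k, v in enumerate(composition(s)):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids: dict[str, list[float]], seq: str) -> str:
    """Label whose centroid is closest (Euclidean distance) to the sequence's composition."""
    vec = composition(seq)
    return min(centroids, key=lambda y: math.dist(vec, centroids[y]))
```

The nearest-centroid choice only keeps the train/predict structure visible; practical models would likely use richer features (for example k-mer counts or alignment-position encodings) and stronger learners.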

    Activity fingerprints in DNA based on a structural analysis of sequence information.

    The function of a DNA sequence is commonly predicted by measuring its nucleotide similarity to known functional sets. However, the discovery that many very different sequences have similar structural properties justifies using structural properties to identify patterns within families. The aim of this thesis is to develop tools that detect any unusual structural characteristics of a particular sequence or that identify DNA structure-activity fingerprints common to a set. This work uses the Octamer Database to describe DNA. The database's contents fall into two categories: parameters that describe the minimum-energy structure and parameters that measure flexibility. Information from both categories has been combined to describe structural tendencies, offering an alternative measure of sequence similarity. A structural DNA profile graphically illustrates how a parameter from the Octamer Database varies along a single sequence or across a set of sequences. Profile Manager, an application developed to automate single-sequence profile generation, is used to study the A-tract phenomenon. The use of profiles to explore patterns of flexibility across a set of pre-aligned promoters is then investigated, revealing interesting transitions in decreasing twist flexibility. Multiple-sequence queries are harder to solve than single-sequence ones because the sequences must first be aligned. Only under rare circumstances are sequences pre-aligned by an experimentally determined position; more commonly, a multiple alignment must be generated. An extended, structure-based hidden Markov model technique that successfully generates structural alignments is presented. Its application is tested on four DNA-protein binding-site datasets and compared with the traditional sequence-based method.
Structural alignments of two of the four datasets performed comparably to sequence alignments and gave useful insights into underlying structural mechanisms.
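A structural profile of the kind described above can be sketched as a sliding octamer window that looks up one parameter per position, with a set profile formed by averaging position-wise over pre-aligned sequences. This is a minimal sketch under stated assumptions: the real Octamer Database values are not reproduced, so the parameter lookup is passed in as a function, and `structural_profile` and `set_profile` are hypothetical helper names.

```python
"""Structural profiles of DNA from an octamer parameter lookup.

Assumption: `table` maps each octamer to one structural parameter (for example
a twist-flexibility value). Real Octamer Database values are NOT included.
"""
from typing import Callable


def structural_profile(seq: str, table: Callable[[str], float], k: int = 8) -> list[float]:
    """Parameter value of each length-k window, indexed by the window's start position."""
    seq = seq.upper()
    return [table(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def set_profile(aligned: list[str], table: Callable[[str], float], k: int = 8) -> list[float]:
    """Position-wise mean profile over sequences that are already aligned to equal length."""
    profiles = [structural_profile(s, table, k) for s in aligned]
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    return [sum(col) / len(col) for col in zip(*profiles)]
```

For example, with a hypothetical placeholder lookup `toy = lambda o: (o.count("A") + o.count("T")) / len(o)` (an A/T fraction, not a real structural parameter), `structural_profile("AAAAAAAAGC", toy)` gives `[1.0, 0.875, 0.75]`. The `ValueError` check reflects why unaligned sets are harder: position-wise averaging is only meaningful once the sequences share a coordinate system.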

    Experiment Planning for Protein Structure Elucidation and Site-Directed Protein Recombination

    To investigate protein structure and improve protein function most effectively, experiments must be carefully planned. The combinatorial number of possible experiment plans demands effective criteria and efficient algorithms to choose the one that is, in some sense, optimal. This thesis addresses experiment-planning challenges in two significant applications. The first part develops an integrated computational-experimental approach for rapidly discriminating predicted protein structure models by quantifying their consistency with relatively cheap and easy experiments (cross-linking, and site-directed mutagenesis followed by stability measurement). To extract the most information from noisy and sparse experimental data, rigorous Bayesian frameworks have been developed to analyze the information content, and efficient algorithms have been developed to choose the most informative, least expensive, and most robust experiments. The effectiveness of this approach has been demonstrated using existing experimental data as well as simulations, and it has been applied to discriminate predicted structure models of the pTfa chaperone protein from bacteriophage lambda. The second part of this thesis seeks optimal breakpoint locations for protein engineering by site-directed recombination. To increase the chance of obtaining folded and functional hybrids, recombination must retain the evolutionary relationships among amino acids that determine protein stability and functionality. A probabilistic hypergraph model has been developed to represent these relationships, with edge weights representing their statistical significance derived from a database and a protein family. The model has been validated by showing that it distinguishes functional from non-functional hybrids in existing experimental data.
Choosing the breakpoint locations that minimize the total perturbation to these relationships has been proved NP-hard in general, but exact and approximate algorithms have been developed for a number of important cases.
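For a simplified case, breakpoint selection can be solved exactly by dynamic programming. The sketch assumes only pairwise relationships (ordinary weighted edges rather than hyperedges) and counts an edge as perturbed whenever its two residues fall into different fragments, ignoring which parent each fragment comes from. Under that assumption the weight kept inside fragments is additive over fragments, so the best k breakpoints follow from a recursion over fragments taking O(k·n²) time. It illustrates one tractable case only, not the algorithms from the thesis, and `best_breakpoints` is a hypothetical name.

```python
"""Exact breakpoint selection for a simplified, pairwise-edge perturbation model.

Assumption: an edge (i, j) is perturbed iff residues i and j land in different
fragments. Hyperedges and parent assignments are NOT modelled.
"""


def best_breakpoints(n: int, edges: dict[tuple[int, int], float], k: int) -> tuple[list[int], float]:
    """Split residues 0..n-1 into k+1 contiguous fragments with k breakpoints,
    minimizing the total weight of edges cut. A breakpoint b starts a new fragment at b.
    Returns (sorted breakpoints, perturbation cost)."""
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    left: list[list[tuple[int, float]]] = [[] for _ in range(n)]  # left[j]: (i, w) with i <= j
    total = 0.0
    for (a, b), wt in edges.items():
        i, j = min(a, b), max(a, b)
        left[j].append((i, wt))
        total += wt

    # inside[s][e]: weight of edges with both endpoints in residues s..e-1
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for s in range(n):
        for e in range(s + 1, n + 1):
            inside[s][e] = inside[s][e - 1] + sum(wt for i, wt in left[e - 1] if i >= s)

    # best[f][e]: max weight kept when residues 0..e-1 form f non-empty fragments
    neg = float("-inf")
    best = [[neg] * (n + 1) for _ in range(k + 2)]
    back = [[0] * (n + 1) for _ in range(k + 2)]
    for e in range(1, n + 1):
        best[1][e] = inside[0][e]
    for f in range(2, k + 2):
        for e in range(f, n + 1):
            for s in range(f - 1, e):  # last fragment is residues s..e-1
                cand = best[f - 1][s] + inside[s][e]
                if cand > best[f][e]:
                    best[f][e], back[f][e] = cand, s

    cuts, e = [], n
    for f in range(k + 1, 1, -1):  # follow back-pointers to recover the k breakpoints
        e = back[f][e]
        cuts.append(e)
    return sorted(cuts), total - best[k + 1][n]
```

For example, with strong edges inside residues 0 to 2 and 3 to 5 and a weak edge (2, 3) of weight 0.1, one breakpoint is placed at residue 3 and the cost is about 0.1. Restricting to pairwise edges is what makes the kept weight additive; with higher-order hyperedges and parent choices this decomposition no longer holds in general.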