370 research outputs found

    Flexible protein folding by ant colony optimization

    Protein structure prediction is one of the most challenging topics in bioinformatics. Because a protein's structure is closely related to its function, predicting how a protein folds is a valuable way to infer what it does. This chapter proposes a flexible ant colony (FAC) algorithm for solving protein folding problems (PFPs) based on the hydrophobic-polar (HP) square lattice model. Unlike previous ant algorithms for PFPs, the proposed algorithm places pheromones on the arcs connecting adjacent squares in the lattice. This pheromone placement model is similar to the one used for the traveling salesman problem (TSP), where pheromones are released on the arcs connecting the cities. Moreover, the combination of effective heuristic and pheromone strategies greatly enhances the performance of the algorithm, so that it achieves good results without local search methods. Tests on benchmark two-dimensional hydrophobic-polar (2D-HP) protein sequences show that the proposed algorithm is quite competitive with other well-known methods for solving the same protein folding problems.
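
    The arc-based pheromone placement described above can be illustrated with the standard ant colony transition rule. This is only a sketch under stated assumptions, not the FAC algorithm itself: the function name `move_probabilities`, the exponents `alpha` and `beta`, the default pheromone level `tau0`, and the caller-supplied `heuristic` function are all hypothetical and do not come from the chapter.

```python
def move_probabilities(current, visited, pheromone, heuristic,
                       alpha=1.0, beta=2.0, tau0=1.0):
    """Probability of moving from `current` to each free neighbouring square.

    `pheromone` maps an arc (from_square, to_square) to its pheromone level;
    arcs that were never reinforced fall back to `tau0` (an assumption).
    `heuristic` is any caller-supplied function scoring a candidate square.
    """
    x, y = current
    weights = {}
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (x + dx, y + dy)
        if nxt in visited:  # keep the walk self-avoiding
            continue
        tau = pheromone.get((current, nxt), tau0)
        weights[nxt] = (tau ** alpha) * (heuristic(nxt) ** beta)
    total = sum(weights.values())
    if total == 0:
        return {}  # dead end, or every candidate has zero weight
    return {square: w / total for square, w in weights.items()}


# Example: one arc carries extra pheromone, the heuristic is uniform.
probs = move_probabilities(
    current=(0, 0),
    visited={(0, 0), (-1, 0)},
    pheromone={((0, 0), (1, 0)): 3.0},
    heuristic=lambda square: 1.0,
)
```

    Keying the pheromone table by arc rather than by square is what makes the analogy with the TSP direct: the same square can be entered from different directions with different pheromone levels.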

    Is protein folding problem really a NP-complete one ? First investigations

    Determining the 3D conformation of proteins is necessary to understand their functions and their interactions with other molecules. It is commonly accepted that, when proteins fold from their primary linear structures to their final 3D conformations, they tend to choose the conformations that minimize their free energy. To find the 3D conformation of a protein from its amino acid sequence, bioinformaticians use models of various resolutions together with artificial intelligence tools, as the protein folding prediction problem is an NP-complete one. More precisely, determining the backbone structure of the protein with the low-resolution models (2D HP square and 3D HP cubic), by finding the conformation that minimizes free energy, is intractable to solve exactly. Both the proof of NP-completeness and the 2D prediction assume that acceptable conformations satisfy a self-avoiding walk (SAW) requirement, as two different amino acids cannot occupy the same position in the lattice. It is shown in this document that the SAW requirement considered when proving NP-completeness is different from the SAW requirement used in various prediction programs, and that both differ from the real biological requirement. Indeed, the proof of NP-completeness and the predictions in silico consider conformations that are not possible in practice. Consequences of this fact are investigated in this research work. Comment: Submitted to Journal of Bioinformatics and Computational Biology, under review.

    Computational investigations of folded self-avoiding walks related to protein folding

    Various subsets of self-avoiding walks naturally appear when investigating existing methods designed to predict the 3D conformation of a protein of interest. Two such subsets, namely the folded and the unfoldable self-avoiding walks, are studied computationally in this article. We show that these two sets are equal and correspond to the whole set of $n$-step self-avoiding walks for $n \leqslant 14$, but that they are different for numerous $n \geqslant 108$, which are common protein lengths. Concrete counterexamples are provided and the computational methods used to discover them are completely detailed. A tool for studying these subsets of walks related to both pivot moves and protein conformations is finally presented. Comment: Not yet submitted.
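
    As a point of reference for the sets discussed above, the following sketch enumerates all $n$-step self-avoiding walks by backtracking. It assumes the 2D square lattice and counts every walk starting at the origin; it does not distinguish the folded or unfoldable subsets studied in the article, and the function name `count_saws` is hypothetical.

```python
def count_saws(n):
    """Count n-step self-avoiding walks on the 2D square lattice from the origin."""

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        x, y = pos
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack before trying the next direction
        return total

    return extend((0, 0), {(0, 0)}, n)


counts = [count_saws(n) for n in range(5)]  # [1, 4, 12, 36, 100]
```

    The count grows exponentially with $n$, so brute-force enumeration of this kind is only practical for short walks.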

    Modelling biomolecules through atomistic graphs: theory, implementation, and applications

    Describing biological molecules through computational models enjoys ever-growing popularity. Never before has access to computational resources been easier for scientists across the natural sciences. The need for accurate, efficient, and robust modelling tools is therefore irrefutable. This, in turn, calls for highly interdisciplinary research, which the thesis presented here is a product of. Through the successful marriage of techniques from mathematical graph theory, theoretical insights from chemistry and biology, and the tools of modern computer science, we are able to computationally construct accurate depictions of biomolecules as atomistic graphs, in which individual atoms become nodes and chemical bonds/interactions are represented by weighted edges. When combined with methods from graph theory and network science, this approach has previously been shown to successfully reveal various properties of proteins, such as dynamics, rigidity, multi-scale organisation, allostery, and protein-protein interactions, and is well poised to set new standards in terms of computational feasibility, multi-scale resolution (from atoms to domains) and time-scales (from nanoseconds to milliseconds). Therefore, building on previous work in our research group spanning over 15 years and to further encourage and facilitate research into this growing field, this thesis's main contribution is to provide a formalised foundation for the construction of atomistic graphs. The most crucial aspect of constructing atomistic graphs of large biomolecules compared to small molecules is the necessity to include a variety of different types of bonds and interactions, because larger biomolecules attain their unique structural layout mainly through weaker interactions, e.g. hydrogen bonds, the hydrophobic effect or π-π interactions. 
Whilst most interaction types are well studied and have readily available methodology which can be used to construct atomistic graphs, this is not the case for hydrophobic interactions. To fill this gap, the work presented herein includes novel methodology for encoding the hydrophobic effect in atomistic graphs that accounts for the many-body effect and non-additivity. Then, a standalone software package for constructing atomistic graphs from structural data is presented. Herein lies the heart of this thesis: the combination of a variety of methodologies for a range of bond/interaction types, as well as an implementation that is deterministic, easy to use and efficient. Finally, some promising avenues for utilising atomistic graphs in combination with graph theoretical tools such as Markov Stability, as well as other approaches such as Multilayer Networks, to study various properties of biomolecules are presented. Open Access

    Applying Deep Reinforcement Learning to the HP Model for Protein Structure Prediction

    A central problem in computational biophysics is protein structure prediction, i.e., finding the optimal folding of a given amino acid sequence. This problem has been studied in a classical abstract model, the HP model, where the protein is modeled as a sequence of H (hydrophobic) and P (polar) amino acids on a lattice. The objective is to find conformations maximizing H-H contacts. It is known that even in this reduced setting, the problem is intractable (NP-hard). In this work, we apply deep reinforcement learning (DRL) to the two-dimensional HP model. We obtain conformations with the best known energies for benchmark HP sequences with lengths from 20 to 50. Our DRL is based on a deep Q-network (DQN). We find that a DQN based on a long short-term memory (LSTM) architecture greatly enhances the RL learning ability and significantly improves the search process. DRL can sample the state space efficiently, without the need for manual heuristics. Experimentally, we show that it can find multiple distinct best-known solutions per trial. This study demonstrates the effectiveness of deep reinforcement learning in the HP model for protein folding. Comment: Published at Physica A: Statistical Mechanics and its Applications, available online 7 December 2022. Extended abstract accepted by the Machine Learning and the Physical Sciences workshop, NeurIPS 202
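
    The objective described above (maximizing H-H contacts) can be made concrete with a small energy function for the 2D square lattice. This is a minimal sketch under the usual convention that each H-H contact between residues that are lattice neighbours but not chain neighbours contributes -1; the function name `hp_energy` and the input layout are assumptions, not taken from the paper.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP conformation: minus the number of H-H contacts.

    `sequence` is a string over {"H", "P"}; `coords` is a list of (x, y)
    lattice positions, one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a unit square brings its two ends together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
energy_folded = hp_energy("HPPH", square)                               # -1
energy_straight = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])   # 0
```

    A search method, whether DRL or a heuristic, would use a function of this kind as its reward or fitness signal.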

    Exact, constraint-based structure prediction in simple protein models

    This thesis investigates the exact prediction of protein structure in three-dimensional abstract protein models; in particular, an exact approach to structure prediction in the HP models (Lau and Dill, ACS, 1989) on the cubic and face-centered cubic lattices is developed and discussed. In contrast to heuristic methods, the exact method presented here yields provably correct structures. HP models (hydrophobic, polar) represent the backbone conformation of a protein by lattice points and consider only the hydrophobic interaction as the driving force in the formation of the protein structure. The use of constraint-based techniques is essential to the successful realization of the method. At its center is the computation and application of hydrophobic cores for structure prediction.

    Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling

    Protein domains are the basic units of protein structure and function. Comparative analysis of genomes and proteomes showed that domain recombination is a main driver of multidomain protein functional diversification, and some of the constraining genomic mechanisms are known. Much less is known about the biophysical mechanisms that determine whether protein domains can be combined into viable protein folds. Here, we use massively parallel insertional mutagenesis to determine the compatibility of over 300,000 domain recombination variants of the Inward Rectifier K+ channel Kir2.1 with channel surface expression. Our data suggest that genomic and biophysical mechanisms acted in concert to favor the gain of large, structured domains at protein termini during ion channel evolution. We use machine learning to build a quantitative biophysical model of domain compatibility in Kir2.1 that allows us to derive rudimentary rules for designing domain insertion variants that fold and traffic to the cell surface. Positional Kir2.1 responses to motif insertion cluster into distinct groups that correspond to contiguous structural regions of the channel with distinct biophysical properties tuned towards providing either folding stability or gating transitions. This suggests that insertional profiling is a high-throughput method to annotate the function of ion channel structural regions.

    Understanding the Structural and Functional Importance of Early Folding Residues in Protein Structures

    Proteins adopt three-dimensional structures which serve as a starting point to understand protein function and their evolutionary ancestry. It is unclear how proteins fold in vivo and how this process can be recreated in silico in order to predict protein structure from sequence. Contact maps describe whether two residues are in spatial proximity, and structures can be derived from this simplified representation. Coevolution or supervised machine learning techniques can compute contact maps from sequence; however, these approaches only predict sparse subsets of the actual contact map. It is shown that the composition of these subsets substantially influences the achievable reconstruction quality because most information in a contact map is redundant. No strategy had been proposed which identifies unique contacts for which no redundant backup exists. The StructureDistiller algorithm quantifies the structural relevance of individual contacts and identifies crucial contacts in protein structures. It is demonstrated that using this information improves the reconstruction performance on a sparse subset of a contact map by 0.4 Å, which constitutes a substantial performance gain. The set of the most relevant contacts in a map is also more resilient to false positively predicted contacts: up to 6% of false positives are compensated before reconstruction quality matches a naive selection of contacts without any false positive contacts. This information is invaluable for the training of new structure prediction methods and provides insights into how robustness and information content of contact maps can be improved. In the literature, the relevance of two types of residues for in vivo folding has been described. Early folding residues initiate the folding process, whereas highly stable residues prevent spontaneous unfolding events. The structural relevance score proposed by this thesis is employed to characterize both types of residues.
Early folding residues form pivotal secondary structure elements, but their structural relevance is average. In contrast, highly stable residues exhibit significantly increased structural relevance. This implies that residues crucial for the folding process are not relevant for structural integrity and vice versa. The position of early folding residues is preserved over the course of evolution as demonstrated for two ancient regions shared by all aminoacyl-tRNA synthetases. One arrangement of folding initiation sites resembles an ancient and widely distributed structural packing motif and captures how reverberations of the earliest periods of life can still be observed in contemporary protein structures.
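
    The contact map representation mentioned in this abstract can be sketched as a simple distance threshold over residue coordinates. This is an illustration under stated assumptions, not the StructureDistiller algorithm: the function name `contact_map`, the one-point-per-residue input, and the 8 Å cutoff are hypothetical choices, not values taken from the thesis.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean residue-residue contact map from one 3D point per residue.

    Two distinct residues are in contact when their points lie within
    `cutoff` (in the same unit as the coordinates, assumed Å here).
    """
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


# Three residues on a line: the first two are close, the third is far away.
cmap = contact_map([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)])
```

    The resulting matrix is symmetric with an empty diagonal, which is one source of the redundancy the abstract refers to.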

    Sequence Determinants of the Individual and Collective Behaviour of Intrinsically Disordered Proteins

    Intrinsically disordered proteins and protein regions (IDPs) represent around thirty percent of the eukaryotic proteome. IDPs do not fold into a set three-dimensional structure, but instead exist in an ensemble of inter-converting states. Despite being disordered, IDPs are decidedly not random; well-defined - albeit transient - local and long-range interactions give rise to an ensemble with distinct statistical biases over many length-scales. Among a variety of cellular roles, IDPs drive and modulate the formation of phase separated intracellular condensates, non-stoichiometric assemblies of protein and nucleic acid that serve many functions. In this work, we have explored how the amino acid sequence of IDPs determines their conformational behaviour, and how sequence and single chain behaviour influence their collective behaviour in the context of phase separation. In part I, in a series of studies, we used simulation, theory, and statistical analysis coupled with a wide range of experimental approaches to uncover novel rules that further explore how primary sequence and local structure influence the global and local behaviour of disordered proteins, with direct implications for protein function and evolution. We found that amino acid sidechains counteract the intrinsic collapse of the peptide backbone, priming the backbone for interaction and providing a fully reconciliatory explanation for the mechanism of action associated with the denaturants urea and GdmCl. We discovered that proline can engender a conformational buffering effect in IDPs to counteract standard electrostatic effects, and that the patterning of those proline residues can be a crucial determinant of the conformational ensemble. We developed a series of tools for analysing primary sequences on a proteome-wide scale and used them to discover that different organisms can have substantially different average sequence properties.
Finally, we determined that for the normally folded protein NTL9, the unfolded state under folding conditions is relatively expanded but has well defined native and non-native structural preferences. In part II, we identified a novel mode of phase separation in biology, and explored how this could be tuned through sequence design. We discovered that phase separated liquids can be many orders of magnitude more dilute than simple mean-field theories would predict, and developed an analytic framework to explain and understand this phenomenon. Finally, we designed, developed and implemented a novel lattice-based simulation engine (PIMMS) to provide sequence-specific insight into the determinants of conformational behaviour and phase separation. PIMMS allows us to accurately and rapidly generate sequence-specific conformational ensembles and run simulations of hundreds of polymers, with the goal of systematically elucidating the link between primary sequence and phase separation.