    Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks

    While genes are defined by sequence, in biological systems a protein's function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction. Many protein structures of interest remain out of reach for both computational prediction and experimental determination. DeepContact learns patterns of co-evolution across thousands of experimentally determined structures, identifying conserved local motifs and leveraging this information to improve protein residue-residue contact predictions. DeepContact extracts additional information from the evolutionary couplings using its knowledge of co-evolution and structural space, while also converting coupling scores into probabilities that are comparable across protein sequences and alignments. 
Keywords: contact prediction; convolutional neural networks; deep learning; protein structure prediction; structure prediction; co-evolution; evolutionary couplings
National Institutes of Health (U.S.) (Grant R01GM081871

    Explainable Representations for Relation Prediction in Knowledge Graphs

    Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects.
    Comment: 16 pages, 3 figure

    KFC Server: interactive forecasting of protein interaction hot spots

    The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user-submitted protein–protein or protein–DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts whether the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer that quickly highlights predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org

    Knowledge-based approaches for understanding structure-dynamics-function relationship in proteins

    Proteins accomplish their functions through conformational changes, often brought about by changes in environmental conditions or ligand binding. Predicting the functional mechanisms of proteins is impossible without a deeper understanding of conformational transitions. Dynamics is the key link between the structure and function of proteins. The protein data bank (PDB) contains multiple structures of the same protein, which have been solved under different conditions, using different experimental methods or in complexes with different ligands. These alternate conformations of the same protein (or similar proteins) can provide important information about what conformational changes take place and how they are brought about. Though there have been multiple computational approaches developed to predict dynamics from structure information, little work has been done to exploit this apparent, but potentially informative, redundancy in the PDB. In this work I bridge this gap by exploring various knowledge-based approaches to understand the structure-dynamics relationship and how it translates into protein function. First, a novel method for constructing free energy landscapes for conformational changes in proteins is proposed by combining principal motions with knowledge-based potential energies and entropies from coarse-grained models of protein dynamics. Second, an innovative method for computing knowledge-based entropies for proteins using an inverse Boltzmann approach is introduced, similar to the manner in which statistical potentials were previously extracted. We hypothesize that amino acid contact changes observed in the course of conformational changes within a large set of proteins can provide information about local pairwise flexibilities or entropies. 
By combining this new entropy measure with knowledge-based potential functions, we formulate a knowledge-based free energy (KBF) function that we demonstrate outperforms other statistical potentials in its ability to identify native protein structures embedded within sets of decoys. Third, I apply the methods developed above in collaboration with experimentalists to understand the molecular mechanisms of conformational changes in several protein systems including cadherins and membrane transporters. This work introduces several ways that the large body of data in the PDB can be utilized to understand the underlying principles behind the structure-dynamics-function relationships of proteins. Results from this work have several important applications in structural bioinformatics such as structure prediction, molecular docking, protein engineering and design. In particular, the new KBFs developed in this dissertation have immediate applications in emerging topics such as prediction of 3D structure from coevolving residues in sequence alignments as well as in identifying the phenotypic effects of mutants.
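
The inverse Boltzmann idea mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic statistical-potential form, E = -kT ln(p_obs / p_ref), and is not the dissertation's entropy measure or its KBF function. The function name `inverse_boltzmann`, the contact-type labels, and all counts below are hypothetical values chosen for illustration.

```python
import math

def inverse_boltzmann(observed_counts, reference_counts, kT=1.0):
    """Convert observed vs. reference frequencies into pseudo-energies.

    Generic form E = -kT * ln(p_obs / p_ref). Every key must have a
    non-zero count in both dictionaries (an assumption of this sketch).
    """
    obs_total = sum(observed_counts.values())
    ref_total = sum(reference_counts.values())
    energies = {}
    for key, n_obs in observed_counts.items():
        p_obs = n_obs / obs_total
        p_ref = reference_counts[key] / ref_total
        energies[key] = -kT * math.log(p_obs / p_ref)
    return energies

# Made-up contact counts: "observed" in a structure set, "reference" expected by chance.
observed = {"LEU-ILE": 300, "LYS-LYS": 50, "ALA-GLY": 150}
reference = {"LEU-ILE": 150, "LYS-LYS": 150, "ALA-GLY": 200}

energies = inverse_boltzmann(observed, reference)
for pair, e in sorted(energies.items(), key=lambda item: item[1]):
    print(f"{pair}: {e:+.3f} kT")
```

Contact types seen more often than the reference predicts receive negative (favorable) values, and under-represented ones receive positive values. The choice of reference state is the main modeling decision and is left open here.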

    Novel techniques for protein structure characterization using graph representation of proteins

    Proteins exhibit an infinite variety of structures. Around 50K 3D structures of proteins exist in the PDB database among unlimited possibilities. The three-dimensional structure of a protein is crucial to its function. Even within a common structure family, proteins vary in length, size, and sequence. This variation is the reflection of evolution on protein sequences. The intrinsic information in protein structures can be captured by their graph representations. The structural similarities between protein families can be deduced using their structural features such as connectivity, betweenness, and cliquishness. Most structure comparison and alignment methods use all atom coordinates and therefore need a reliable full-atom representation of proteins, which is difficult to obtain using experimental methods. These methods can be used for a variety of problems in bioinformatics such as protein fold prediction, function annotation, domain prediction, and fold classification. Our approach can capture the same knowledge by using much less information from the actual structure. In this thesis, we used graph representations of proteins and graph theoretical properties to discriminate native and non-native proteins. Then we used these methods to find the overall and local similarity of protein structures by using dynamic programming. Afterward, local alignment using dynamic programming is used to determine the function of a protein. Moreover, subgraph matching algorithms were employed for domain prediction. In order to find the correct fold we also developed a genetic algorithm based threading approach. All these applications gave results better than or comparable to the state of the art.

    Potentials of Mean Force for Protein Structure Prediction Vindicated, Formalized and Generalized

    Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge based potentials based on pairwise distances -- so-called "potentials of mean force" (PMFs) -- have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state -- a necessary component of these potentials -- is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities reference ratio distributions deriving from the application of the reference ratio method. This new view is not only of theoretical relevance, but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures

    CLP-based protein fragment assembly

    The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predict the 3D conformation of a protein via fragment assembly. The fragments are extracted by a preprocessor, also developed for this work, from a database of known protein structures that clusters and classifies the fragments according to similarity and frequency. The problem of assembling fragments into a complete conformation is mapped to a constraint solving problem and solved using CLP. The constraint-based model uses a medium discretization degree Cα-side chain centroid protein model that offers efficiency and a good approximation for space filling. The approach adapts existing energy models to the protein representation used and applies a large neighboring search strategy. The results show the feasibility and efficiency of the method. The declarative nature of the solution allows future extensions to be included, e.g., different size fragments for better accuracy.
    Comment: special issue dedicated to ICLP 201

    Extraction of hidden information by efficient community detection in networks

    Currently, we are overwhelmed by a deluge of experimental data, and network physics has the potential to become an invaluable method to increase our understanding of large interacting datasets. However, this potential is often unrealized for two reasons: uncovering the hidden community structure of a network, known as community detection, is difficult, and even if one has an idea of this community structure, it is not a priori obvious how to use this information efficiently. Here, to address both of these issues, we first identify the optimal community structure of given networks in terms of modularity by using a recently introduced community detection method. Second, we develop an approach to use this community information to extract hidden information from a network. When applied to a protein-protein interaction network, the proposed method outperforms current state-of-the-art methods that use only the local information of a network. The method is generally applicable to networks from many areas.
    Comment: 17 pages, 2 figures and 2 table
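
The modularity objective named in this abstract can be made concrete with a short sketch. It only evaluates the standard Newman modularity Q of a given partition for an undirected, unweighted graph. It is not the community detection method the paper uses, which the abstract does not specify. The toy graph and the names `modularity`, `two_triangles`, and `one_block` are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    edges: list of (u, v) pairs, undirected and unweighted, no duplicates.
    community: dict mapping each node to its community label.
    L_c is the number of edges inside community c, d_c its total degree.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in range(6)}

print(modularity(edges, two_triangles))  # 5/14, about 0.357
print(modularity(edges, one_block))      # 0.0
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the sense in which a partition is "optimal in terms of modularity".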

    Empirical Potential Function for Simplified Protein Models: Combining Contact and Local Sequence-Structure Descriptors

    An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models such as those requiring only Cα or backbone atoms are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete state models can represent the backbone conformations of proteins with small RMSD values. However, no potential functions exist that are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The form of the potential function is a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The performance of the potential function in discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. Our potential function, requiring only backbone atoms or Cα atoms, has comparable or better performance than several residue-based potential functions that require additional coordinates of side chain centers or coordinates of all side chain atoms. By reducing the residue alphabet down to size 5 for the local structure-sequence relationship, the performance of the potential function can be further improved. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding.
    Comment: 20 pages, 5 figures, 4 tables. In press, Protein
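
The "weighted linear sum of all descriptors" in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's potential: the two-element descriptor vectors, the weights, the function names `potential` and `native_rank`, and the convention that lower scores are better are all hypothetical.

```python
def potential(weights, descriptors):
    """Weighted linear sum of descriptor values (lower is assumed better)."""
    return sum(w * d for w, d in zip(weights, descriptors))

def native_rank(weights, native, decoys):
    """Rank of the native structure among the decoys; 1 means lowest score."""
    e_native = potential(weights, native)
    return 1 + sum(potential(weights, d) < e_native for d in decoys)

# Hypothetical descriptors: [contact count, local sequence-structure penalty].
native = [10.0, 2.0]
decoys = [[7.0, 3.0], [9.0, 6.0], [4.0, 1.0]]

good_weights = [-1.0, 0.5]  # rewards contacts, penalizes poor local structure
bad_weights = [1.0, 0.0]    # penalizes contacts, ignores local structure

print(native_rank(good_weights, native, decoys))  # 1
print(native_rank(bad_weights, native, decoys))   # 4
```

Fitting the weights then amounts to searching for a vector that ranks native structures first across a training set of native and decoy structures. The optimization procedure itself is not described in the abstract and is not sketched here.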