26,419 research outputs found

    TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

    Full text link
    Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts.Comment: 20 pages, 8 figures, 5 table

    Hot-spot analysis for drug discovery targeting protein-protein interactions

    Get PDF
    Introduction: Protein-protein interactions are important for biological processes and pathological situations, and are attractive targets for drug discovery. However, rational drug design targeting protein-protein interactions is still highly challenging. Hot-spot residues are seen as the best option to target such interactions, but their identification requires detailed structural and energetic characterization, which is only available for a tiny fraction of protein interactions. Areas covered: In this review, the authors cover a variety of computational methods that have been reported for the energetic analysis of protein-protein interfaces in search of hot-spots, and the structural modeling of protein-protein complexes by docking. This can help to rationalize the discovery of small-molecule inhibitors of protein-protein interfaces of therapeutic interest. Computational analysis and docking can help to locate the interface, molecular dynamics can be used to find suitable cavities, and hot-spot predictions can focus the search for inhibitors of protein-protein interactions. Expert opinion: A major difficulty for applying rational drug design methods to protein-protein interactions is that in the majority of cases the complex structure is not available. Fortunately, computational docking can complement experimental data. An interesting aspect to explore in the future is the integration of these strategies for targeting PPIs with large-scale mutational analysis.This work has been funded by grants BIO2016-79930-R and SEV-2015-0493 from the Spanish Ministry of Economy, Industry and Competitiveness, and grant EFA086/15 from EU Interreg V POCTEFA. M Rosell is supported by an FPI fellowship from the Severo Ochoa program. The authors are grateful for the support of the the Joint BSC-CRG-IRB Programme in Computational Biology.Peer ReviewedPostprint (author's final draft

    Knowledge-based energy functions for computational studies of proteins

    Full text link
    This chapter discusses theoretical framework and methods for developing knowledge-based potential functions essential for protein structure prediction, protein-protein interaction, and protein sequence design. We discuss in some details about the Miyazawa-Jernigan contact statistical potential, distance-dependent statistical potentials, as well as geometric statistical potentials. We also describe a geometric model for developing both linear and non-linear potential functions by optimization. Applications of knowledge-based potential functions in protein-decoy discrimination, in protein-protein interactions, and in protein design are then described. Several issues of knowledge-based potential functions are finally discussed.Comment: 57 pages, 6 figures. To be published in a book by Springe

    Deriving a mutation index of carcinogenicity using protein structure and protein interfaces

    Get PDF
    With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/

    The network of stabilizing contacts in proteins studied by coevolutionary data

    Full text link
    The primary structure of proteins, that is their sequence, represents one of the most abundant set of experimental data concerning biomolecules. The study of correlations in families of co--evolving proteins by means of an inverse Ising--model approach allows to obtain information on their native conformation. Following up on a recent development along this line, we optimize the algorithm to calculate effective energies between the residues, validating the approach both back-calculating interaction energies in a model system, and predicting the free energies associated to mutations in real systems. Making use of these effective energies, we study the networks of interactions which stabilizes the native conformation of some well--studied proteins, showing that it display different properties than the associated contact network

    Computational and Experimental Approaches to Reveal the Effects of Single Nucleotide Polymorphisms with Respect to Disease Diagnostics

    Get PDF
    DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules

    Mapping genetic variations to three- dimensional protein structures to enhance variant interpretation: a proposed framework

    Get PDF
    The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods

    Computational modeling of protein mutant stability: analysis and optimization of statistical potentials and structural features reveal insights into prediction model development

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Understanding and predicting protein stability upon point mutations has wide-spread importance in molecular biology. Several prediction models have been developed in the past with various algorithms. Statistical potentials are one of the widely used algorithms for the prediction of changes in stability upon point mutations. Although the methods provide flexibility and the capability to develop an accurate and reliable prediction model, it can be achieved only by the right selection of the structural factors and optimization of their parameters for the statistical potentials. In this work, we have selected five atom classification systems and compared their efficiency for the development of amino acid atom potentials. Additionally, torsion angle potentials have been optimized to include the orientation of amino acids in such a way that altered backbone conformation in different secondary structural regions can be included for the prediction model. This study also elaborates the importance of classifying the mutations according to their solvent accessibility and secondary structure specificity. The prediction efficiency has been calculated individually for the mutations in different secondary structural regions and compared.</p> <p>Results</p> <p>Results show that, in addition to using an advanced atom description, stepwise regression and selection of atoms are necessary to avoid the redundancy in atom distribution and improve the reliability of the prediction model validation. Comparing to other atom classification models, Melo-Feytmans model shows better prediction efficiency by giving a high correlation of 0.85 between experimental and theoretical ΔΔG with 84.06% of the mutations correctly predicted out of 1538 mutations. The theoretical ΔΔG values for the mutations in partially buried <it>β</it>-strands generated by the structural training dataset from PISCES gave a correlation of 0.84 without performing the Gaussian apodization of the torsion angle distribution. After the Gaussian apodization, the correlation increased to 0.92 and prediction accuracy increased from 80% to 88.89% respectively.</p> <p>Conclusion</p> <p>These findings were useful for the optimization of the Melo-Feytmans atom classification system and implementing them to develop the statistical potentials. It was also significant that the prediction efficiency of mutations in the partially buried <it>β</it>-strands improves with the help of Gaussian apodization of the torsion angle distribution. All these comparisons and optimization techniques demonstrate their advantages as well as the restrictions for the development of the prediction model. These findings will be quite helpful not only for the protein stability prediction, but also for various structure solutions in future.</p

    Computational Design of Stable and Soluble Biocatalysts

    Get PDF
    Natural enzymes are delicate biomolecules possessing only marginal thermodynamic stability. Poorly stable, misfolded, and aggregated proteins lead to huge economic losses in the biotechnology and biopharmaceutical industries. Consequently, there is a need to design optimized protein sequences that maximize stability, solubility, and activity over a wide range of temperatures and pH values in buffers of different composition and in the presence of organic cosolvents. This has created great interest in using computational methods to enhance biocatalysts' robustness and solubility. Suitable methods include (i) energy calculations, (ii) machine learning, (iii) phylogenetic analyses, and (iv) combinations of these approaches. We have witnessed impressive progress in the design of stable enzymes over the last two decades, but predictions of protein solubility and expressibility are scarce. Stabilizing mutations can be predicted accurately using available force fields, and the number of sequences available for phylogenetic analyses is growing. In addition, complex computational workflows are being implemented in intuitive web tools, enhancing the quality of protein stability predictions. Conversely, solubility predictors are limited by the lack of robust and balanced experimental data, an inadequate understanding of fundamental principles of protein aggregation, and a dearth of structural information on folding intermediates. Here we summarize recent progress in the development of computational tools for predicting protein stability and solubility, critically assess their strengths and weaknesses, and identify apparent gaps in data and knowledge. We also present perspectives on the computational design of stable and soluble biocatalysts

    Analyzing Effects of Naturally Occurring Missense Mutations

    Get PDF
    Single-point mutation in genome, for example, single-nucleotide polymorphism (SNP) or rare genetic mutation, is the change of a single nucleotide for another in the genome sequence. Some of them will produce an amino acid substitution in the corresponding protein sequence (missense mutations); others will not. This paper focuses on genetic mutations resulting in a change in the amino acid sequence of the corresponding protein and how to assess their effects on protein wild-type characteristics. The existing methods and approaches for predicting the effects of mutation on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, either as stand-alone applications or webservers, are pointed out as well. It is emphasized that understanding the molecular mechanisms behind these effects due to these missense mutations is of critical importance for detecting disease-causing mutations. The paper provides several examples of the application of 3D structure-based methods to model the effects of protein stability and protein-protein interactions caused by missense mutations as well
    corecore