38,475 research outputs found

    TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

    Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by entangled geometric and biological complexity. We introduce topology, i.e., element-specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the prediction of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations of deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in predicting protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts. Comment: 20 pages, 8 figures, 5 tables
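    A minimal sketch of the element-specific idea, assuming a simple encoding chosen here only for illustration: atoms are grouped by element, 0-dimensional persistent homology (connected components under a distance filtration) is computed for each group, and the bar death times are binned into one channel per element. The function names, the 8 Å cutoff, and the 16-bin grid are hypothetical; the features actually used by TopologyNet are more elaborate than this.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Death times of the finite H0 bars of a point cloud under the distance
    (Vietoris-Rips) filtration: the edge lengths of a minimum spanning tree."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for r, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components: one bar dies at r
            parent[root_i] = root_j
            deaths.append(r)
    return deaths


def element_specific_image(atoms, elements, r_max=8.0, n_bins=16):
    """Hypothetical multichannel 1D image: channel k is a histogram of the H0
    death times of the atoms whose element symbol is elements[k]."""
    width = r_max / n_bins
    image = []
    for element in elements:
        subset = [xyz for symbol, xyz in atoms if symbol == element]
        channel = [0] * n_bins
        for d in h0_deaths(subset):
            if d < r_max:
                channel[min(int(d / width), n_bins - 1)] += 1
        image.append(channel)
    return image
```

For example, three carbons at x = 0, 1 and 3 Å plus one nitrogen give a carbon channel with one count in the bin containing 1 Å and one in the bin containing 2 Å, and an empty nitrogen channel. Stacking the channels gives an input that a 1D convolutional network can consume.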

    A topological approach for protein classification

    Protein function and dynamics are closely related to protein sequence and structure. However, predicting protein function and dynamics from sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward understanding protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug-bound and unbound M2 channels. Additionally, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Next, we identify all-alpha, all-beta, and alpha-beta protein domains using 900 proteins, achieving an 85% success rate. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples. An average accuracy of 82% is attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
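    To make the notion of a topological fingerprint concrete, here is a sketch assuming the simplest possible fingerprint: the Betti-0 curve, i.e. the number of connected components of a point cloud (for example, C-alpha coordinates) at each radius of a distance filtration, sampled on a fixed grid so that every protein yields a feature vector of the same length. The helper names and the radius grid are hypothetical; the feature construction used in the work above is not reproduced here.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Merge radii (H0 death times) of a point cloud, via Kruskal's algorithm."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    deaths = []
    for r, i, j in sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    ):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # two components merge at radius r
            parent[root_i] = root_j
            deaths.append(r)
    return deaths


def betti0_curve(points, radii):
    """Number of connected components at each filtration radius."""
    deaths = h0_deaths(points)
    return [len(points) - sum(1 for d in deaths if d <= r) for r in radii]


def fingerprint(points, r_max=12.0, n_steps=24):
    """Fixed-length feature vector: the Betti-0 curve on an evenly spaced grid."""
    radii = [r_max * (k + 1) / n_steps for k in range(n_steps)]
    return betti0_curve(points, radii)
```

Vectors like these, one per protein, could then be passed with class labels (e.g. drug-bound versus unbound) to any SVM implementation, for instance scikit-learn's `sklearn.svm.SVC`.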

    Mathematics at the eve of a historic transition in biology

    A century ago, physicists and mathematicians worked in tandem and established quantum mechanics. Indeed, algebras, partial differential equations, group theory, and functional analysis underpin the foundation of quantum mechanics. Currently, biology is undergoing a historic transition from qualitative, phenomenological, and descriptive to quantitative, analytical, and predictive. Mathematics, again, is becoming a driving force behind this new transition in biology. Comment: 5 pages, 2 figures

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test the scoring power and the virtual screening power of the proposed topological approaches, respectively. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
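    Comparing two molecules through the Wasserstein distance between their persistence diagrams can be illustrated with a brute-force sketch. It assumes the usual convention that each point (birth, death) is matched either to a point of the other diagram, at L-infinity cost, or to the diagonal, at cost (death - birth)/2. The function name is hypothetical, and because it enumerates every matching it is only usable for diagrams with a handful of points; practical tools solve the same assignment problem with the Hungarian algorithm or approximate optimal transport.

```python
from itertools import permutations


def wasserstein_pd(dgm_a, dgm_b, p=2.0):
    """p-Wasserstein distance between two small persistence diagrams, each a
    list of (birth, death) pairs, by exhaustive search over matchings."""
    n, m = len(dgm_a), len(dgm_b)
    size = n + m
    if size > 8:
        raise ValueError("brute force is only practical for tiny diagrams")

    def to_diagonal(point):
        birth, death = point
        return (death - birth) / 2.0  # L-infinity distance to the diagonal

    # Rows: points of A, then m diagonal slots. Columns: points of B, then n slots.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:  # A point matched to a B point
                (b1, d1), (b2, d2) = dgm_a[i], dgm_b[j]
                c = max(abs(b1 - b2), abs(d1 - d2))
            elif i < n:  # A point matched to the diagonal
                c = to_diagonal(dgm_a[i])
            elif j < m:  # B point matched to the diagonal
                c = to_diagonal(dgm_b[j])
            else:  # diagonal slot to diagonal slot is free
                c = 0.0
            cost[i][j] = c ** p
    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)
```

A pairwise distance matrix built this way could feed a k-nearest-neighbor model directly, which is one natural way to combine such a distance with the learners listed above.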

    The Role of Data in Model Building and Prediction: A Survey Through Examples

    The goal of Science is to understand phenomena and systems in order to predict their development and gain control over them. In the scientific process of knowledge elaboration, a crucial role is played by models which, in the language of quantitative sciences, are abstract mathematical or algorithmic representations. This short review discusses a few key examples from Physics, taken from dynamical systems theory, biophysics, and statistical mechanics, representing three paradigmatic procedures for building models and predictions from available data. In the case of dynamical systems, we show how predictions can be obtained in a virtually model-free framework using the method of analogues, and we briefly discuss other approaches based on machine learning methods. In cases where the complexity of systems is challenging, as in biophysics, we stress the necessity of including part of the empirical knowledge in the models to gain a minimal amount of realism. Finally, we consider many-body systems where many (temporal or spatial) scales are at play and show how to derive from data a dimensional reduction in terms of a Langevin dynamics for their slow components.
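    The method of analogues mentioned above can be sketched in its simplest nearest-neighbour form, assuming a scalar time series: find the past stretch of the record that best resembles the most recent stretch and forecast that what followed it will happen again. The window length (playing the role of an embedding dimension) and the squared Euclidean distance are illustrative choices, and the function name is hypothetical.

```python
def analogue_forecast(series, window=3, horizon=1):
    """Forecast the next `horizon` values of `series` by the method of analogues."""
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    if len(series) < window + horizon:
        raise ValueError("series too short to contain an analogue")
    current = series[-window:]
    best_start, best_dist = None, float("inf")
    # Candidate analogues must be followed by `horizon` already-observed values,
    # which also excludes the current window itself.
    for start in range(len(series) - window - horizon + 1):
        segment = series[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(segment, current))
        if dist < best_dist:
            best_start, best_dist = start, dist
    return list(series[best_start + window:best_start + window + horizon])
```

On a strictly periodic record the analogue is exact; in a chaotic system the analogue's trajectory diverges from the true one, so such forecasts degrade as the horizon grows.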

    Combining Coarse-Grained Protein Models with Replica-Exchange All-Atom Molecular Dynamics

    We describe how all-atom simulations can be combined with CABS, a well-established coarse-grained protein modeling tool, into a single multiscale protocol. The simulation method has been tested on the C-terminal beta hairpin of protein G, a model system for protein folding. After atomistic details were reconstructed, conformations derived from the CABS simulation were subjected to replica-exchange molecular dynamics simulations with the OPLS-AA and AMBER99sb force fields in explicit solvent. Such a combination accelerates system convergence severalfold compared with all-atom simulations starting from the extended chain conformation, as demonstrated by the analysis of melting curves, the number of native-like conformations as a function of time, and secondary structure propagation. The results strongly suggest that the proposed multiscale method could be an efficient and accurate tool for high-resolution studies of protein folding dynamics in larger systems. Comment: 12 pages, 4 figures
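    The exchange step at the heart of replica-exchange molecular dynamics can be sketched as follows, assuming the common temperature-exchange variant (the abstract does not say which variant was used) and potential energies in kcal/mol: replicas at temperatures T_i and T_j swap configurations with the Metropolis probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), where beta = 1/(k_B T). The function names are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j, k_b=K_B):
    """Metropolis acceptance probability for swapping the configurations of
    replicas i and j in temperature replica-exchange MD."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    # exp(delta) is the ratio of joint Boltzmann weights after/before the swap.
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < exchange_probability(energy_i, temp_i, energy_j, temp_j)
```

In a full protocol such swaps are attempted periodically between neighbouring temperatures, so conformations that escape local minima at high temperature can migrate to the low-temperature replicas.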
