
    A topological approach for protein classification

    Protein function and dynamics are closely related to protein sequence and structure. However, predicting protein function and dynamics from sequence and structure remains a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug-bound and unbound M2 channels. Additionally, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. The identification of all-alpha, all-beta, and alpha-beta protein domains is carried out in our next study using 900 proteins. We achieve an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples. An average accuracy of 82% is attained. The present study establishes computational topology as an independent and effective alternative for protein classification.
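As a rough illustration of how a topological fingerprint can become a machine learning feature vector, the sketch below computes only the 0-dimensional (connected-component) barcode of a point cloud under a Vietoris-Rips filtration and summarizes it with a few statistics. It is not the MTF-SVM pipeline itself: the abstract does not list the features used, so `barcode_features` and its four summary values are hypothetical choices, and higher-dimensional bars (rings, cavities) are left out.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return the finite H0 death times of a Vietoris-Rips filtration.

    Every point is born at scale 0; a bar dies when its connected
    component merges into another one (Kruskal-style union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration scale).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at this scale
            deaths.append(length)  # so one H0 bar dies here
    return deaths  # n - 1 finite bars; one additional bar never dies


def barcode_features(deaths):
    """Hypothetical fixed-length summary of a barcode for a classifier."""
    if not deaths:
        return [0.0, 0.0, 0.0, 0.0]
    total = sum(deaths)
    return [float(len(deaths)), max(deaths), total / len(deaths), total]


if __name__ == "__main__":
    atoms = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # made-up coordinates
    bars = h0_barcode(atoms)
    print(bars, barcode_features(bars))
```

Vectors of this kind, one per protein, could then be passed to any standard support vector machine implementation (for example `sklearn.svm.SVC`) together with class labels.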

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.
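To make the Wasserstein comparison of topological descriptors concrete, here is a minimal brute-force sketch of the 1-Wasserstein distance between two small persistence diagrams. The abstract does not specify the order or ground metric used, so the choices here (order 1, L-infinity ground metric) are assumptions, and the function names are hypothetical. The factorial-time search over matchings is only suitable for a handful of points; practical work would rely on a dedicated topological data analysis library.

```python
from itertools import permutations


def diag_cost(p):
    """L-infinity distance from a point (birth, death) to the diagonal."""
    return abs(p[1] - p[0]) / 2.0


def pair_cost(p, q):
    """L-infinity distance between two off-diagonal points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def wasserstein_1(diagram_a, diagram_b):
    """Brute-force 1-Wasserstein distance between two small diagrams.

    Each point may be matched to a point of the other diagram or to the
    diagonal, so both sides are padded with diagonal slots.
    """
    n, m = len(diagram_a), len(diagram_b)
    size = n + m
    # Rows: points of A, then m diagonal slots.
    # Columns: points of B, then n diagonal slots.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = pair_cost(diagram_a[i], diagram_b[j])
            elif i < n:
                cost[i][j] = diag_cost(diagram_a[i])
            elif j < m:
                cost[i][j] = diag_cost(diagram_b[j])
            # diagonal-to-diagonal matches are free (cost stays 0.0)
    # Exhaustive search over all perfect matchings of the padded sets.
    return min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )


if __name__ == "__main__":
    print(wasserstein_1([(0.0, 2.0)], [(0.0, 3.0)]))
```

A distance of this kind can serve directly as the metric in a k-nearest-neighbors classifier, which is one of the learners the abstract mentions.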

    The effect of protein backbone hydration on the amide vibrations in Raman and Raman optical activity spectra

    Raman and specifically Raman optical activity (ROA) spectroscopy are very sensitive to the solution structure and conformation of biomolecules. Because of this strong conformational sensitivity, density functional theory (DFT) calculations are often used to get a better understanding of the experimentally observed spectral patterns. While for carbohydrate structures, for example, the water molecules that surround the solute have been demonstrated to be of vital importance for obtaining accurate modelled ROA spectra, the effect of explicit water molecules on the calculated ROA patterns of peptides and proteins is less well studied. Here, the effect of protein backbone hydration was studied using DFT calculations of HCO-(L-Ala)5-NH2 in specific secondary structure conformations with different treatments of the solvation. The effect of the explicit water molecules on the calculated spectra mainly arises from the formation of hydrogen bonds with the amide C=O and N-H groups. Hydrogen bonding of water with the C=O group determines the shape and position of the amide I band. The C=O bond length increases upon formation of C=O···H2O hydrogen bonds. The effect of the explicit water molecules on the amide III vibrations arises from hydrogen bonding of the solvent with both the C=O and N-H groups, but their contributions to this spectral region differ: geometrically, the formation of a C=O···H2O bond decreases the C-N bond length, while upon forming an N-H···H2O hydrogen bond, the N-H bond length increases.

    Extending fragment-based free energy calculations with library Monte Carlo simulation: Annealing in interaction space

    Pre-calculated libraries of molecular fragment configurations have previously been used as a basis both for equilibrium sampling (via "library-based Monte Carlo") and for obtaining absolute free energies using a polymer-growth formalism. Here, we combine the two approaches to extend the size of systems for which free energies can be calculated. We study a series of all-atom poly-alanine systems in a simple dielectric "solvent" and find that precise free energies can be obtained rapidly. For instance, for 12 residues, less than an hour of single-processor time is required. The combined approach is formally equivalent to the "annealed importance sampling" algorithm; instead of annealing by decreasing temperature, however, interactions among fragments are gradually added as the molecule is "grown." We discuss implications for future binding affinity calculations in which a ligand is grown into a binding site.
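The idea of annealing in interaction space rather than in temperature can be sketched on a toy problem. The example below is not the library-based polymer-growth method of the paper: it assumes a single coordinate with a harmonic base energy x^2/2, a hypothetical extra "interaction" k*x^2/2 that is switched on gradually through a coupling parameter, and units where kT = 1. The function name and all parameters are illustrative. For this toy system the exact partition function ratio is 1/sqrt(1 + k), which gives a reference value to compare against.

```python
import math
import random


def ais_partition_ratio(k=3.0, n_chains=2000, n_steps=20, seed=0):
    """Estimate Z1/Z0 for U_lam(x) = x^2/2 + lam * k*x^2/2, lam: 0 -> 1."""
    rng = random.Random(seed)

    def u_base(x):
        return 0.5 * x * x

    def u_int(x):
        return 0.5 * k * x * x

    total = 0.0
    for _ in range(n_chains):
        x = rng.gauss(0.0, 1.0)  # exact draw from the lam = 0 ensemble
        log_w = 0.0
        for step in range(1, n_steps + 1):
            lam_prev, lam = (step - 1) / n_steps, step / n_steps
            # Weight update: switch on a little more interaction at fixed x.
            log_w -= (lam - lam_prev) * u_int(x)
            # One Metropolis move that leaves exp(-U_lam) invariant.
            trial = x + rng.uniform(-1.0, 1.0)
            delta = (u_base(trial) + lam * u_int(trial)) - (
                u_base(x) + lam * u_int(x)
            )
            if delta <= 0.0 or rng.random() < math.exp(-delta):
                x = trial
        total += math.exp(log_w)
    # The average weight is an unbiased estimate of Z1/Z0.
    return total / n_chains


if __name__ == "__main__":
    ratio = ais_partition_ratio()
    print("estimated Z1/Z0:", ratio, "exact:", 1.0 / math.sqrt(1.0 + 3.0))
    print("free energy difference (kT):", -math.log(ratio))
```

The free energy difference follows as minus the logarithm of the returned ratio. Using more intermediate steps generally lowers the variance of the weights at the cost of more energy evaluations.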

    Variational Methods for Biomolecular Modeling

    Structure, function and dynamics of many biomolecular systems can be characterized by the energetic variational principle and the corresponding systems of partial differential equations (PDEs). This principle allows us to focus on the identification of essential energetic components, the optimal parametrization of energies, and the efficient computational implementation of energy variation or minimization. Because complex biomolecular systems are structurally non-uniform and their interactions occur through contact interfaces, their free energies are associated with various interfaces as well, such as the solute-solvent interface, molecular binding interface, lipid domain interface, and membrane surfaces. This fact motivates the inclusion of interface geometry, particularly its curvatures, in the parametrization of free energies. Applications of such interface geometry based energetic variational principles are illustrated through three concrete topics: the multiscale modeling of biomolecular electrostatics and solvation that includes the curvature energy of the molecular surface, the formation of microdomains on lipid membranes due to the geometric and molecular mechanics at the lipid interface, and the mean curvature driven protein localization on membrane surfaces. By further implicitly representing the interface using a phase field function over the entire domain, one can simulate the dynamics of the interface and the corresponding energy variation by evolving the phase field function, achieving a significant reduction in the number of degrees of freedom and computational complexity. Strategies for improving the efficiency of computational implementations and for extending applications to coarse-graining or multiscale molecular simulations are outlined. Comment: 36 pages