A topological approach for protein classification
The function and dynamics of a protein are closely related to its sequence and
structure. However, predicting protein function and dynamics from sequence and
structure remains a fundamental challenge in molecular biology.
Protein classification, which is typically done by measuring the
similarity between proteins based on protein sequence or physical
information, serves as a crucial step toward the understanding of protein
function and dynamics. Persistent homology is a new branch of algebraic
topology that has proved successful in topological data analysis across a
variety of disciplines, including molecular biology. The present work explores
the potential of using persistent homology as an independent tool for protein
classification. To this end, we propose a molecular topological fingerprint
based support vector machine (MTF-SVM) classifier. Specifically, we construct
machine learning feature vectors solely from protein topological fingerprints,
which are topological invariants generated during the filtration process. To
validate the present MTF-SVM approach, we consider four types of problems.
First, we study protein-drug binding by using the M2 channel protein of
influenza A virus. We achieve 96% accuracy in discriminating drug-bound and
unbound M2 channels. Additionally, we examine the use of MTF-SVM for the
classification of hemoglobin molecules in their relaxed and taut forms and
obtain about 80% accuracy. The identification of all-alpha, all-beta, and
alpha-beta protein domains is carried out in our next study using 900 proteins.
We achieve an 85% success rate in this identification. Finally, we apply the
present technique to 55 classification tasks of protein superfamilies over 1357
samples. An average accuracy of 82% is attained. The present study establishes
computational topology as an independent and effective alternative for protein
classification.
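The idea of building feature vectors from topological fingerprints can be
illustrated with a minimal sketch. It computes only the 0-dimensional bars of
a Vietoris-Rips filtration on a point cloud, using a union-find structure, and
turns them into a few summary statistics. The function names and the choice of
statistics are assumptions made for illustration; the abstract does not list
the paper's actual features, and the SVM training step is not shown.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death scales of the finite 0-dimensional bars of a Vietoris-Rips filtration.

    Every point is born at scale 0. A bar dies at the edge length that merges
    its connected component into another one, so n points give n - 1 finite bars.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in order of increasing length (the filtration parameter).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def barcode_features(deaths):
    """Hypothetical feature vector: bar count, total, longest and mean bar length."""
    if not deaths:
        return [0, 0.0, 0.0, 0.0]
    return [len(deaths), sum(deaths), max(deaths), sum(deaths) / len(deaths)]


# Three collinear "atoms": components merge at distances 1 and 4.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(barcode_features(h0_barcode(atoms)))
```

A vector like this, one per protein, is the kind of input a support vector
machine could be trained on; higher-dimensional bars (rings, cavities) would
need a full persistent homology library.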
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test, respectively, the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform modern machine-learning-based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
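How a Wasserstein distance compares two sets of topological invariants can be
sketched as follows. Each persistence diagram is a list of (birth, death)
pairs, and unmatched points may be paired with the diagonal. The L-infinity
ground metric, the exponent p = 2 and the brute-force matching are assumptions
chosen for a small, dependency-free illustration; the abstract does not state
which variant the paper uses.

```python
from itertools import permutations


def _linf(a, b):
    """L-infinity distance between two (birth, death) points."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def _to_diagonal(pt):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    return (pt[1] - pt[0]) / 2.0


def wasserstein_distance(diag_a, diag_b, p=2):
    """p-Wasserstein distance between two small persistence diagrams.

    Each diagram is padded with diagonal slots so that any point may be
    matched to the diagonal. The optimal matching is found by trying every
    permutation, which is only practical for a handful of points.
    """
    m, n = len(diag_a), len(diag_b)
    size = m + n
    if size == 0:
        return 0.0
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < m and j < n:
                cost[i][j] = _linf(diag_a[i], diag_b[j]) ** p
            elif i < m:
                cost[i][j] = _to_diagonal(diag_a[i]) ** p
            elif j < n:
                cost[i][j] = _to_diagonal(diag_b[j]) ** p
            # Diagonal-to-diagonal matches cost nothing.
    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)


print(wasserstein_distance([(0.0, 4.0)], [(0.0, 3.0)]))
```

For diagrams of realistic size, the permutation search would be replaced by a
polynomial-time assignment solver; the cost matrix construction stays the same.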
The effect of protein backbone hydration on the amide vibrations in Raman and Raman optical activity spectra
Raman and, in particular, Raman optical activity (ROA) spectroscopy are very sensitive to the solution structure and conformation of biomolecules. Because of this strong conformational sensitivity, density functional theory (DFT) calculations are often used to better understand the experimentally observed spectral patterns. While for carbohydrates, for example, the water molecules surrounding the solute have been shown to be vital for obtaining accurately modelled ROA spectra, the effect of explicit water molecules on the calculated ROA patterns of peptides and proteins is less well studied. Here, the effect of protein backbone hydration was studied using DFT calculations of HCO-(L-Ala)5-NH2 in specific secondary structure conformations with different treatments of the solvation. The effect of the explicit water molecules on the calculated spectra mainly arises from the formation of hydrogen bonds with the amide C=O and N-H groups. Hydrogen bonding of water with the C=O group determines the shape and position of the amide I band. The C=O bond length increases upon formation of C=O···H2O hydrogen bonds. The effect of the explicit water molecules on the amide III vibrations arises from hydrogen bonding of the solvent with both the C=O and N-H groups, but their contributions to this spectral region differ: geometrically, the formation of a C=O···H2O bond decreases the C-N bond length, while the formation of an N-H···H2O hydrogen bond increases the N-H bond length.
Extending fragment-based free energy calculations with library Monte Carlo simulation: Annealing in interaction space
Pre-calculated libraries of molecular fragment configurations have previously
been used as a basis for both equilibrium sampling (via "library-based Monte
Carlo") and for obtaining absolute free energies using a polymer-growth
formalism. Here, we combine the two approaches to extend the size of systems
for which free energies can be calculated. We study a series of all-atom
poly-alanine systems in a simple dielectric "solvent" and find that precise
free energies can be obtained rapidly. For instance, for 12 residues, less than
an hour of single-processor time is required. The combined approach is formally
equivalent to the "annealed importance sampling" algorithm; instead of
annealing by decreasing temperature, however, interactions among fragments are
gradually added as the molecule is "grown." We discuss implications for future
binding affinity calculations in which a ligand is grown into a binding site.
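Annealing in interaction space, rather than in temperature, can be sketched on
a toy problem. The example below runs annealed importance sampling on a single
1D coordinate whose harmonic "interaction" is switched on gradually through a
coupling parameter that goes from 0 to 1. The harmonic system, the function
name and all parameter values are assumptions chosen so that the exact answer,
0.5 ln(1 + k) in units of kT, is known; this is not the fragment-library
method itself.

```python
import math
import random


def ais_free_energy(k=3.0, n_chains=2000, n_steps=50, mc_moves=2, step=1.0, seed=0):
    """Estimate the free energy cost (in kT) of switching on an interaction.

    Reference energy: U0(x) = x^2 / 2. Interaction: V(x) = k x^2 / 2.
    Intermediate energies: U_lam(x) = U0(x) + lam * V(x), lam from 0 to 1.
    """
    rng = random.Random(seed)

    def interaction(x):
        return 0.5 * k * x * x

    def energy(x, lam):
        return 0.5 * x * x + lam * interaction(x)

    log_weights = []
    for _ in range(n_chains):
        x = rng.gauss(0.0, 1.0)  # exact sample from the reference ensemble
        log_w = 0.0
        for t in range(1, n_steps + 1):
            lam_prev, lam = (t - 1) / n_steps, t / n_steps
            # Importance weight for adding a little more interaction.
            log_w -= (lam - lam_prev) * interaction(x)
            # Metropolis moves that leave the ensemble at coupling lam invariant.
            for _ in range(mc_moves):
                trial = x + rng.uniform(-step, step)
                delta = energy(trial, lam) - energy(x, lam)
                if delta <= 0.0 or rng.random() < math.exp(-delta):
                    x = trial
        log_weights.append(log_w)

    # Numerically stable log of the mean weight; the mean estimates Z1 / Z0.
    top = max(log_weights)
    mean_w = sum(math.exp(lw - top) for lw in log_weights) / n_chains
    return -(top + math.log(mean_w))


estimate = ais_free_energy()
exact = 0.5 * math.log(1.0 + 3.0)
print(f"AIS estimate: {estimate:.3f} kT, exact: {exact:.3f} kT")
```

In a growth setting, each increment of the coupling parameter would correspond
to adding interactions between fragments rather than stiffening a single spring.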
Variational Methods for Biomolecular Modeling
Structure, function and dynamics of many biomolecular systems can be
characterized by the energetic variational principle and the corresponding
systems of partial differential equations (PDEs). This principle allows us to
focus on the identification of essential energetic components, the optimal
parametrization of energies, and the efficient computational implementation of
energy variation or minimization. Given the fact that complex biomolecular
systems are structurally non-uniform and their interactions occur through
contact interfaces, their free energies are associated with various interfaces
as well, such as solute-solvent interface, molecular binding interface, lipid
domain interface, and membrane surfaces. This fact motivates the inclusion of
interface geometry, particularly its curvatures, in the parametrization of free
energies. Applications of such interface geometry based energetic variational
principles are illustrated through three concrete topics: the multiscale
modeling of biomolecular electrostatics and solvation that includes the
curvature energy of the molecular surface, the formation of microdomains on
lipid membranes due to the geometric and molecular mechanics at the lipid
interface, and the mean curvature driven protein localization on membrane
surfaces. By further implicitly representing the interface using a phase field
function over the entire domain, one can simulate the dynamics of the interface
and the corresponding energy variation by evolving the phase field function,
achieving significant reduction of the number of degrees of freedom and
computational complexity. Strategies for improving the efficiency of
computational implementations and for extending applications to coarse-graining
or multiscale molecular simulations are outlined. Comment: 36 pages
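Evolving a phase field function to lower an interface energy can be sketched
in one dimension. The example relaxes a sharp step under an explicit
gradient flow of a generic double-well energy, E = sum of
(eps/2) |phi'|^2 + W(phi)/eps with W(phi) = phi^2 (1 - phi)^2. This energy,
the grid and all parameter values are assumptions for illustration and are not
the functionals used in the paper.

```python
def double_well(phi):
    """Double-well potential with minima at phi = 0 and phi = 1."""
    return phi * phi * (1.0 - phi) ** 2


def double_well_prime(phi):
    """Derivative of the double-well potential."""
    return 2.0 * phi * (1.0 - phi) * (1.0 - 2.0 * phi)


def energy(phi, h, eps):
    """Discrete phase-field energy on a uniform grid with spacing h."""
    grad = sum((phi[i + 1] - phi[i]) ** 2 for i in range(len(phi) - 1)) * eps / (2.0 * h)
    pot = sum(double_well(v) for v in phi) * h / eps
    return grad + pot


def relax(n=101, eps=0.05, dt=2e-4, steps=2000):
    """Relax a sharp step by explicit gradient flow; return (E_start, E_end, phi)."""
    h = 1.0 / (n - 1)
    # Sharp interface in the middle of the unit interval.
    phi = [0.0 if i < n // 2 else 1.0 for i in range(n)]
    e_start = energy(phi, h, eps)
    for _ in range(steps):
        new = phi[:]
        # End points stay fixed at 0 and 1 (Dirichlet boundary conditions).
        for i in range(1, n - 1):
            lap = (phi[i + 1] - 2.0 * phi[i] + phi[i - 1]) / (h * h)
            new[i] = phi[i] + dt * (eps * lap - double_well_prime(phi[i]) / eps)
        phi = new
    return e_start, energy(phi, h, eps), phi


e_start, e_end, _ = relax()
print(f"energy before: {e_start:.3f}, after: {e_end:.3f}")
```

The relaxed energy approaches the continuum interface tension for this
potential, sqrt(2)/6, which is how a diffuse interface stands in for a sharp
one. The time step is kept small because the explicit scheme is only
conditionally stable.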
- …