Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
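As a rough illustration of the kind of topological invariant the abstract above refers to, the following minimal sketch computes only the 0-dimensional persistence barcode of a point cloud under a Vietoris-Rips filtration. It is not the authors' multicomponent or electrostatic method. The function name `h0_barcode` is hypothetical, and the sketch assumes the filtration parameter is the pairwise Euclidean distance (some conventions use half that value). Every component is born at 0, so only the finite death times are returned.

```python
import itertools
import math


def h0_barcode(points):
    """Return the finite H0 death times, in increasing order.

    Assumption: the filtration value of an edge is the Euclidean distance
    between its endpoints. One bar (the last surviving component) never
    dies and is omitted from the result.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges from shortest to longest (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one H0 bar dies at this length.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


cloud = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
barcode = h0_barcode(cloud)
```

Barcodes like this one could then be compared between molecules with a distance such as the Wasserstein distance mentioned in the abstract; that step is not shown here.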
Characterization of Aptamer-Protein Complexes by X-ray Crystallography and Alternative Approaches
Aptamers are oligonucleotide ligands, either RNA or ssDNA, selected for high-affinity binding to molecular targets, such as small organic molecules, proteins or whole microorganisms. While reports of new aptamers are numerous, characterization of their specific interaction is often restricted to the affinity of binding (KD). Over the years, crystal structures of aptamer-protein complexes have only scarcely become available. Here we describe some relevant technical issues about the process of crystallizing aptamer-protein complexes and highlight some biochemical details on the molecular basis of selected aptamer-protein interactions. In addition, alternative experimental and computational approaches are discussed to study aptamer-protein interactions.
Knowledge-based energy functions for computational studies of proteins
This chapter discusses the theoretical framework and methods for developing
knowledge-based potential functions essential for protein structure prediction,
protein-protein interaction, and protein sequence design. We discuss in some
detail the Miyazawa-Jernigan contact statistical potential,
distance-dependent statistical potentials, as well as geometric statistical
potentials. We also describe a geometric model for developing both linear and
non-linear potential functions by optimization. Applications of knowledge-based
potential functions in protein-decoy discrimination, in protein-protein
interactions, and in protein design are then described. Several issues of
knowledge-based potential functions are finally discussed. Comment: 57 pages, 6 figures. To be published in a book by Springer
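To make the idea of a contact statistical potential concrete, here is a minimal sketch of a generic inverse-Boltzmann estimate, e(a, b) = -ln(N_obs / N_exp), computed from a list of observed residue contacts. It is a simplification and not the Miyazawa-Jernigan quasi-chemical derivation discussed in the chapter. The function name `contact_potential`, the composition-based reference state, and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter


def contact_potential(contacts, pseudocount=1.0):
    """Estimate pairwise contact energies (in units of kT) from observed contacts.

    contacts: list of (type_a, type_b) residue-type pairs seen in contact.
    Assumption: the expected count comes from random pairing according to
    how often each residue type appears in any contact.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    type_counts = Counter(t for pair in contacts for t in pair)
    total_pairs = len(contacts)
    total_ends = 2 * total_pairs
    types = sorted(type_counts)

    energies = {}
    for index, a in enumerate(types):
        for b in types[index:]:
            freq_a = type_counts[a] / total_ends
            freq_b = type_counts[b] / total_ends
            # Unordered pairs of two different types can arise in two ways.
            multiplicity = 1 if a == b else 2
            expected = total_pairs * freq_a * freq_b * multiplicity
            observed = pair_counts[(a, b)]
            energies[(a, b)] = -math.log(
                (observed + pseudocount) / (expected + pseudocount)
            )
    return energies


# Toy data: like residues contact each other more often than unlike ones.
toy_contacts = [("L", "L")] * 4 + [("K", "K")] * 4 + [("K", "L")] * 2
energies = contact_potential(toy_contacts)
```

In this toy data set the over-represented L-L pair receives a negative (favorable) energy and the under-represented K-L pair a positive one, which is the qualitative behavior such potentials are meant to capture.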
PocketPicker: analysis of ligand binding-sites with shape descriptors
Background: Identification and evaluation of surface binding-pockets and occluded cavities are initial steps in protein structure-based drug design. Characterizing the active site's shape as well as the distribution of surrounding residues plays an important role for a variety of applications such as automated ligand docking or in situ modeling. Comparing the shape similarity of binding site geometries of related proteins provides further insights into the mechanisms of ligand binding. Results: We present PocketPicker, an automated grid-based technique for the prediction of protein binding pockets that specifies the shape of a potential binding-site with regard to its buriedness. The method was applied to a representative set of protein-ligand complexes and their corresponding apo-protein structures to evaluate the quality of binding-site predictions. The performance of the pocket detection routine was compared to results achieved with the existing methods CAST, LIGSITE, LIGSITEcs, PASS and SURFNET. Success rates of PocketPicker were comparable to those of LIGSITEcs and outperformed the other tools. We introduce a descriptor that translates the arrangement of grid points delineating a detected binding-site into a correlation vector. We show that this shape descriptor is suited for comparative analyses of similar binding-site geometry by examining induced-fit phenomena in aldose reductase. This new method uses information derived from calculations of the buriedness of potential binding-sites. Conclusions: The pocket prediction routine of PocketPicker is a useful tool for identification of potential protein binding-pockets. It produces a convenient representation of binding-site shapes including an intuitive description of their accessibility. The shape-descriptor for automated classification of binding-site geometries can be used as an additional tool complementing elaborate manual inspections.
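The notion of grid-point buriedness can be sketched as follows: from a probe position, cast rays in roughly uniform directions and count how many are blocked by a protein atom. This is only an illustration of the general idea, not PocketPicker's actual routine. The function names, the Fibonacci-sphere direction set, and the default values for `n_rays`, `ray_length` and `atom_radius` are assumptions chosen for the example.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly uniformly distributed unit vectors."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        directions.append(
            (r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z)
        )
    return directions


def buriedness(probe, atoms, n_rays=30, ray_length=10.0, atom_radius=1.5):
    """Count the rays from `probe` that pass within `atom_radius` of an atom.

    Simplification: an atom blocks a ray only if the closest point of the
    ray to the atom centre lies within `ray_length` of the probe.
    """
    blocked = 0
    for direction in fibonacci_sphere(n_rays):
        for atom in atoms:
            offset = tuple(a - p for a, p in zip(atom, probe))
            # Distance along the ray to the point closest to the atom centre.
            along = sum(o * d for o, d in zip(offset, direction))
            if 0.0 <= along <= ray_length:
                perp_sq = sum(o * o for o in offset) - along * along
                if perp_sq <= atom_radius ** 2:
                    blocked += 1
                    break  # this ray is blocked; move on to the next ray
    return blocked


origin = (0.0, 0.0, 0.0)
# A fully enclosing shell: one atom placed on every ray, 3 units away.
shell = [tuple(3.0 * c for c in d) for d in fibonacci_sphere(30)]
score = buriedness(origin, shell)
```

A probe in open solvent scores 0, while a probe deep inside a cavity approaches `n_rays`; grid points with high scores are the candidates for pocket membership in this simplified picture.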
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts. Comment: 20 pages, 8 figures, 5 tables
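One way to picture a 1D multichannel representation built from persistent homology is to bin each barcode along the filtration axis and stack one such vector per element-specific group. The sketch below counts, for each bin, how many bars are alive in it. It is an assumed vectorization for illustration only and is not claimed to match the TopologyNet paper's exact channels. The function names and the example element-pair labels are hypothetical.

```python
def barcode_to_channel(bars, n_bins, max_value):
    """Turn a barcode into a 1D vector of per-bin bar counts.

    bars: list of (birth, death) pairs with birth < death.
    Bin k covers the half-open interval [k * width, (k + 1) * width).
    """
    width = max_value / n_bins
    channel = [0] * n_bins
    for birth, death in bars:
        for k in range(n_bins):
            low, high = k * width, (k + 1) * width
            # The bar [birth, death) overlaps the bin [low, high).
            if birth < high and death > low:
                channel[k] += 1
    return channel


def multichannel_image(barcodes_by_group, n_bins, max_value):
    """Stack one channel per group, in sorted key order for reproducibility."""
    return [
        barcode_to_channel(barcodes_by_group[key], n_bins, max_value)
        for key in sorted(barcodes_by_group)
    ]


# Hypothetical barcodes for two element-pair groups.
example = {
    "C-C": [(0.0, 1.0), (0.5, 2.0)],
    "C-N": [(0.0, 0.5)],
}
image = multichannel_image(example, n_bins=4, max_value=2.0)
# image == [[1, 2, 1, 1], [1, 0, 0, 0]]
```

A stack of such channels has the shape of a multichannel 1D image, which is the kind of input a 1D convolutional network can consume.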