HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction
Applying deep learning concepts from image detection and graph theory has
greatly advanced protein-ligand binding affinity prediction, a challenge with
enormous ramifications for both drug discovery and protein engineering. We
build upon these advances by designing a novel deep learning architecture
consisting of a 3-dimensional convolutional neural network utilizing
channel-wise attention and two graph convolutional networks utilizing
attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based
Convolutional Neural Network) obtains state-of-the-art results on the PDBbind
v.2016 core set, the most widely recognized benchmark in the field. We
extensively assess the generalizability of our model using multiple train-test
splits, each of which maximizes the difference in protein structure, protein
sequence, or ligand extended-connectivity fingerprint between the complexes in
the training and test sets. Furthermore, we perform 10-fold cross-validation
with a similarity cutoff between SMILES strings of ligands in the training and
test sets, and also evaluate the performance of HAC-Net on lower-quality data.
We envision that this model can be extended to a broad range of supervised
learning problems related to structure-based biomolecular property prediction.
All of our software is available as open source at
https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is
available through PyPI.
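The abstract describes graph convolutional networks that use attention-based aggregation of node features. One common form of such a readout scores each node with a learned gate, softmax-normalizes the scores, and takes the weighted sum of node features. The dependency-free sketch below illustrates that idea under stated assumptions: the linear gate (`gate_weights`, `gate_bias`) and the function names are hypothetical, and this is not taken from the HAC-Net code, whose actual layer definitions should be checked in the linked repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_features: Sequence[Sequence[float]],
                   gate_weights: Sequence[float],
                   gate_bias: float = 0.0) -> Vector:
    """Collapse per-node feature vectors into one graph-level vector.

    Hypothetical gate: node i gets the scalar score s_i = w . h_i + b.
    Scores are softmax-normalized into weights a_i that sum to 1, and the
    graph vector is sum_i a_i * h_i.
    """
    if not node_features:
        raise ValueError("graph has no nodes")
    dim = len(node_features[0])
    if len(gate_weights) != dim:
        raise ValueError("gate_weights must match the feature dimension")
    # One attention score per node from the (assumed) linear gate.
    scores = [sum(w * x for w, x in zip(gate_weights, h)) + gate_bias
              for h in node_features]
    alphas = softmax(scores)
    # Attention-weighted sum of node features.
    pooled = [0.0] * dim
    for a, h in zip(alphas, node_features):
        for k in range(dim):
            pooled[k] += a * h[k]
    return pooled


if __name__ == "__main__":
    nodes = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    # A zero gate gives equal weights, so pooling reduces to the mean.
    print(attention_pool(nodes, [0.0, 0.0]))
    # A strong gate on the first feature concentrates weight on the third node.
    print(attention_pool(nodes, [10.0, 0.0]))
```

With a zero gate the readout reduces to mean pooling; a learned gate lets the network emphasize the atoms or residues that matter most for the predicted affinity.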
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces several algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology, and
electrostatic persistence, for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with the Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, tree
ensembles, and deep convolutional neural networks, to demonstrate their
descriptive and predictive power for chemical and biological problems.
Extensive numerical experiments involving more than 4,000 protein-ligand
complexes from the PDBbind database and nearly 100,000 ligands and decoys in
the DUD database are performed to test, respectively, the scoring power and
the virtual screening power of the proposed topological approaches. The
results show that these approaches outperform modern machine-learning-based
methods in protein-ligand binding affinity prediction and ligand-decoy
discrimination.
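As a rough illustration of pairing persistent homology with a Wasserstein distance and k-nearest neighbors, here is a minimal pure-Python sketch. It assumes plain point clouds (for example, atom coordinates), a Vietoris-Rips filtration in which an edge enters at its length, only 0-dimensional features (connected components), and an L-infinity ground metric between diagram points. It does not implement the paper's multicomponent, multi-level, or electrostatic variants, and all function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Bar = Tuple[float, float]  # (birth, death)


def h0_barcode(points: Sequence[Point]) -> List[Bar]:
    """Finite H0 bars of a Vietoris-Rips filtration (Kruskal + union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    bars: List[Bar] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # a component born at 0 dies here
    return bars  # the single surviving (infinite) bar is omitted


def _linf(a: Bar, b: Bar) -> float:
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def _to_diag(a: Bar) -> float:
    return (a[1] - a[0]) / 2.0  # L-infinity distance from (b, d) to the diagonal


def wasserstein(d1: Sequence[Bar], d2: Sequence[Bar], p: float = 1.0) -> float:
    """p-Wasserstein distance; brute force, so only for tiny diagrams."""
    n, m = len(d1), len(d2)
    size = n + m  # rows: d1 + m diagonal slots; cols: d2 + n diagonal slots

    def cost(r: int, c: int) -> float:
        if r < n and c < m:
            return _linf(d1[r], d2[c]) ** p
        if r < n:
            return _to_diag(d1[r]) ** p  # d1 point matched to the diagonal
        if c < m:
            return _to_diag(d2[c]) ** p  # d2 point matched to the diagonal
        return 0.0  # diagonal to diagonal is free

    best = min(sum(cost(r, perm[r]) for r in range(size))
               for perm in itertools.permutations(range(size)))
    return best ** (1.0 / p)


def knn_predict(query: Sequence[Bar],
                training: Sequence[Tuple[Sequence[Bar], float]],
                k: int = 1) -> float:
    """Average the labels of the k training diagrams closest to `query`."""
    ranked = sorted(training, key=lambda item: wasserstein(query, item[0]))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


if __name__ == "__main__":
    a = h0_barcode([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
    b = h0_barcode([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)])
    print(a, b, wasserstein(a, b))
```

The factorial search over matchings is only practical for a handful of bars. For realistic diagrams, the same augmented cost matrix can be handed to a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment`.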
- …