TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element-specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations of deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.
Comment: 20 pages, 8 figures, 5 tables
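The element-specific idea can be illustrated with a minimal sketch, assuming a simplified setup. It computes 0-dimensional persistence (Betti-0 bar deaths) for a filtration that only connects protein atoms of one element to ligand atoms of another element. It then bins the bar deaths into a 1D channel. Stacking one channel per element pair gives a multichannel 1D "image" that a CNN could consume. The element lists, the 12 Å cutoff, the 0.5 Å bin width, the interaction-only distance, and the helper names (`betti0_deaths`, `barcode_channel`, `esph_image`) are hypothetical choices for illustration, not TopologyNet's published feature set. The sketch also omits higher-dimensional (Betti-1, Betti-2) features.

```python
import math
from itertools import product

def _find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def betti0_deaths(group_a, group_b):
    """Betti-0 bar deaths for an interaction-only distance filtration.

    Assumption (hypothetical): only edges between a group-A atom and a
    group-B atom are used, so intra-group distances never merge components.
    Every atom is born at 0; each merge of two components ends one bar at
    that edge length. Components that never merge (infinite bars) are ignored.
    """
    pts = list(group_a) + list(group_b)
    n_a = len(group_a)
    edges = sorted(
        (math.dist(pts[i], pts[j]), i, j)
        for i in range(n_a)
        for j in range(n_a, len(pts))
    )
    parent = list(range(len(pts)))
    deaths = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths

def barcode_channel(deaths, cutoff=12.0, n_bins=24):
    # Count bar deaths per distance bin; deaths at or beyond the cutoff are dropped.
    width = cutoff / n_bins
    channel = [0] * n_bins
    for d in deaths:
        if d < cutoff:
            channel[min(int(d // width), n_bins - 1)] += 1
    return channel

def esph_image(protein, ligand, p_elems=("C", "N", "O"), l_elems=("C", "N", "O"),
               cutoff=12.0, n_bins=24):
    """Stack one 1D channel per (protein element, ligand element) pair.

    protein, ligand: lists of (element_symbol, (x, y, z)) tuples.
    Returns len(p_elems) * len(l_elems) channels, each of length n_bins.
    """
    channels = []
    for ep, el in product(p_elems, l_elems):
        a = [xyz for e, xyz in protein if e == ep]
        b = [xyz for e, xyz in ligand if e == el]
        channels.append(barcode_channel(betti0_deaths(a, b), cutoff, n_bins))
    return channels
```

Restricting the filtration to one element pair at a time is what keeps the chemistry in the representation: a carbon-oxygen contact and a nitrogen-oxygen contact at the same distance land in different channels instead of being merged into one geometric signal.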
Machine Learning Small Molecule Properties in Drug Discovery
Machine learning (ML) is a promising approach for predicting small molecule
properties in drug discovery. Here, we provide a comprehensive overview of
various ML methods introduced for this purpose in recent years. We review a
wide range of properties, including binding affinities, solubility, and ADMET
(Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss
popular existing datasets, as well as molecular descriptors and embeddings such as
chemical fingerprints and graph-based neural networks. We also highlight the
challenges of predicting and optimizing multiple properties during the hit-to-lead
and lead optimization stages of drug discovery, and briefly explore
multi-objective optimization techniques that can be used to balance diverse
properties while optimizing lead candidates. Finally, we assess techniques for
explaining model predictions, especially for critical decision-making in
drug discovery. Overall, this review provides insights into the
landscape of ML models for small molecule property predictions in drug
discovery. So far, many diverse approaches exist, but their performance is often
comparable. Neural networks, while more flexible, do not always outperform
simpler models. This shows that high-quality training data remains crucial for
training accurate models, and that standardized benchmarks, additional
performance metrics, and best practices are needed to enable richer comparisons
that shed light on the differences between the many techniques and models.
Comment: 46 pages, 1 figure
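Pareto (non-dominated) filtering is one widely used multi-objective technique for balancing properties during lead optimization. The sketch below keeps every candidate that no other candidate beats on all objectives at once. The candidate names, the three objectives (predicted pKd, predicted logS, and one minus a predicted toxicity probability), and their values are hypothetical. All objectives are oriented so that larger is better.

```python
def dominates(a, b):
    # a dominates b if it is no worse on every objective and strictly better
    # on at least one (all objectives are maximized).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    # candidates: dict of name -> tuple of objective values.
    # Keep names not dominated by any other candidate (simple O(n^2) scan).
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical predictions: (pKd, logS, 1 - P(toxic)); larger is better for all three.
candidates = {
    "cpd_A": (8.1, -4.0, 0.7),
    "cpd_B": (7.5, -3.0, 0.9),
    "cpd_C": (7.0, -4.5, 0.6),  # no better than cpd_A on any objective
    "cpd_D": (8.5, -5.0, 0.5),
}

if __name__ == "__main__":
    print(pareto_front(candidates))
```

Here `cpd_C` is dropped because `cpd_A` is at least as good on all three objectives, and the function returns `['cpd_A', 'cpd_B', 'cpd_D']`. The survivors represent different trade-offs (affinity versus solubility versus safety), so a weighting scheme or expert judgment is still needed to choose among them.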
Classification and Scoring of Protein Complexes
Protein interactions mediate all biological systems in a cell; understanding these interactions
means understanding the processes responsible for human life. Protein structures can
be obtained experimentally, but experimental methods frequently fail to determine the structures
of protein complexes. To address this issue, computational methods have been developed
that attempt to predict the structure of a protein complex using information about its constituents.
These methods, known as docking, generate thousands of possible poses for
each complex and require effective, reliable ways to quickly discriminate the correct
pose from the many incorrect ones. In this thesis, a new scoring function was developed
that uses machine learning techniques and features extracted from the structures of the
interacting proteins to correctly classify and rank the putative poses. The developed
function was shown to be competitive with current state-of-the-art solutions.
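A learned scoring function of the kind described can be sketched as a binary classifier (near-native versus incorrect pose) whose output score is then used to rank the docking poses. The sketch below uses plain logistic regression trained by stochastic gradient descent as a stand-in. It is not the model used in the thesis. The two interface features, their values, the pose identifiers, and the top-N success check are all hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    # Linear score; higher means "more likely near-native".
    return b + sum(wj * xj for wj, xj in zip(w, x))

def train_logistic(X, y, lr=0.5, epochs=500):
    # Stochastic gradient descent on the logistic (log) loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = _sigmoid(score(w, b, xi)) - yi  # d(log loss)/d(score)
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def rank_poses(w, b, poses):
    # poses: (pose_id, features) pairs; returns pose ids, best-scored first.
    ranked = sorted(poses, key=lambda p: score(w, b, p[1]), reverse=True)
    return [pose_id for pose_id, _ in ranked]

def top_n_success(ranked_ids, correct_ids, n=10):
    # True if at least one correct pose appears among the top n ranked poses.
    return any(pid in correct_ids for pid in ranked_ids[:n])

# Hypothetical training poses: [normalized buried area, normalized H-bond count]
X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 0, 0, 0]  # 1 = near-native, 0 = incorrect

if __name__ == "__main__":
    w, b = train_logistic(X, y)
    poses = [("p1", [0.15, 0.2]), ("p2", [0.85, 0.7]), ("p3", [0.4, 0.3])]
    ranked = rank_poses(w, b, poses)
    print(ranked, top_n_success(ranked, {"p2"}, n=1))
```

Ranking by a continuous score, rather than using only the hard class label, is what lets one classifier both separate correct from incorrect poses and order thousands of candidates, so that a top-N success rate can be measured.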
HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction
Applying deep learning concepts from image detection and graph theory has
greatly advanced protein-ligand binding affinity prediction, a challenge with
enormous ramifications for both drug discovery and protein engineering. We
build upon these advances by designing a novel deep learning architecture
consisting of a 3-dimensional convolutional neural network utilizing
channel-wise attention and two graph convolutional networks utilizing
attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based
Convolutional Neural Network) obtains state-of-the-art results on the PDBbind
v.2016 core set, the most widely recognized benchmark in the field. We
extensively assess the generalizability of our model using multiple train-test
splits, each of which maximizes differences between either protein structures,
protein sequences, or ligand extended-connectivity fingerprints of complexes in
the training and test sets. Furthermore, we perform 10-fold cross-validation
with a similarity cutoff between SMILES strings of ligands in the training and
test sets, and also evaluate the performance of HAC-Net on lower-quality data.
We envision that this model can be extended to a broad range of supervised
learning problems related to structure-based biomolecular property prediction.
All of our software is available as open source at
https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is
available through PyPI.
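Attention-based aggregation of node features can be illustrated with a generic gated readout. Each node's feature vector receives a scalar score, the scores are normalized with a softmax over the nodes, and the graph-level embedding is the attention-weighted sum of the node features. The sketch below pairs that readout with one simple mean-aggregation graph convolution. It is a minimal pure-Python illustration with assumed shapes and hypothetical weights (`gate_w`, `W`). It is not HAC-Net's actual layer definition; that is in the repository linked above.

```python
import math

def gcn_layer(h, adj, W):
    """One mean-aggregation graph convolution followed by ReLU.

    h: node features (n_nodes x in_dim); adj: neighbor index lists per node;
    W: weight matrix (out_dim x in_dim), shared across all nodes.
    """
    out = []
    for i in range(len(h)):
        group = [i] + list(adj[i])  # include a self-loop
        agg = [sum(h[j][k] for j in group) / len(group) for k in range(len(h[i]))]
        out.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in W])
    return out

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_readout(h, gate_w, gate_b=0.0):
    """Attention-weighted sum of node features -> one graph-level vector."""
    scores = [gate_b + sum(w * x for w, x in zip(gate_w, hi)) for hi in h]
    alpha = softmax(scores)  # one weight per node, summing to 1
    pooled = [sum(a * hi[k] for a, hi in zip(alpha, h)) for k in range(len(h[0]))]
    return pooled, alpha

if __name__ == "__main__":
    # Hypothetical 3-atom chain 0-1-2 with 2-dimensional node features.
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adj = [[1], [0, 2], [1]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for readability only
    pooled, alpha = attention_readout(gcn_layer(h, adj, W), gate_w=[0.0, 2.0])
    print(pooled, alpha)
```

Compared with plain mean or sum pooling, the learned gate lets the model weight the atoms it finds most informative for the property being predicted. With an all-zero gate the weights become uniform, and the readout reduces to mean pooling.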