TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.
Comment: 20 pages, 8 figures, 5 tables
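To illustrate how a 3D structure can be reduced to 1D topological invariants per element channel, here is a minimal sketch in Python. It is not the authors' ESPH pipeline: it computes only the 0-dimensional persistence bars of a Vietoris-Rips filtration (via union-find over sorted pairwise distances), and the function names, the `(element, xyz)` atom format, and the channel definitions are assumptions made for this example.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of 0-dimensional persistence bars (Vietoris-Rips filtration).

    Every point is born at filtration value 0. A connected component dies at
    the distance where it merges into another one. The single component that
    never dies (the infinite bar) is omitted from the result.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances in increasing order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one bar ends at d
            parent[ri] = rj
            deaths.append(d)
    return deaths


def element_specific_barcodes(atoms, channels):
    """Compute one H0 barcode per element channel.

    atoms:    list of (element_symbol, (x, y, z)) tuples (hypothetical format)
    channels: dict mapping a channel name to the set of element symbols it keeps
    """
    return {
        name: h0_death_times([xyz for el, xyz in atoms if el in elements])
        for name, elements in channels.items()
    }


# Toy coordinates, invented purely for illustration.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.0, 0.0, 0.0)),
    ("N", (0.0, 3.0, 0.0)),
    ("C", (5.0, 0.0, 0.0)),
]
channels = {"C": {"C"}, "CN": {"C", "N"}}
barcodes = element_specific_barcodes(atoms, channels)
```

Each channel yields a list of bar lengths; binning those lengths along the filtration axis would give one row of a multichannel 1D image of the kind the abstract describes as CNN input.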
Enrichment of Chemical Libraries Docked to Protein Conformational Ensembles and Application to Aldehyde Dehydrogenase 2
Molecular recognition is a complex
process that involves a large
ensemble of structures of the receptor and ligand. Yet, most structure-based
virtual screening is carried out on a single structure typically from
X-ray crystallography. Explicit-solvent molecular dynamics (MD) simulations
offer an opportunity to sample multiple conformational states of a
protein. Here we evaluate our recently developed scoring method SVMSP
in its ability to enrich chemical libraries docked to MD structures
of seven proteins from the Directory of Useful Decoys (DUD). SVMSP
is a target-specific rescoring method that combines machine learning
with statistical potentials. We find that enrichment power as measured
by the area under the ROC curve (ROC-AUC) is not affected by increasing
the number of MD structures. Among individual MD snapshots, many exhibited
enrichment that was significantly better than the crystal structure,
but no correlation between enrichment and structural deviation from
crystal structure was found. We followed an innovative approach by
training SVMSP scoring models using MD structures (SVMSPMD). The resulting models were applied to two difficult cases (p38
and CDK2) for which enrichment was not better than random. We found
a remarkable increase in enrichment power, particularly for p38, where
the ROC-AUC increased by 0.30 to 0.85. Finally, we explored approaches
for a priori identification of MD snapshots
with high enrichment power from an MD simulation in the absence of
active compounds. We found that the use of randomly selected compounds
docked to the target of interest using SVMSP led to notable enrichment
for EGFR and Src MD snapshots. SVMSP rescoring of protein–compound
MD structures was applied for the search of small-molecule inhibitors
of the mitochondrial enzyme aldehyde dehydrogenase 2 (ALDH2). Rank-ordering
of a commercial library of 50 000 compounds docked to MD structures
of ALDH2 led to five small-molecule inhibitors. Four compounds had
IC50s below 5 μM. These compounds serve as leads for the design
and synthesis of more potent and selective ALDH2 inhibitors.
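The enrichment metric used above, the area under the ROC curve, can be sketched in a few lines of Python. This is a generic pairwise (Mann-Whitney) formulation, not the SVMSP code; the function name and the convention that a higher score means "more likely active" are assumptions for this example.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random active outscores a random decoy.

    scores: docking/rescoring scores, higher = predicted more likely active
            (assumed convention; flip the sign if lower scores are better)
    labels: 1 for known actives, 0 for decoys
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")

    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0   # active ranked above decoy
            elif a == d:
                wins += 0.5   # ties count as half
    return wins / (len(actives) * len(decoys))


# Toy scores, invented for illustration only.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from decoys, which is the scale on which the reported increase to 0.85 for p38 should be read.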
- …