On How AI Needs to Change to Advance the Science of Drug Discovery
Research on AI for Science has seen significant success since the rise of
deep learning models over the past decade, even on longstanding challenges
such as protein structure prediction. However, this rapid development has
inevitably exposed the models' flaws, especially in domains of reasoning where
understanding cause-effect relationships is important. One such domain is
drug discovery, in which such understanding is required to make sense of data
otherwise plagued by spurious correlations. These spurious correlations only
become worse as the amount of data in the life sciences keeps increasing,
which restricts researchers' ability to understand disease biology and create
better therapeutics. Therefore, to advance the science of drug discovery with
AI, it is becoming necessary to formulate the key problems in the language of
causality, which makes explicit the modelling assumptions needed for
identifying true cause-effect relationships.
In this attention paper, we present causal drug discovery as the craft of
creating models that ground the process of drug discovery in causal reasoning.
Comment: Main paper: 6 pages, References: 1.5 pages. Main paper: 3 figures
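The point about spurious correlations can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from the paper: the variables (a confounder `z`, a treatment `x`, an outcome `y`), the linear data-generating process, and the helper names `slope` and `residuals` are all hypothetical. By construction `x` has no causal effect on `y`, yet a naive regression suggests one until the confounder is adjusted for.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def residuals(xs, ys):
    """Part of ys that is not linearly explained by xs (up to a constant)."""
    b = slope(xs, ys)
    return [y - b * x for x, y in zip(xs, ys)]

rng = random.Random(0)
n = 5000
# Hypothetical variables: z = disease severity (confounder),
# x = drug dose, y = biomarker level. x does NOT influence y here.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

# Naive estimate: regress y on x and ignore the confounder.
naive = slope(x, y)
# Adjusted estimate: remove the linear effect of z from both x and y first.
adjusted = slope(residuals(z, x), residuals(z, y))
```

In expectation the naive slope is 1 and the adjusted slope is 0 for this setup. The adjustment is only valid because the assumed causal structure (z influences both x and y) was written down explicitly, which is the kind of modelling assumption the abstract argues should be stated.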
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?
Deep generative models for structure-based drug design (SBDD), where molecule
generation is conditioned on a 3D protein pocket, have received considerable
interest in recent years. These methods offer the promise of higher-quality
molecule generation by explicitly modelling the 3D interaction between a
potential drug and a protein receptor. However, previous work has primarily
focused on the quality of the generated molecules themselves, with limited
evaluation of the 3D molecule poses that these methods produce. Most work
simply discards the generated pose and reports only a "corrected" pose after
redocking with traditional methods. Little is known about whether
generated molecules satisfy known physical constraints for binding and the
extent to which redocking alters the generated interactions. We introduce
PoseCheck, an extensive analysis of multiple state-of-the-art methods, and find
that generated molecules have significantly more physical violations and fewer
key interactions compared to baselines, calling into question the implicit
assumption that providing rich 3D structure information improves molecule
complementarity. We make recommendations for future research tackling
identified failure modes and hope our benchmark can serve as a springboard for
future SBDD generative modelling work to have a real-world impact.
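Steric clashes between a generated pose and the pocket are one plausible example of the physical violations mentioned above. The sketch below counts them for toy coordinates. It is an illustration only and not PoseCheck's implementation: the function name `count_clashes`, the small radii table, the 0.5 angstrom tolerance, and the coordinates are assumptions made for this example.

```python
import math

# Approximate van der Waals radii in angstroms (Bondi values, assumed here).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def count_clashes(ligand, protein, tolerance=0.5):
    """Count ligand-protein atom pairs that are closer than the sum of their
    van der Waals radii minus `tolerance` (all values in angstroms).

    Each atom is a tuple (element, (x, y, z)).
    """
    clashes = 0
    for lig_elem, lig_xyz in ligand:
        for prot_elem, prot_xyz in protein:
            limit = VDW_RADII[lig_elem] + VDW_RADII[prot_elem] - tolerance
            if math.dist(lig_xyz, prot_xyz) < limit:
                clashes += 1
    return clashes

# Toy coordinates (hypothetical, not taken from any real complex).
protein = [("C", (0.0, 0.0, 0.0)), ("O", (4.0, 0.0, 0.0))]
ligand = [("C", (2.0, 0.0, 0.0)), ("N", (0.0, 5.0, 0.0))]

n_clashes = count_clashes(ligand, protein)
```

Running the same count on a generated pose and on its redocked counterpart would give one simple way to see how much redocking changes the pose, under the assumptions above.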
Dynamics-Informed Protein Design with Structure Conditioning
Current protein generative models are able to design novel backbones with desired shapes or functional motifs. However, despite the importance of a protein’s dynamical properties for its function, conditioning on dynamical properties remains elusive. We present a new approach to protein generative modeling that leverages Normal Mode Analysis to capture dynamical properties as well. We introduce a method for conditioning diffusion probabilistic models on protein dynamics, specifically on the lowest non-trivial normal mode of oscillation. Our method, similar to classifier guidance, formulates the sampling process as being driven by conditional and unconditional terms. However, unlike previous works, we approximate the conditional term with a simple analytical function rather than an external neural network, thus making the eigenvector calculations approachable. We present the corresponding SDE theory as a formal justification of our approach. We extend our framework to conditioning on structure and dynamics at the same time, enabling scaffolding of the dynamical motifs. We demonstrate the empirical effectiveness of our method by turning the open-source unconditional protein diffusion model Genie into a conditional model with no retraining. Generated proteins exhibit the desired dynamical and structural properties while still being biologically plausible. Our work represents a first step towards incorporating dynamical behaviour in protein design and may open the door to designing more flexible and functional proteins in the future.
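To make "the lowest non-trivial normal mode" concrete, the sketch below computes it for a toy backbone with NumPy. The abstract does not say which form of Normal Mode Analysis is used, so the anisotropic network model (springs between residues within a distance cutoff) is an assumption here, as are the function names, the 10 angstrom cutoff, the unit spring constant, and the idealised helix coordinates.

```python
import numpy as np

def anm_hessian(coords, cutoff=10.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic network model: residues
    closer than `cutoff` are joined by springs of stiffness `gamma`."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue
            # Off-diagonal 3x3 block for the pair (i, j).
            block = -gamma * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian

def lowest_nontrivial_mode(coords, cutoff=10.0):
    """Return (eigenvalue, per-residue displacements) of the lowest-frequency
    mode after the six rigid-body modes."""
    eigvals, eigvecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    # eigh sorts eigenvalues in ascending order; indices 0-5 correspond to
    # rigid translations and rotations (eigenvalues numerically zero).
    return eigvals[6], eigvecs[:, 6].reshape(-1, 3)

# Toy backbone: 12 points on an idealised helix (hypothetical coordinates).
t = np.arange(12) * np.deg2rad(100.0)
coords = np.column_stack([2.3 * np.cos(t), 2.3 * np.sin(t), 1.5 * np.arange(12)])

eigval, mode = lowest_nontrivial_mode(coords)
```

Here `mode` holds one unit-normalised displacement vector per residue. A conditioning scheme of the kind described above would compare such a vector for a partially denoised structure against a target displacement pattern; how that comparison enters the sampler is not specified in the abstract and is not shown here.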
MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery.
Acknowledgements: This work received funding from BMWi ZIM KK 5197901TS0 (T.S., F.M., G.M.P.) and BMBF, SUPREME, 031L0268 (T.S., F.M., G.M.P.). This work was supported by the Helmholtz Association’s Initiative and Networking Fund on the HAICORE@FZJ partition. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models demonstrating improved accuracy when our data are used. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Biomolecular condensate phase diagrams with a combinatorial microdroplet platform.
The assembly of biomolecules into condensates is a fundamental process underlying the organisation of the intracellular space and the regulation of many cellular functions. Mapping and characterising the phase behaviour of biomolecules is essential to understand the mechanisms of condensate assembly, and to develop therapeutic strategies targeting biomolecular condensate systems. A central concept for characterising phase-separating systems is the phase diagram. Phase diagrams are typically built from numerous individual measurements sampling different parts of the parameter space. However, even when performed in microwell plate format, this process is slow, low-throughput and requires significant sample consumption. To address this challenge, we present here a combinatorial droplet microfluidic platform, termed PhaseScan, for rapid and high-resolution acquisition of multidimensional biomolecular phase diagrams. Using this platform, we characterise the phase behaviour of a wide range of systems under a variety of conditions and demonstrate that this approach allows the quantitative characterisation of the effect of small molecules on biomolecular phase transitions.