Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation
Protein-ligand binding prediction is a fundamental problem in AI-driven drug
discovery. Prior work focused on supervised learning methods using a large set
of binding affinity data for small molecules, but it is hard to apply the same
strategy to other drug classes like antibodies as labelled data is limited. In
this paper, we explore unsupervised approaches and reformulate binding energy
prediction as a generative modeling task. Specifically, we train an
energy-based model on a set of unlabelled protein-ligand complexes using SE(3)
denoising score matching and interpret its log-likelihood as binding affinity.
Our key contribution is a new equivariant rotation prediction network called
Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts
a rotation by modeling the force and torque between protein and ligand atoms,
where the force is defined as the gradient of an energy function with respect
to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen
binding affinity prediction benchmarks. Our model outperforms all unsupervised
baselines (physics-based and statistical potentials) and matches supervised
learning methods in the antibody case.
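As a minimal sketch of the idea described above (not the authors' implementation), the gradient of an energy function with respect to ligand atom coordinates yields per-atom forces (taken with a minus sign, following the physics convention), their cross products with displacements from the center of mass give a net torque, and Euler's rotation equations turn that torque into a small rotation update. The energy function, coordinate shapes, and step size below are illustrative assumptions.

```python
import torch

def toy_energy(protein_xyz, ligand_xyz):
    # Hypothetical stand-in for a learned energy: sum of inverse pairwise distances.
    d = torch.cdist(ligand_xyz, protein_xyz)              # (L, P) atom-atom distances
    return (1.0 / (d + 1e-6)).sum()

def nere_style_rotation_update(protein_xyz, ligand_xyz, dt=1e-2):
    ligand_xyz = ligand_xyz.detach().requires_grad_(True)
    energy = toy_energy(protein_xyz, ligand_xyz)
    # Force on each ligand atom = negative gradient of the energy w.r.t. its coordinates.
    forces = -torch.autograd.grad(energy, ligand_xyz)[0]
    center = ligand_xyz.mean(dim=0)
    r = ligand_xyz - center
    torque = torch.cross(r, forces, dim=-1).sum(dim=0)    # net torque about the center of mass
    # Inertia tensor of the ligand atoms (unit masses), as in rigid-body mechanics.
    inertia = r.pow(2).sum() * torch.eye(3) - r.t() @ r
    # Euler's rotation equation with zero initial angular velocity: I * domega/dt = torque.
    omega = torch.linalg.solve(inertia, torque) * dt      # axis-angle style rotation update
    return omega

# Toy usage: a 20-atom ligand near a 50-atom protein pocket.
omega = nere_style_rotation_update(torch.randn(50, 3), torch.randn(20, 3))
print(omega)
```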
Versatile Energy-Based Probabilistic Models for High Energy Physics
As a classical generative modeling approach, energy-based models have the
natural advantage of flexibility in the form of the energy function. Recently,
energy-based models have achieved great success in modeling high-dimensional
data in computer vision and natural language processing. In line with these
advancements, we build a multi-purpose energy-based probabilistic model for
High Energy Physics events at the Large Hadron Collider. This framework builds
on a powerful generative model and describes higher-order inter-particle
interactions. It accommodates different encoding architectures and relies on implicit
generation. On the application side, it can serve as a powerful parameterized event
generator for physics simulation, a generic anomalous-signal detector free from
spurious correlations, and an augmented event classifier for particle identification.
Comment: 17 pages, 9 figures. NeurIPS 2023 camera-ready.
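As an illustrative sketch only (not the paper's code), one common way an energy-based model doubles as an anomaly detector is to score each event by its energy and flag high-energy (low-likelihood) events. The network architecture, feature dimension, and threshold below are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical per-event energy network over flattened particle features (assumption).
energy_net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def anomaly_scores(events):
    # Higher energy corresponds to lower model likelihood, so it can serve as an anomaly score.
    with torch.no_grad():
        return energy_net(events).squeeze(-1)

events = torch.randn(1000, 64)             # toy batch of collider events
scores = anomaly_scores(events)
threshold = scores.quantile(0.99)          # flag the top 1% highest-energy events
flagged = events[scores > threshold]
print(flagged.shape)
```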
MIST-CF: Chemical formula inference from tandem mass spectra
Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the
first step toward structurally elucidating unknown metabolites. While great
strides have been made toward solving this problem, the current
state-of-the-art method depends on time-intensive, proprietary, and
expert-parameterized fragmentation tree construction and scoring. In this work
we extend our previous spectrum Transformer methodology into an energy-based
modeling framework, MIST-CF, for learning to rank chemical formula and adduct
assignments given an unannotated MS/MS spectrum. Importantly, MIST-CF learns in
a data-dependent fashion using a Formula Transformer neural network
architecture and circumvents the need for fragmentation tree construction. We
train and evaluate our model on a large open-access database, showing an
absolute improvement of 10% in top-1 accuracy over other neural network
architectures. We further validate our approach on the CASMI2022 challenge
dataset, achieving nearly equivalent performance to the winning entry within
the positive mode category without any manual curation or post-processing of
our results. These results demonstrate an exciting strategy to more powerfully
leverage MS2 fragment peaks for predicting the MS1 precursor chemical formula with
data-driven learning.
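To make the learning-to-rank framing concrete, here is a minimal sketch (not the MIST-CF code; the scoring network, feature sizes, and candidate representation are all assumptions): each candidate formula/adduct assignment is scored jointly with the spectrum, energies are normalized over the candidate set so training can maximize the probability of the true assignment, and inference picks the lowest-energy candidate.

```python
import torch
import torch.nn as nn

# Hypothetical scorer: embeds a spectrum and a candidate (formula + adduct) and returns an energy.
class CandidateScorer(nn.Module):
    def __init__(self, spec_dim=256, cand_dim=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(spec_dim + cand_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, spectrum, candidates):
        # spectrum: (spec_dim,), candidates: (K, cand_dim) -> energies: (K,)
        spec = spectrum.expand(candidates.shape[0], -1)
        return self.net(torch.cat([spec, candidates], dim=-1)).squeeze(-1)

scorer = CandidateScorer()
spectrum = torch.randn(256)          # featurized MS/MS spectrum (toy)
candidates = torch.randn(20, 32)     # 20 candidate formula/adduct assignments (toy features)

energies = scorer(spectrum, candidates)
# Energy-based ranking: lower energy = better assignment; softmax over -energy gives a
# normalized distribution over the candidate set for a ranking (cross-entropy) loss.
probs = torch.softmax(-energies, dim=0)
best = torch.argmin(energies)
print(best.item(), probs[best].item())
```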
STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models
We propose STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for
sampling high-dimensional data. Motivated by the growing efficacy and potential of
Energy-Based modeling, also known as non-normalized probabilistic modeling, for
modeling generative processes over diverse kinds of high-dimensional data, we
present an end-to-end learning algorithm for Energy-Based models (EBM) aimed at
improving the quality of the resulting sampled data points. While the unknown
normalizing constant of EBMs makes the training procedure intractable, resorting to
Markov Chain Monte Carlo (MCMC) is in general a viable option. Considering what
MCMC entails for EBM training, we propose a novel high-dimensional sampling method,
based on an anisotropic stepsize and a gradient-informed covariance matrix,
embedded into a discretized Langevin diffusion. We motivate the necessity for
an anisotropic update of the negative samples in the Markov Chain by the
nonlinearity of the backbone of the EBM, here a Convolutional Neural Network.
Our resulting method, namely STANLEY, is an optimization algorithm for training
Energy-Based models via our newly introduced MCMC method. We provide a
theoretical understanding of our sampling scheme by proving that the sampler
leads to a geometrically uniformly ergodic Markov Chain. Several image
generation experiments are provided in our paper to show the effectiveness of
our method.
Comment: arXiv admin note: text overlap with arXiv:1207.5938 by other authors.
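For intuition, here is a minimal sketch of Langevin-style negative sampling with a per-coordinate (anisotropic) stepsize. It is not STANLEY's exact update: the diagonal, gradient-informed preconditioner, the toy convolutional energy network, and all step sizes below are illustrative assumptions inspired by the description above.

```python
import torch
import torch.nn as nn

# Toy convolutional energy network (assumption; any scalar-output EBM backbone works).
energy_net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Flatten(), nn.Linear(28 * 28, 1)
)

def anisotropic_langevin_sample(x, n_steps=60, base_step=1e-2, eps=1e-5):
    """Discretized Langevin diffusion with an anisotropic, gradient-informed stepsize."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        energy = energy_net(x).sum()
        grad = torch.autograd.grad(energy, x)[0]
        # Anisotropic stepsize: larger moves where the energy surface is flat, smaller where steep.
        precond = 1.0 / (grad.abs() + eps)
        precond = precond / precond.mean()        # keep the overall scale comparable
        step = base_step * precond
        noise = torch.randn_like(x) * torch.sqrt(step)
        x = x - 0.5 * step * grad + noise         # preconditioned Langevin update
    return x.detach()

# Toy usage: draw 8 negative samples shaped like 28x28 single-channel images.
samples = anisotropic_langevin_sample(torch.randn(8, 1, 28, 28))
print(samples.shape)
```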
Rethinking Attention with Performers
We introduce Performers, Transformer architectures which can estimate regular
(softmax) full-rank-attention Transformers with provable accuracy, but using
only linear (as opposed to quadratic) space and time complexity, without
relying on any priors such as sparsity or low-rankness. To approximate softmax
attention-kernels, Performers use a novel Fast Attention Via positive
Orthogonal Random features approach (FAVOR+), which may be of independent
interest for scalable kernel methods. FAVOR+ can be also used to efficiently
model kernelizable attention mechanisms beyond softmax. This representational
power is crucial to accurately compare softmax with other kernels for the first
time on large-scale tasks, beyond the reach of regular Transformers, and
investigate optimal attention-kernels. Performers are linear architectures
fully compatible with regular Transformers and with strong theoretical
guarantees: unbiased or nearly-unbiased estimation of the attention matrix,
uniform convergence and low estimation variance. We tested Performers on a rich
set of tasks stretching from pixel-prediction through text models to protein
sequence modeling. We demonstrate competitive results with other examined
efficient sparse and dense attention methods, showcasing effectiveness of the
novel attention-learning paradigm leveraged by Performers.
Comment: Published as a conference paper + oral presentation at ICLR 2021. 38 pages.
See https://github.com/google-research/google-research/tree/master/protein_lm for
protein language model code,
https://github.com/google-research/google-research/tree/master/performer for
Performer code, and
https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for the
Google AI Blog post.
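Below is a minimal sketch of the positive-random-feature idea behind FAVOR+ (not the released Performer code linked above): queries and keys are mapped through exponential positive features so that attention can be computed as two kernel matrix products in linear time. The feature count, the d^{-1/4} scaling, and the use of i.i.d. rather than orthogonal random projections here are simplifying assumptions.

```python
import torch

def positive_random_features(x, proj):
    # phi(x) = exp(x W^T - ||x||^2 / 2) / sqrt(m): strictly positive features for the softmax kernel.
    m = proj.shape[0]
    return torch.exp(x @ proj.t() - x.pow(2).sum(-1, keepdim=True) / 2) / m ** 0.5

def favor_plus_attention(q, k, v, n_features=128):
    # q, k, v: (seq_len, d). Scale as in softmax attention, then map to positive features.
    d = q.shape[-1]
    proj = torch.randn(n_features, d)      # i.i.d. Gaussian projections (orthogonalization omitted)
    q_prime = positive_random_features(q / d ** 0.25, proj)   # (n, m)
    k_prime = positive_random_features(k / d ** 0.25, proj)   # (n, m)
    # Linear-time attention: associate (K'^T V) first, then multiply by Q'.
    kv = k_prime.t() @ v                                        # (m, d)
    normalizer = q_prime @ k_prime.sum(dim=0, keepdim=True).t() # (n, 1) row normalization
    return (q_prime @ kv) / normalizer

out = favor_plus_attention(torch.randn(512, 64), torch.randn(512, 64), torch.randn(512, 64))
print(out.shape)  # (512, 64)
```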