1,595 research outputs found
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Transferable neural networks for enhanced sampling of protein dynamics
Variational auto-encoder frameworks have demonstrated success in reducing
complex nonlinear dynamics in molecular simulation to a single non-linear
embedding. In this work, we illustrate how this non-linear latent embedding can
be used as a collective variable for enhanced sampling, and present a simple
modification that allows us to rapidly perform sampling in multiple related
systems. We first demonstrate our method is able to describe the effects of
force field changes in capped alanine dipeptide after learning a model using
AMBER99. We further provide a simple extension to variational dynamics encoders
that allows the model to be trained in a more efficient manner on larger
systems by encoding the outputs of a linear transformation using time-structure
based independent component analysis (tICA). Using this technique, we show how
such a model trained for one protein, the WW domain, can efficiently be
transferred to perform enhanced sampling on a related mutant protein, the GTT
mutation. This method shows promise for its ability to rapidly sample related
systems using a single transferable collective variable and is generally
applicable to sets of related simulations, enabling us to probe the effects of
variation in increasingly large systems of biophysical interest.Comment: 20 pages, 10 figure
Learning a Hybrid Architecture for Sequence Regression and Annotation
When learning a hidden Markov model (HMM), sequen- tial observations can
often be complemented by real-valued summary response variables generated from
the path of hid- den states. Such settings arise in numerous domains, includ-
ing many applications in biology, like motif discovery and genome annotation.
In this paper, we present a flexible frame- work for jointly modeling both
latent sequence features and the functional mapping that relates the summary
response variables to the hidden state sequence. The algorithm is com- patible
with a rich set of mapping functions. Results show that the availability of
additional continuous response vari- ables can simultaneously improve the
annotation of the se- quential observations and yield good prediction
performance in both synthetic data and real-world datasets.Comment: AAAI 201
- …