Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.
Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution.
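The loop the abstract describes (train a sequence-function model on all measured variants, then use it to select sequences likely to be improved) can be sketched in a few lines. The one-hot encoding, ridge regression model, and toy fitness function below are illustrative assumptions, not the review's prescribed choices.

```python
# Minimal sketch of one round of ML-guided directed evolution.
# The encoding, model, and toy fitness are illustrative assumptions.
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

def one_hot(seq):
    """Flatten a protein sequence into a one-hot feature vector."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()

def fit_ridge(X, y, lam=1.0):
    """Ridge regression via the closed form (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def propose_next_round(measured, candidates, k=3, lam=1.0):
    """Train on all measured variants, rank unmeasured candidates,
    and return the top-k predicted improvers to test next."""
    X = np.stack([one_hot(s) for s, _ in measured])
    y = np.array([f for _, f in measured])
    w = fit_ridge(X, y, lam)
    scores = {s: float(one_hot(s) @ w) for s in candidates}
    return sorted(candidates, key=scores.get, reverse=True)[:k]

# Toy library of 4-residue peptides; "fitness" rewards W, especially at
# position 2, so the model should learn to recommend W-rich variants.
library = ["".join(p) for p in itertools.product("AW", repeat=4)]
measured = [(s, s.count("W") + 2.0 * (s[1] == "W")) for s in library[:8]]
picks = propose_next_round(measured, library[8:], k=3)
```

Because the toy fitness is additive in the one-hot features, the linear model recovers it from the first eight variants and ranks the all-W sequence at the top of the unmeasured pool.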
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The
first is that the parameter spaces of these models exhibit pathological
curvature. Recent methods address this problem by using adaptive
preconditioning for Stochastic Gradient Descent (SGD). These methods improve
convergence by adapting to the local geometry of parameter space. A second
issue is overfitting, which is typically addressed by early stopping. However,
recent work has demonstrated that Bayesian model averaging mitigates this
problem. The posterior can be sampled by using Stochastic Gradient Langevin
Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD
methods inefficient. Here, we propose combining adaptive preconditioners with
SGLD. In support of this idea, we give theoretical properties on asymptotic
convergence and predictive risk. We also provide empirical results for Logistic
Regression, Feedforward Neural Nets, and Convolutional Neural Nets,
demonstrating that our preconditioned SGLD method gives state-of-the-art
performance on these models.
Comment: AAAI 201
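The core idea can be sketched as follows, assuming an RMSprop-style diagonal preconditioner: the running curvature estimate rescales both the gradient step and the injected Gaussian noise. The correction term arising from the preconditioner's state dependence is dropped here, so this is a sketch of the idea rather than the paper's exact sampler.

```python
# Sketch of a preconditioned SGLD step. The same diagonal preconditioner
# G scales the drift (gradient) term and the diffusion (noise) term.
import numpy as np

def psgld_step(theta, grad, v, rng, eps=1e-3, alpha=0.99, lam=1e-5):
    """One update of theta given a stochastic gradient of the negative
    log-posterior; v is the running average of squared gradients."""
    v = alpha * v + (1 - alpha) * grad**2          # local curvature estimate
    G = 1.0 / (lam + np.sqrt(v))                   # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)
    theta = theta - 0.5 * eps * G * grad + noise   # drift + diffusion
    return theta, v

# Toy target: a standard Gaussian posterior, whose negative-log gradient
# is simply theta, so the chain should drift from 5.0 toward the origin.
rng = np.random.default_rng(0)
theta, v = np.full(3, 5.0), np.zeros(3)
for _ in range(5000):
    theta, v = psgld_step(theta, theta, v, rng)
```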
Confidence driven TGV fusion
We introduce a novel model for spatially varying variational data fusion,
driven by point-wise confidence values. The proposed model allows for the joint
estimation of the data and the confidence values based on the spatial coherence
of the data. We discuss the main properties of the introduced model as well as
suitable algorithms for estimating the solution of the corresponding biconvex
minimization problem and their convergence. The performance of the proposed
model is evaluated considering the problem of depth image fusion by using both
synthetic and real data from publicly available datasets.
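The data-term side of the idea can be illustrated with a much-simplified sketch: fusing several noisy depth maps by per-pixel confidence weights. The full model additionally imposes TGV regularization and jointly estimates the confidences by biconvex alternating minimization, none of which appears in this fragment.

```python
# Illustrative sketch of confidence-weighted fusion of depth maps.
# The actual model's TGV regularizer and joint confidence estimation
# are omitted; this shows only the point-wise weighted data term.
import numpy as np

def weighted_fusion(depths, confidences, eps=1e-8):
    """Per-pixel confidence-weighted average of stacked depth maps."""
    depths = np.asarray(depths, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return (w * depths).sum(axis=0) / (w.sum(axis=0) + eps)

d1 = np.array([[1.0, 2.0], [3.0, 4.0]])
d2 = np.array([[9.0, 2.0], [3.0, 0.0]])      # outliers at two pixels
c1 = np.ones((2, 2))
c2 = np.array([[0.0, 1.0], [1.0, 0.0]])      # zero confidence on outliers
fused = weighted_fusion([d1, d2], [c1, c2])
```

With the outlier pixels assigned zero confidence, the fused map reproduces the clean depth values.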
Meta Reinforcement Learning with Latent Variable Gaussian Processes
Learning from small data sets is critical in many practical applications
where data collection is time consuming or expensive, e.g., robotics, animal
experiments or drug design. Meta learning is one way to increase the data
efficiency of learning algorithms by generalizing learned concepts from a set
of training tasks to unseen, but related, tasks. Often, this relationship
between tasks is hard coded or relies in some other way on human expertise. In
this paper, we frame meta learning as a hierarchical latent variable model and
infer the relationship between tasks automatically from data. We apply our
framework in a model-based reinforcement learning setting and show that our
meta-learning model effectively generalizes to novel tasks by identifying how
new tasks relate to prior ones from minimal data. This results in up to a 60%
reduction in the average interaction time needed to solve tasks compared to
strong baselines.
Comment: 11 pages, 7 figures
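The hierarchical intuition (related tasks share structure, so a prior learned from training tasks makes a new task solvable from very little data) can be mirrored in a deliberately simple linear sketch. The paper itself uses latent variable Gaussian processes inside model-based RL; everything below is an illustrative stand-in for the hierarchy only.

```python
# Simplified empirical-Bayes illustration of meta-learning: fit many
# related training tasks, pool their weights into a shared prior, then
# solve a new task from just two observations. All modeling choices
# here are hypothetical stand-ins for the paper's GP-based method.
import numpy as np

def fit_task(X, y, prior_mean, lam=1.0):
    """MAP weights for one task under a Gaussian prior centered at prior_mean."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * prior_mean)

rng = np.random.default_rng(0)
true_shared = np.array([2.0, -1.0, 0.5])

# Training tasks: small perturbations of shared weights, plenty of data.
task_ws = []
for _ in range(20):
    w_t = true_shared + 0.1 * rng.normal(size=3)
    X = rng.normal(size=(50, 3))
    task_ws.append(fit_task(X, X @ w_t, prior_mean=np.zeros(3)))
prior_mean = np.mean(task_ws, axis=0)    # empirical-Bayes shared prior

# New related task with only 2 observations: the prior carries it.
w_new = true_shared + 0.1 * rng.normal(size=3)
Xn = rng.normal(size=(2, 3))
w_hat = fit_task(Xn, Xn @ w_new, prior_mean=prior_mean, lam=5.0)
err_with_prior = np.linalg.norm(w_hat - w_new)
err_no_prior = np.linalg.norm(fit_task(Xn, Xn @ w_new, np.zeros(3), 5.0) - w_new)
```

Two observations cannot pin down three weights on their own, so the task-agnostic fit loses the unobserved direction entirely, while the learned prior fills it in.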
Bayesian field theoretic reconstruction of bond potential and bond mobility in single molecule force spectroscopy
Quantifying the forces between and within macromolecules is a necessary first
step in understanding the mechanics of molecular structure, protein folding,
and enzyme function and performance. In such macromolecular settings, dynamic
single-molecule force spectroscopy (DFS) has been used to distort bonds. The
resulting responses, in the form of rupture forces, work applied, and
trajectories of displacements, have been used to reconstruct bond potentials.
Such approaches often rely on simple parameterizations of one-dimensional bond
potentials, assumptions on equilibrium starting states, and/or large amounts of
trajectory data. Parametric approaches typically fail at inferring
complex-shaped bond potentials with multiple minima, while piecewise estimation
may not guarantee smooth results with the appropriate behavior at large
distances. Existing techniques, particularly those based on work theorems, also
do not address spatial variations in the diffusivity that may arise from
spatially inhomogeneous coupling to other degrees of freedom in the
macromolecule, thereby presenting an incomplete picture of the overall bond
dynamics. To solve these challenges, we have developed a comprehensive
empirical Bayesian approach that incorporates data and regularization terms
directly into a path integral. All experimental and statistical parameters in
our method are estimated directly from the data. Upon testing our
method on simulated data, our regularized approach requires fewer data and
allows simultaneous inference of both complex bond potentials and diffusivity
profiles.
Comment: In review - Python source code available on GitHub. Abridged abstract
on arXiv.
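The paper's empirical-Bayes path-integral machinery is beyond a short sketch, but the naive baseline it improves upon is easy to show: Boltzmann inversion, which estimates a one-dimensional bond potential from equilibrium position samples as U(x) = -kT log p(x). As the abstract notes for such simpler approaches, this needs equilibrium data and large sample counts, and says nothing about a spatially varying diffusivity.

```python
# Boltzmann inversion: the simple equilibrium baseline for recovering a
# 1D bond potential from position samples (not the paper's method).
import numpy as np

def boltzmann_inversion(samples, bins=30, kT=1.0):
    """Histogram the sampled coordinate and invert the Boltzmann weight."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = hist > 0                       # avoid log(0) in empty bins
    U = -kT * np.log(hist[mask])
    return centers[mask], U - U.min()     # potential known up to a constant

# Harmonic bond: samples ~ N(0, kT/k) should recover U(x) ~= 0.5*k*x^2.
rng = np.random.default_rng(0)
k = 4.0
x, U = boltzmann_inversion(rng.normal(scale=np.sqrt(1.0 / k), size=200_000))
```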
A spin glass model for reconstructing nonlinearly encrypted signals corrupted by noise
An encryption of a signal is a random mapping of that signal which can be
corrupted by an additive noise. Given the encryption redundancy parameter
(ERP), the signal strength parameter, and the ('bare') noise-to-signal ratio
(NSR), we consider the problem of reconstructing the signal from its corrupted
image by a least-squares scheme for a certain class of random Gaussian
mappings. The problem is equivalent to finding the configuration of minimal
energy in a certain version of a spherical spin glass model with a squared
Gaussian-distributed random potential. We use the Parisi replica symmetry
breaking scheme to evaluate the mean overlap between the original signal and
its recovered image (known as the 'estimator'), which is a measure of the
quality of the signal reconstruction. We explicitly analyze the general case
of a linear-quadratic family of random mappings and discuss the full curve of
the overlap as a function of the NSR. When the nonlinearity exceeds a certain
threshold but the redundancy is not yet too big, the replica symmetric
solution is necessarily broken in some interval of NSR. We show that
encryptions with a nonvanishing linear component permit reconstructions with
a positive mean overlap for any ERP and any finite NSR, with the overlap
vanishing only as the NSR tends to infinity. In contrast, for the case of
purely quadratic nonlinearity, for any ERP there exists a threshold NSR value
above which the mean overlap vanishes, making the reconstruction impossible.
The behaviour of the overlap close to this threshold is controlled by the
replica symmetry breaking mechanism.
Comment: 33 pages, 5 figures