2,907 research outputs found
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function. Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution
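The abstract above describes learning from all measured variants and using the model to select sequences likely to be improved. A minimal sketch of one such round follows, assuming a simple additive per-position model; this is an illustrative stand-in, not the method of the review. The function names `fit_additive_model`, `predict`, and `select_top`, and the toy sequences and fitness values, are all hypothetical.

```python
from collections import defaultdict


def fit_additive_model(measured):
    """Fit a toy additive model from {sequence: fitness} measurements.

    Assumption: each (position, residue) pair contributes independently.
    Its effect is estimated as the average deviation from the mean fitness
    over all measured variants that carry that residue at that position.
    """
    mean = sum(measured.values()) / len(measured)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, fitness in measured.items():
        for pos, residue in enumerate(seq):
            sums[(pos, residue)] += fitness - mean
            counts[(pos, residue)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects


def predict(model, seq):
    """Predicted fitness: mean plus summed effects (unseen residues add 0)."""
    mean, effects = model
    return mean + sum(effects.get((pos, r), 0.0) for pos, r in enumerate(seq))


def select_top(model, candidates, k):
    """Rank untested candidates by predicted fitness and keep the best k."""
    return sorted(candidates, key=lambda s: predict(model, s), reverse=True)[:k]


# Hypothetical measurements from one screening round (made-up values).
measured = {"AAA": 1.0, "CAA": 2.0, "ACA": 0.5, "AAC": 1.5}
model = fit_additive_model(measured)

# Untested combinations of the measured mutations.
candidates = ["CAC", "CCA", "ACC", "CCC"]
best = select_top(model, candidates, 1)
```

In practice the selected sequences would be synthesized and measured, added to `measured`, and the model refit for the next round.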
The origin of large amplitude oscillations of dust particles in a plasma sheath
Micron-size charged particles can be easily levitated in low-density plasma
environments. At low pressures, suspended particles have been observed to
spontaneously oscillate around an equilibrium position. In systems of many
particles, these oscillations can catalyze a variety of nonequilibrium,
collective behaviors. Here, we report spontaneous oscillations of single
particles that remain stable for minutes with striking regularity in amplitude
and frequency. The oscillation amplitude can also exceed 1 cm, nearly an order
of magnitude larger than previously observed. Using an integrated experimental
and numerical approach, we show how the motion of an individual particle can be
used to extract the electrostatic force and equilibrium charge variation in the
plasma sheath. Additionally, using a delayed-charging model, we are able to
accurately capture the nonlinear dynamics of the particle motion, and estimate
the particle's equilibrium charging time in the plasma environment.
Hierarchies of Predominantly Connected Communities
We consider communities whose vertices are predominantly connected, i.e., the
vertices in each community are more strongly connected to other members of
the same community than to vertices outside the community. Flake et al.
introduced a hierarchical clustering algorithm that finds such predominantly
connected communities of different coarseness depending on an input parameter.
We present a simple and efficient method for constructing a clustering
hierarchy according to Flake et al. that supersedes the necessity of choosing
feasible parameter values and guarantees the completeness of the resulting
hierarchy, i.e., the hierarchy contains all clusterings that can be constructed
by the original algorithm for any parameter value. However, predominantly
connected communities are not organized in a single hierarchy. Thus, we develop
a framework that, after precomputing at most maximum flows, admits a
linear time construction of a clustering \C(S) of predominantly connected
communities that contains a given community S and is maximum in the sense
that any further clustering of predominantly connected communities that also
contains S is hierarchically nested in \C(S). We further generalize this
construction yielding a clustering with similar properties for given
communities in time. This admits the analysis of a network's structure
with respect to various communities in different hierarchies. Comment: to appear (WADS 2013)
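The notion of a predominantly connected community used in the abstract above can be illustrated with a small check. This is a minimal sketch under one assumption: the abstract's wording is read per vertex, so that every member must have strictly more edge weight to fellow members than to vertices outside. The paper's formal definition may be stricter. The function name `is_predominantly_connected` and the example graph are hypothetical.

```python
def is_predominantly_connected(graph, community):
    """Check the per-vertex reading of 'predominantly connected'.

    graph: dict mapping each vertex to a dict {neighbor: edge weight}.
    community: iterable of vertices.
    Assumption: a community qualifies if every member has strictly more
    total edge weight to other members than to vertices outside.
    """
    members = set(community)
    for v in members:
        inside = sum(w for u, w in graph[v].items() if u in members)
        outside = sum(w for u, w in graph[v].items() if u not in members)
        if inside <= outside:
            return False
    return True


# Two unit-weight triangles {a, b, c} and {d, e, f} joined by the edge c-d.
graph = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1},
    "e": {"d": 1, "f": 1},
    "f": {"d": 1, "e": 1},
}

triangle_ok = is_predominantly_connected(graph, ["a", "b", "c"])
bridge_ok = is_predominantly_connected(graph, ["c", "d"])
```

Each triangle passes the check, while the bridge pair `{c, d}` fails because both endpoints have more weight leaving the pair than staying inside it.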
Data-Driven Protein Engineering
Directed evolution has enabled the adaptation of natural protein sequences for an endless variety of human applications. Given a starting point - a sequence with measurable activity - directed evolution is able to improve protein sequences by iteratively accumulating beneficial mutations. However, directed evolution requires investing large experimental effort, which continues to be the major bottleneck in efficient protein optimization. To this end, we describe a framework for incorporating machine learning in the directed evolution process to maximize the utility of generated experimental data in Chapter 2. In Chapter 3, we then show that this framework outperforms traditional directed evolution methods on an empirical fitness landscape. However, directed evolution is fundamentally limited by its need for a starting point, or a sequence with measurable activity. To tackle this issue, we test the ability of nascent deep learning techniques for generating short, functional amino acid sequences in Chapter 4. Encouraged by this success, we attempted to generate full-length enzymatic sequences for desired substrates without success. However, we were able to apply this deep learning approach to model other aspects of enzymatic protein sequences in Chapter 5. Finally, the field of data-driven protein sequence generation is enjoying a recent surge in interest, and we provide an updated review of protein engineering with machine learning, focusing on recent work in deep generative modeling in Chapter 1.