Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.
Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution.
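The idea of learning from all measured variants and then selecting sequences likely to be improved can be illustrated with a minimal sketch. It is not the method from the review: the one-hot encoding, the ridge regression model, the function names, and the toy data are all assumptions chosen for brevity.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flatten a sequence into a one-hot vector (20 entries per position)."""
    vec = np.zeros(len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec

def fit_ridge(seqs, fitness, alpha=1.0):
    """Fit ridge regression weights on every measured variant."""
    X = np.array([one_hot(s) for s in seqs])
    y = np.asarray(fitness, dtype=float)
    y_mean = y.mean()
    # Closed-form ridge solution: (X^T X + alpha I) w = X^T (y - mean)
    A = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (y - y_mean))
    return w, y_mean

def rank_candidates(candidates, w, y_mean, top_k=2):
    """Return the top_k candidates by predicted fitness, best first."""
    preds = np.array([one_hot(s) @ w + y_mean for s in candidates])
    order = np.argsort(-preds)[:top_k]
    return [(candidates[i], float(preds[i])) for i in order]

# Toy data, made up for illustration: fitness rises with the number of 'W'.
measured = ["AAAA", "WAAA", "AWAA", "AAWA", "WWAA"]
fitness = [0.0, 1.0, 1.0, 1.0, 2.0]
w, y_mean = fit_ridge(measured, fitness)
top = rank_candidates(["WWWA", "AAAW", "AAAA"], w, y_mean, top_k=1)
```

In a real campaign the top-ranked candidates would be synthesized and measured, and the model refit on the enlarged data set before the next round.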
Quantified Uncertainty of Flexible Protein-Protein Docking Algorithms
The strength or weakness of an algorithm is ultimately governed by the confidence of its result. When the domain of the problem is large (e.g. traversal of a high-dimensional space), an exact solution often cannot be obtained, so approximations must be made. These approximations often lead to a reported quantity of interest (QOI) which varies between runs, decreasing the confidence of any single run. When the algorithm further computes this QOI based on uncertain or noisy data, the variability (or lack of confidence) of the QOI increases. Unbounded, these two sources of uncertainty (algorithmic approximations and uncertainty in input data) can result in a reported statistic that has low correlation with ground truth.
This is especially relevant in molecular biology applications, where the search space is generally large and observations are often noisy. This research applies uncertainty quantification techniques to the difficult protein-protein docking problem, where uncertainties arise from the explicit conversion from continuous to discrete space for protein representation (introducing some uncertainty in the input data), as well as from discrete sampling of the conformations. It describes the variability present in existing software, and then provides a method for computing probabilistic certificates in the form of Chernoff-like bounds. Finally, this paper leverages these probabilistic certificates to accurately bound the uncertainty in docking from two docking algorithms, providing a QOI that is both robust and statistically meaningful.
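As a rough illustration of what a Chernoff-like certificate looks like, the sketch below uses Hoeffding's inequality, a standard bound of this family. It is not the bound derived in the paper, and it assumes that repeated runs are independent and that the QOI is known to lie in a fixed interval [low, high]; the function names are hypothetical.

```python
import math

def hoeffding_tail_bound(n_runs, tolerance, low, high):
    """Upper bound on P(|mean of n_runs QOIs - expected QOI| >= tolerance).

    Assumes independent runs with each QOI in [low, high].
    """
    width = high - low
    bound = 2.0 * math.exp(-2.0 * n_runs * tolerance ** 2 / width ** 2)
    return min(1.0, bound)

def runs_needed(tolerance, failure_prob, low, high):
    """Smallest number of runs for which the bound drops to failure_prob."""
    width = high - low
    return math.ceil(width ** 2 * math.log(2.0 / failure_prob)
                     / (2.0 * tolerance ** 2))

# Example: a score normalized to [0, 1], tolerance 0.1, 95% confidence.
n = runs_needed(0.1, 0.05, 0.0, 1.0)  # 185
```

The bound shrinks exponentially in the number of runs, which is what makes a certificate of this kind practical to report alongside a docking score.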
SIMS: A Hybrid Method for Rapid Conformational Analysis
Proteins are at the root of many biological functions, often performing complex tasks as the result of large changes in their
structure. Describing the exact details of these conformational changes, however, remains a central challenge for
computational biology due to the enormous computational requirements of the problem. This has engendered the
development of a rich variety of useful methods designed to answer specific questions at different levels of spatial,
temporal, and energetic resolution. These methods fall largely into two classes: physically accurate, but computationally
demanding methods and fast, approximate methods. We introduce here a new hybrid modeling tool, the Structured
Intuitive Move Selector (SIMS), designed to bridge the divide between these two classes, while allowing the benefits of both
to be seamlessly integrated into a single framework. This is achieved by applying a modern motion planning algorithm,
borrowed from the field of robotics, in tandem with a well-established protein modeling library. SIMS can combine precise
energy calculations with approximate or specialized conformational sampling routines to produce rapid, yet accurate,
analysis of the large-scale conformational variability of protein systems. Several key advancements are shown, including the
abstract use of generically defined moves (conformational sampling methods) and an expansive probabilistic
conformational exploration. We present three example problems that SIMS is applied to and demonstrate a rapid solution
for each. These include the automatic determination of "active" residues for the hinge-based system Cyanovirin-N,
exploring conformational changes involving long-range coordinated motion between non-sequential residues in Ribose-
Binding Protein, and the rapid discovery of a transient conformational state of Maltose-Binding Protein, previously only
determined by Molecular Dynamics. For all cases we provide energetic validations using well-established energy fields,
demonstrating this framework as a fast and accurate tool for the analysis of a wide range of protein flexibility problems.
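The combination of generically defined moves with probabilistic, tree-based exploration can be sketched in a few lines. This is a toy under stated assumptions, not the SIMS implementation: conformations are plain lists of angles, `toy_energy` is a made-up stand-in for a real energy function, and the expansion rule is a simplified random tree growth rather than any specific motion planner.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical energy: lowest when every angle is near 0."""
    return sum(1.0 - math.cos(a) for a in angles)

def perturb_one(angles, rng):
    """Move: nudge a single randomly chosen angle."""
    out = list(angles)
    i = rng.randrange(len(out))
    out[i] += rng.uniform(-0.3, 0.3)
    return out

def perturb_all(angles, rng):
    """Move: apply a small random change to every angle."""
    return [a + rng.uniform(-0.1, 0.1) for a in angles]

def explore(start, moves, energy, max_energy, n_iters, seed=0):
    """Grow a tree of conformations, keeping only low-energy candidates."""
    rng = random.Random(seed)
    tree = [(start, None)]  # each node: (conformation, parent index)
    for _ in range(n_iters):
        parent = rng.randrange(len(tree))          # pick a node to expand
        move = rng.choice(moves)                   # pick any registered move
        candidate = move(tree[parent][0], rng)
        if energy(candidate) <= max_energy:        # energetic filter
            tree.append((candidate, parent))
    return tree

tree = explore([0.0] * 4, [perturb_one, perturb_all], toy_energy,
               max_energy=0.5, n_iters=200)
```

Because a move is just a function from one conformation to another, specialized sampling routines can be added to the list without changing the exploration loop.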
copulaedas: An R Package for Estimation of Distribution Algorithms Based on Copulas
The use of copula-based models in EDAs (estimation of distribution
algorithms) is currently an active area of research. In this context, the
copulaedas package for R provides a platform where EDAs based on copulas can be
implemented and studied. The package offers complete implementations of various
EDAs based on copulas and vines, a group of well-known optimization problems,
and utility functions to study the performance of the algorithms. Newly
developed EDAs can be easily integrated into the package by extending an S4
class with generic functions for their main components. This paper presents
copulaedas by providing an overview of EDAs based on copulas, a description of
the implementation of the package, and an illustration of its use through
examples. The examples include running the EDAs defined in the package,
implementing new algorithms, and performing an empirical study to compare the
behavior of different algorithms on benchmark functions and a real-world
problem.
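The general loop of an EDA (select promising solutions, estimate a distribution from them, sample a new population) can be sketched as follows. This is a Python illustration, not the copulaedas API or its R code. It fits a multivariate normal distribution, which corresponds to a Gaussian copula with normal marginals; the function names, parameter defaults, and the sphere benchmark are assumptions.

```python
import numpy as np

def sphere(x):
    """Benchmark objective with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

def gaussian_eda(objective, dim, pop_size=100, n_select=30, n_gens=50, seed=0):
    """Minimize objective with a multivariate normal EDA (dim >= 2)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    best_x, best_f = None, float("inf")
    for _ in range(n_gens):
        scores = np.array([objective(x) for x in pop])
        order = np.argsort(scores)
        if scores[order[0]] < best_f:
            best_f = float(scores[order[0]])
            best_x = pop[order[0]].copy()
        # Truncation selection: keep the n_select best individuals.
        selected = pop[order[:n_select]]
        # Estimate the distribution of the selected individuals.
        mean = selected.mean(axis=0)
        cov = np.cov(selected, rowvar=False) + 1e-9 * np.eye(dim)
        # Sample the next population from the estimated model.
        pop = rng.multivariate_normal(mean, cov, size=pop_size)
    return best_x, best_f

best_x, best_f = gaussian_eda(sphere, dim=3)
```

Replacing the normal marginals or the dependence structure in the estimation step is where other copula-based variants would differ.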