
Machine learning-guided directed evolution for protein engineering

Authors: Arnold Frances H.; Wu Zachary; Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution
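The learn-from-all-measured-variants, then select-likely-improvements loop can be illustrated with a minimal Python sketch. It assumes one-hot encoded sequences, closed-form ridge regression as the sequence-function model, a site-saturation library at chosen positions, and a hypothetical four-variant dataset; the review's own recommended encodings and models may differ, and the names `one_hot`, `fit_ridge` and `propose` are illustrative only.

```python
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flatten a protein sequence into a one-hot vector of len(seq) * 20 features."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def propose(measured, positions, batch_size=3, alpha=1.0):
    """One ML-guided round: learn from every measured variant, then rank the
    unmeasured variants of a site-saturation library at `positions`."""
    seqs = list(measured)
    X = np.array([one_hot(s) for s in seqs])
    y = np.array([measured[s] for s in seqs])
    w, b = fit_ridge(X, y, alpha)
    parent = list(seqs[0])
    scored = []
    for combo in itertools.product(AMINO_ACIDS, repeat=len(positions)):
        variant = parent.copy()
        for pos, aa in zip(positions, combo):
            variant[pos] = aa
        variant = "".join(variant)
        if variant not in measured:  # only spend experiments on new sequences
            scored.append((float(one_hot(variant) @ w + b), variant))
    scored.sort(reverse=True)
    return [v for _, v in scored[:batch_size]]


if __name__ == "__main__":
    # Hypothetical fitness measurements for four variants of a 4-residue parent.
    measured = {"ACDE": 1.0, "AKDE": 2.0, "ACDW": 1.5, "AKDW": 3.0}
    print(propose(measured, positions=[1, 3]))
```

In each round the proposed batch would be measured, added to `measured`, and the model refit, so every new measurement informs the next selection.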

arXiv.org e-Print Archive

Caltech Authors

7th German Conference on Chemoinformatics: 25th CIC-Workshop, Goslar, Germany, 6-8 November 2011; Meeting Abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Authors: Engel Thomas; Fechner Uli; Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

Quantified Uncertainty of Flexible Protein-Protein Docking Algorithms

Author: Clement Nathan L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

The strength or weakness of an algorithm is ultimately governed by the confidence of its result. When the problem domain is large (e.g. traversal of a high-dimensional space), an exact solution often cannot be obtained, so approximations must be made. These approximations often lead to a reported quantity of interest (QOI) that varies between runs, decreasing the confidence of any single run. When the algorithm further computes this QOI from uncertain or noisy data, the variability (or lack of confidence) of the QOI increases. Left unbounded, these two sources of uncertainty (algorithmic approximations and uncertainty in input data) can result in a reported statistic that correlates poorly with ground truth. This is especially relevant in molecular biology applications, where the search space is generally large and observations are often noisy. This research applies uncertainty quantification techniques to the difficult protein-protein docking problem, where uncertainties arise from the explicit conversion from continuous to discrete space for protein representation (introducing some uncertainty in the input data), as well as from discrete sampling of the conformations. It describes the variability in existing docking software and then provides a method for computing probabilistic certificates in the form of Chernoff-like bounds. Finally, the paper leverages these probabilistic certificates to accurately bound the uncertainty in docking for two docking algorithms, providing a QOI that is both robust and statistically meaningful.
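The abstract does not give the exact form of its Chernoff-like certificates, so the Python sketch below uses Hoeffding's inequality, a standard bound of that family, as a stand-in. It assumes the per-run QOI lies in a known range `[lower, upper]` and that runs are independent; the function names are hypothetical, and the Gaussian "docking scores" in the usage lines are simulated, not real data.

```python
import math
import random
import statistics


def hoeffding_halfwidth(n, delta, lower, upper):
    """Half-width t with P(|sample mean - true mean| >= t) <= delta, for n
    independent runs whose QOI lies in [lower, upper] (two-sided Hoeffding)."""
    return (upper - lower) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def runs_needed(t, delta, lower, upper):
    """Smallest number of runs for which the Hoeffding half-width is at most t."""
    return math.ceil((upper - lower) ** 2 * math.log(2.0 / delta) / (2.0 * t ** 2))


def certify(qoi_samples, delta, lower, upper):
    """Return (mean, half_width): the interval mean +/- half_width contains the
    true expected QOI with probability at least 1 - delta."""
    mean = statistics.fmean(qoi_samples)
    return mean, hoeffding_halfwidth(len(qoi_samples), delta, lower, upper)


if __name__ == "__main__":
    n = runs_needed(t=0.05, delta=0.05, lower=0.0, upper=1.0)
    rng = random.Random(0)
    # Simulated, clipped QOI values standing in for repeated docking runs.
    samples = [min(1.0, max(0.0, rng.gauss(0.6, 0.1))) for _ in range(n)]
    print(n, certify(samples, delta=0.05, lower=0.0, upper=1.0))
```

The bound needs no assumption about the QOI's distribution beyond boundedness, which is why it trades tightness for robustness.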

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

SIMS: A Hybrid Method for Rapid Conformational Analysis

Author: A Altis
A Ke
A Krogh
A Shehu
AA Canutescu
AA Canutescu
AG Evdokimov
AJ Björkman
AJ Björkman
AJ Sharff
AJ Sharff
AJ Sharff
AYL Sim
B Gipson
B Kobe
B Raveh
BH Shilton
Bryant Gipson
C Knight
C Tang
CA Bewley
Chandra Verma
D Bucher
D Case
D Hsu
DA Case
DJ Mandell
DM Krüger
DP Gladue
DR Martin
DT Huang
EF Pettersen
FA Quiocho
FA Saul
FA Saul
FA Saul
GA Mueller
GF Schröder
HK Binz
I Al-Bluwi
I Botos
IA Şucan
IA Şucan
J Cortés
J Cortés
J Diez
JA Chao
JA Chao
JA Marsh
JD Brodin
JJ Song
JM Johnston
JN Onuchic
K Henzler-Wildman
K Hinsen
K Lindorff-Larsen
K Schäfer
L Skjaerven
L Tapia
LE Kavraki
LG Barrientos
Lydia E. Kavraki
M Belitsky
M Kainosho
M Levitt
M Moll
Mark Moll
ML Oldham
MS Apaydin
MT Zimmermann
N Haspel
O Guvench
O Uchime
P Das
P Minary
P Yao
PG Telmer
PW Finn
R Das
RN Gilbreth
S Adcock
S Kirillova
S Piana
S Takada
S Thomas
S Thomas
SM LaValle
SM Rubin
SS Plotkin
SW Crawley
T Chiang
T Haliloglu
T Stockner
TH Chiang
U Srinivasan
V Venkatraman
X Duan
X Duan
X Tang
Y Liu
Y Xu
Publication venue
Publication date: 01/01/2013
Field of study

Proteins are at the root of many biological functions, often performing complex tasks as the result of large changes in their structure. Describing the exact details of these conformational changes, however, remains a central challenge for computational biology due to the enormous computational requirements of the problem. This has engendered the development of a rich variety of useful methods designed to answer specific questions at different levels of spatial, temporal, and energetic resolution. These methods fall largely into two classes: physically accurate but computationally demanding methods, and fast, approximate methods. We introduce here a new hybrid modeling tool, the Structured Intuitive Move Selector (SIMS), designed to bridge the divide between these two classes while allowing the benefits of both to be seamlessly integrated into a single framework. This is achieved by applying a modern motion planning algorithm, borrowed from the field of robotics, in tandem with a well-established protein modeling library. SIMS can combine precise energy calculations with approximate or specialized conformational sampling routines to produce rapid, yet accurate, analysis of the large-scale conformational variability of protein systems. Several key advancements are shown, including the abstract use of generically defined moves (conformational sampling methods) and an expansive probabilistic conformational exploration. We apply SIMS to three example problems and demonstrate a rapid solution for each: the automatic determination of "active" residues for the hinge-based system Cyanovirin-N, the exploration of conformational changes involving long-range coordinated motion between non-sequential residues in Ribose-Binding Protein, and the rapid discovery of a transient conformational state of Maltose-Binding Protein, previously only determined by Molecular Dynamics.
For all cases we provide energetic validations using well-established energy fields, demonstrating this framework as a fast and accurate tool for the analysis of a wide range of protein flexibility problems.
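The abstract names neither the motion planner nor the energy function SIMS uses, so the following is only a toy illustration of the general idea it describes: generically defined moves plugged into a sampling-based tree search, with an energy check deciding which conformations are kept. The Python sketch grows a plain random tree over two torsion angles; `toy_energy`, both moves, and the energy cutoff are hypothetical stand-ins, not SIMS's API or a real force field, and real planners use smarter node selection than the uniform choice here.

```python
import math
import random


def toy_energy(q):
    """Hypothetical stand-in for a force field: energy over two torsions (radians)."""
    phi, psi = q
    return (1.0 - math.cos(phi)) + (1.0 - math.cos(psi - 2.0))


def small_perturbation(q, rng):
    """Move 1 (hypothetical): jitter every torsion by a small Gaussian step."""
    return tuple(a + rng.gauss(0.0, 0.2) for a in q)


def single_torsion_resample(q, rng):
    """Move 2 (hypothetical): resample one torsion uniformly in [-pi, pi]."""
    q = list(q)
    q[rng.randrange(len(q))] = rng.uniform(-math.pi, math.pi)
    return tuple(q)


def explore(start, moves, n_iter=2000, max_energy=2.5, seed=0):
    """Grow a random tree of conformations: pick an existing node, apply a
    randomly chosen move, and keep the child only if its energy passes the cutoff."""
    rng = random.Random(seed)
    nodes, parents = [start], [None]
    for _ in range(n_iter):
        i = rng.randrange(len(nodes))
        child = rng.choice(moves)(nodes[i], rng)
        if toy_energy(child) <= max_energy:
            nodes.append(child)
            parents.append(i)
    return nodes, parents


def path_to_root(nodes, parents, k):
    """Recover the chain of conformations leading from the start to node k."""
    path = []
    while k is not None:
        path.append(nodes[k])
        k = parents[k]
    return path[::-1]


if __name__ == "__main__":
    nodes, parents = explore((0.0, 2.0), [small_perturbation, single_torsion_resample])
    print(len(nodes), "accepted conformations")
    print(path_to_root(nodes, parents, len(nodes) - 1)[:3])
```

Because moves are plain callables, a cheap approximate sampler and a more expensive specialized one can be mixed in the same search, which is the kind of flexibility the abstract attributes to generically defined moves.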

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

DSpace at Rice University

copulaedas: An R Package for Estimation of Distribution Algorithms Based on Copulas

Authors: Gonzalez-Fernandez Yasser; Soto Marta
Publication date: 01/01/2014

The use of copula-based models in EDAs (estimation of distribution algorithms) is currently an active area of research. In this context, the copulaedas package for R provides a platform where EDAs based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a set of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components. This paper presents copulaedas by providing an overview of EDAs based on copulas, a description of the implementation of the package, and an illustration of its use through examples. The examples include running the EDAs defined in the package, implementing new algorithms, and performing an empirical study to compare the behavior of different algorithms on benchmark functions and a real-world problem.
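copulaedas itself is an R package built on S4 classes; the Python sketch below is not its API and only illustrates the general loop of an EDA based on a Gaussian copula: select the best solutions, model the marginals and the dependence separately, then sample a new population. It assumes truncation selection, empirical marginals via `np.quantile`, a correlation matrix estimated from rank-based normal scores, and the sphere function as benchmark; all names and defaults are illustrative choices.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()
_cdf = np.vectorize(_STD_NORMAL.cdf, otypes=[float])
_inv_cdf = np.vectorize(_STD_NORMAL.inv_cdf, otypes=[float])


def sphere(x):
    """Benchmark objective to minimize; optimum 0 at the origin."""
    return float(np.sum(x ** 2))


def gaussian_copula_eda(f, dim, pop_size=100, select_frac=0.3,
                        generations=30, low=-5.0, high=5.0, seed=0):
    """Minimize f with a Gaussian-copula EDA; returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(pop_size, dim))
    best_x, best_f, history = None, np.inf, []
    for _ in range(generations):
        fitness = np.array([f(x) for x in pop])
        order = np.argsort(fitness)
        if fitness[order[0]] < best_f:
            best_f, best_x = fitness[order[0]], pop[order[0]].copy()
        history.append(best_f)  # best-so-far value
        selected = pop[order[: int(select_frac * pop_size)]]
        m = len(selected)
        # Dependence: ranks -> pseudo-observations in (0, 1) -> normal scores.
        ranks = np.argsort(np.argsort(selected, axis=0), axis=0) + 1
        z = _inv_cdf(ranks / (m + 1))
        corr = np.corrcoef(z, rowvar=False) + 1e-9 * np.eye(dim)  # tiny jitter
        # Sample the Gaussian copula, then map through the empirical marginals.
        u = _cdf(rng.multivariate_normal(np.zeros(dim), corr, size=pop_size))
        pop = np.column_stack(
            [np.quantile(selected[:, j], u[:, j]) for j in range(dim)])
    return best_x, best_f, history


if __name__ == "__main__":
    best_x, best_f, _ = gaussian_copula_eda(sphere, dim=2)
    print(best_x, best_f)
```

Keeping the marginals separate from the dependence model is the point of the copula approach: in this sketch, swapping the Gaussian copula for another copula (such as a vine) would change only how `corr` is estimated and sampled.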

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

Journal of Statistical Software