139,420 research outputs found

Machine learning-assisted directed protein evolution with combinatorial libraries

Author: Arnold Frances H.
Kan S. B. Jennifer
Lewis Russell D.
Wittmann Bruce J.
Wu Zachary
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 30/04/2019

To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning in the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine learning models trained on tested variants provide a fast method for testing sequence space computationally. We validate this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee. By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.

Comment: Corrected best S-selective variant sequence in Figure 4. Corrected less R-selective variant sequences from Round II Input library in Table 2 and Supp Table 4. Corrections may also be found on PNAS version https://www.pnas.org/content/early/2019/12/26/192177011
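The workflow summarized in this abstract (screen a sample of a combinatorial library, train a model on the tested variants, then rank the untested variants in silico) can be sketched roughly as follows. This is only an illustration under stated assumptions: the four-letter alphabet, the synthetic landscape in `make_toy_landscape`, and the per-site averaging model in `fit_sitewise_model` are all hypothetical stand-ins, not the models or data used in the paper.

```python
import itertools
import random
from collections import defaultdict

ALPHABET = "ACDE"   # toy 4-letter alphabet (assumption); real libraries use 20 amino acids
N_SITES = 4         # number of positions mutated simultaneously


def make_toy_landscape(rng):
    """Hypothetical fitness function: additive site effects plus one epistatic pair."""
    effects = {(i, aa): rng.gauss(0.0, 1.0) for i in range(N_SITES) for aa in ALPHABET}

    def fitness(variant):
        score = sum(effects[(i, aa)] for i, aa in enumerate(variant))
        if variant[:2] == "AC":  # arbitrary pairwise interaction
            score += 1.0
        return score

    return fitness


def fit_sitewise_model(measured):
    """Fit a per-site mean-fitness model to (variant, fitness) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for variant, y in measured:
        for key in enumerate(variant):  # key is (position, residue)
            sums[key] += y
            counts[key] += 1
    overall = sum(y for _, y in measured) / len(measured)

    def predict(variant):
        # Residues never seen in training fall back to the overall mean.
        return sum(sums[k] / counts[k] if k in counts else overall
                   for k in enumerate(variant))

    return predict


def ml_guided_round(n_train=40, n_pick=10, seed=0):
    """One round: screen a random sample, train, then rank the untested variants."""
    rng = random.Random(seed)
    fitness = make_toy_landscape(rng)
    library = ["".join(v) for v in itertools.product(ALPHABET, repeat=N_SITES)]
    tested = rng.sample(library, n_train)  # stands in for the experimental screen
    model = fit_sitewise_model([(v, fitness(v)) for v in tested])
    tested_set = set(tested)
    untested = [v for v in library if v not in tested_set]
    picks = sorted(untested, key=model, reverse=True)[:n_pick]  # in silico screen
    return {"tested": tested, "picks": picks, "model": model, "fitness": fitness}
```

In a real campaign the `picks` would be synthesized and measured, and the new measurements would be added to the training set for the next round. A purely additive model like this one cannot capture epistasis, which is one reason richer model classes are usually preferred.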

arXiv.org e-Print Archive

Caltech Authors

Machine learning-guided directed evolution for protein engineering

Author: Arnold Frances H.
Wu Zachary
Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.

Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution

arXiv.org e-Print Archive

Caltech Authors

Computational structure‐based drug design: Predicting target flexibility

Author: Ding X.
Dreher J.
Khago D.
Li Y.
Samuel G.
Publication venue: 'Wiley'
Publication date: 01/01/2018

The role of molecular modeling in drug design has experienced a significant revamp in the last decade. The increase in computational resources and molecular models, along with software developments, is finally introducing a competitive advantage in early phases of drug discovery. Medium and small companies with a strong focus on computational chemistry are being created, some of them having introduced important leads in drug design pipelines. An important source for this success is the extraordinary development of faster and more efficient techniques for describing flexibility in three‐dimensional structural molecular modeling. At different levels, from docking techniques to atomistic molecular dynamics, conformational sampling between receptor and drug results in improved predictions, such as screening enrichment, discovery of transient cavities, etc. In this review article we perform an extensive analysis of these modeling techniques, dividing them into high and low throughput, and emphasizing their application to drug design studies. We conclude the review with a section describing our Monte Carlo method, PELE, recently highlighted as an outstanding advance in an international blind competition and industrial benchmarks.

We acknowledge the BSC-CRG-IRB Joint Research Program in Computational Biology. This work was supported by a grant from the Spanish Government, CTQ2016-79138-R. J.I. acknowledges support from SVP-2014-068797, awarded by the Spanish Government.

Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

The Nondeterministic Waiting Time Algorithm: A Review

Author: A. Funahashi
A. V. Hill
Andrei Păun
Bianca Truthe
C. M. Guldberg
C. V. Rao
C. Zandron
D. A. McQuarrie
D. T. Gillespie
E. L. Haseltine
E. S. Lander et al.
E. W. Montroll
F. Hua
G. Lahav
G. Paun
Giovanni Pighizzini
H. A. Kramers
J. Jack
J. M. G. Vilar
John Jack
Jürgen Dassow
K. Oda
L. M. Adleman
L. Wilhelmy
M. A. Gibson
M. Hucka et al.
N. Selliah
P. Waage
R. L. Bar-Or
S. Cheruku
Publication venue: 'Open Publishing Association'
Publication date: 01/07/2009

We briefly present the Nondeterministic Waiting Time algorithm. Our technique for the simulation of biochemical reaction networks has the ability to mimic the Gillespie Algorithm for some networks and solutions to ordinary differential equations for other networks, depending on the rules of the system, the kinetic rates, and the numbers of molecules. We provide a full description of the algorithm as well as specifics on its implementation. Some results for two well-known models are reported. We have used the algorithm to explore Fas-mediated apoptosis models in cancerous and HIV-1-infected T cells.
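For readers unfamiliar with the Gillespie Algorithm that this abstract uses as a point of comparison, a minimal sketch of its direct method is given below. It is not the Nondeterministic Waiting Time algorithm itself, whose rules are not described in the abstract. The reversible reaction A <-> B and the rate constants `K_FWD` and `K_REV` are hypothetical choices made only for illustration.

```python
import random

K_FWD, K_REV = 1.0, 0.5  # hypothetical rate constants for A -> B and B -> A

# Each reaction is (propensity function of the state, change applied to the state).
REACTIONS = [
    (lambda x: K_FWD * x["A"], {"A": -1, "B": +1}),
    (lambda x: K_REV * x["B"], {"A": +1, "B": -1}),
]


def gillespie_direct(x0, reactions, t_end, seed=0):
    """Simulate one stochastic trajectory with Gillespie's direct method."""
    rng = random.Random(seed)
    t, x = 0.0, dict(x0)
    trajectory = [(t, dict(x))]
    while True:
        props = [propensity(x) for propensity, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to propensity.
        _, change = rng.choices(reactions, weights=props)[0]
        for species, delta in change.items():
            x[species] += delta
        trajectory.append((t, dict(x)))
    return trajectory
```

Calling `gillespie_direct({"A": 100, "B": 0}, REACTIONS, t_end=5.0)` returns a list of `(time, state)` pairs. Averaging many such trajectories approaches the ordinary differential equation solution when molecule counts are large, which is the regime contrast the abstract refers to.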

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

Author: A Akasako
A Cao
A Martin
A Mitraki
A Rambaut
AA Pakula
AR Dinner
AR Fersht
AS Yang
AV Gribenko
B Steipe
BM Broome
C Pal
C Park
CB Anfinsen
CB Do
CM Dobson
CT Saunders
D Gilis
D Perl
D Shortle
DA Cowan
DA Drummond
DD Loeb
DM Taverna
E Capriotti
E Hoffmann
E van Nimwegen
EPC Rocha
Eugene I. Shakhnovich
F Chiti
F Ronquist
G Parisi
GG Brownlee
H Akashi
H Li
H Schindelin
H Zhao
H Zhou
HW Hellinga
I Keller
IE Sanchez
IMP del Pino
J Felsenstein
J Kyte
JA Wells
JB Garrett
JD Bloom
Jesse D. Bloom
JL Thorne
JM Koshi
JP Huelsenbeck
JR Cochran
JR Lepock
JV Chamary
K Ishikawa
K Katayanagi
KA Bava
KA Gray
KB Zeldovich
KJ Szretter
KL Maxwell
L Giver
L Serrano
M Dai
M Haruki
M Jacob
M Lehmann
M Matrosovich
M Ueda
M Wunderlich
Matthew J. Glassman
MD Kumar
MF Sippl
MM Garcia-Mira
MM Gromiha
MP Canadillas
MS Fornasari
MW Pantoliano
N Amin
N Goldman
N Lartillot
N Tong
R Godoy-Ruiz
R Guerois
R Rabadan
R Sakaue
RC Edgar
RJ Ellis
S Govindarajan
S Kimura
S Nakajima
S Sato
SC Choi
SH White
SJ Gamblin
SS Jaswal
U Bastolla
V Parthiban
VG Dugan
VN Uversky
W Besenmatter
WS Sandberg
WSW Wong
XJ Zhang
Y Bao
YY Tseng
Z Chen
Publication venue: International Society for Computational Biology
Publication date: 01/04/2009

One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

The genetic basis for adaptation of model-designed syntrophic co-cultures.

Author: Brennan Caitriona
Feist Adam M
Hefner Ying
Humphrey Gregory
King Zachary A
Knight Rob
Lloyd Colton J
O'Brien Edward J
Olson Connor A
Phaneuf Patrick V
Salido Rodolfo A
Sandberg Troy E
Sanders Jon G
Sanders Karenina
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Understanding the fundamental characteristics of microbial communities could have far-reaching implications for human health and applied biotechnology. Despite this, much is still unknown regarding the genetic basis and evolutionary strategies underlying the formation of viable synthetic communities. By pairing auxotrophic mutants in co-culture, it has been demonstrated that viable nascent E. coli communities can be established where the mutant strains are metabolically coupled. A novel algorithm, OptAux, was constructed to design 61 unique multi-knockout E. coli auxotrophic strains that require significant metabolite uptake to grow. These predicted knockouts included a diverse set of novel non-specific auxotrophs that result from inhibition of major biosynthetic subsystems. Three OptAux-predicted non-specific auxotrophic strains, with diverse metabolic deficiencies, were co-cultured with an L-histidine auxotroph and optimized via adaptive laboratory evolution (ALE). Time-course sequencing revealed the genetic changes employed by each strain to achieve higher community growth rates and provided insight into mechanisms for adapting to the syntrophic niche. A community model of metabolism and gene expression was utilized to predict the relative community composition and fundamental characteristics of the evolved communities. This work presents new insight into the genetic strategies underlying viable nascent community formation and a cutting-edge computational method to elucidate metabolic changes that empower the creation of cooperative communities.

Directory of Open Access Journals

eScholarship - University of California

Online Research Database In Technology

FigShare

Biological applications of the theory of birth-and-death processes

Author: Karev Georgy P.
Koonin Eugene V.
Novozhilov Artem S.
Publication date: 01/01/2005

In this review, we discuss the applications of the theory of birth-and-death processes to problems in biology, primarily, those of evolutionary genomics. The mathematical principles of the theory of these processes are briefly described. Birth-and-death processes, with some straightforward additions such as innovation, are a simple, natural formal framework for modeling a vast variety of biological processes such as population dynamics, speciation, genome evolution, including growth of paralogous gene families and horizontal gene transfer, and somatic evolution of cancers. We further describe how empirical data, e.g., distributions of paralogous gene family size, can be used to choose the model that best reflects the actual course of evolution among different versions of birth-death-and-innovation models. It is concluded that birth-and-death processes, thanks to their mathematical transparency, flexibility and relevance to fundamental biological processes, are going to be an indispensable mathematical tool for the burgeoning field of systems biology.

Comment: 29 pages, 4 figures; submitted to "Briefings in Bioinformatics"
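As a small illustration of the kind of calculation this framework supports, the stationary distribution of a finite birth-and-death chain follows from detailed balance: the probability of state n is proportional to the product of birth(k-1)/death(k) for k from 1 to n. The sketch below assumes generic, user-supplied rate functions; the function name and the constant rates in the usage note are hypothetical and are not taken from the review's specific birth-death-and-innovation models.

```python
def stationary_distribution(birth, death, n_max):
    """Stationary distribution of a birth-and-death chain on states 0..n_max.

    birth(n) is the rate of moving n -> n + 1 and death(n) the rate of
    moving n -> n - 1. Detailed balance gives
    pi[n] * birth(n) == pi[n + 1] * death(n + 1).
    """
    weights = [1.0]  # unnormalized weight of state 0
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * birth(n - 1) / death(n))
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `stationary_distribution(lambda n: 1.0, lambda n: 2.0, 20)` gives a truncated geometric distribution in which each state is half as probable as the one before it. Comparing such model distributions against observed gene family size counts is the sort of model selection the abstract describes.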

arXiv.org e-Print Archive

CiteSeerX