Search CORE

18,669 research outputs found

Machine learning-guided directed evolution for protein engineering

Author: Arnold Frances H.
Wu Zachary
Yang Kevin K.
Publication venue
Publication date: 19/04/2019
Field of study

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolutio

arXiv.org e-Print Archive

Caltech Authors

From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions

Author: A Sarai
AV Morozov
BW Matthews
BW Matthews
CG Kalodimos
CH Yan
CO Pabo
E Fraenkel
E Katchalski-Katzir
FK Winkler
H Tjong
I Bonnet
IB Kuznetsov
Ilya Vakser
J Gorman
J Skolnick
J Skolnick
JE Donald
Jeffrey Skolnick
JJ Havranek
JS Lamoureux
JS Lamoureux
M Billeter
M Gao
M van Dijk
MJ Sippl
Mu Gao
N Bhardwaj
NC Horton
NM Luscombe
NP Stanford
O Givaty
P Aloy
P Rotkiewicz
PH von Hippel
R Mendez
R Samudrala
RMA Knegtel
S Ahmad
S Jones
SE Halford
SJ Hubbard
TA Robertson
TW Siggers
W Humphrey
WJ Lane
XJ Lu
Y Zhang
Y Zhang
ZJ Liu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2009
Field of study

©2009 Gao, Skolnick. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.doi:10.1371/journal.pcbi.1000341DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing its specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Ca deviation from native is up to 5 Å from their native structures. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein

Scholarly Materials And Research @ Georgia Tech

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

Author: Cang Zixuan
Wei Guo-Wei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/03/2017
Field of study

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts.Comment: 20 pages, 8 figures, 5 table

arXiv.org e-Print Archive

Directory of Open Access Journals

Predicting variation of DNA shape preferences in protein-DNA interaction in cancer cells with a new biophysical model

Author: Batmanov Kirill
Wang Junbai
Publication venue: 'MDPI AG'
Publication date: 01/09/2017
Field of study

DNA shape readout is an important mechanism of target site recognition by transcription factors, in addition to the sequence readout. Several models of transcription factor-DNA binding which consider DNA shape have been developed in recent years. We present a new biophysical model of protein-DNA interaction by considering the DNA shape features, which is based on a neighbour dinucleotide dependency model BayesPI2. The parameters of the new model are restricted to a subspace spanned by the 2-mer DNA shape features, which allowing a biophysical interpretation of the new parameters as position-dependent preferences towards certain values of the features. Using the new model, we explore the variation of DNA shape preferences in several transcription factors across cancer cell lines and cellular conditions. We find evidence of DNA shape variations at FOXA1 binding sites in MCF7 cells after treatment with steroids. The new model is useful for elucidating finer details of transcription factor-DNA interaction. It may be used to improve the prediction of cancer mutation effects in the future

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

Author: Abel R
Achenbach J
Adikwu UM
Ain QU
Al-Yamori R
Alhalabi Z
Aniceto N
Ansideri F
Baker D
Balducci A
Banting L
Barilla J
Barrett I
Basu D
Baumann K
Bender A
Bender A
Bender A
Berg E
Bergström F
Bermudez M
Bietz S
Bietz S
Bodnarchuk MS
Boeckler FM
Boeckler FM
Bojarski AJ
Bojarski AJ
Borbulevych OY
Buchholz M
Bulusu KC
Bureau R
Böckler FM
Böttcher S
Büttner FM
Cao Q
Cappel D
Cheeseright T
Clark RD
Clark T
Da Costa FB
Dahlgren M
De Graaf C
Demuth H-U
Dorfman R
Dubrucq K
Ecker GF
Edman K
Egelkraut-Holtus M
Eid S
Eigner-Pitto V
Engel J
Engkvist O
Epple M
Essex JW
Evers A
Exner TE
Fan T-P
Fechner U
Finkelmann AR
Firaha DS
Firth M
Fourches D
Fraaije JH
Frach R
Frach R
Fraczkiewicz R
Freitas A
Friedrich N-O
Friesner R
Fu X
Fuchs JE
Fulle S
Furtado F
Garg P
Gervasio FL
Ghafourian T
Glen R
Gracia RS
Grebner C
Guallar V
Göller AH
Günther MB
Günther S
Güssregen S
Haensele E
Heidrich J
Heil J
Hennig S
Herrmann G
Hessler G
Hilbig M
Himmler H-J
Hoffgaard F
Hogner A
Hollóczki O
Horinek D
Hošek P
Husch T
Ibezim A
Ihlenfeldt WD
Ihlenfeldt WD
Jardin C
Judson P
Jäger C
Kalinowski L
Kalliokoski T
Kast SM
Kast SM
Kast SM
Kibies P
Kibies P
Kirchmair J
Kirchner B
Kireeva N
Klute W
Koch O
Koch P
Kohlbacher O
Kolb P
Korth M
Kos A
Kramer C
Krilov G
Krotzky T
Krotzky T
Kuhn H
Kuhn MA
Kurczab R
Kühne R
Lange A
Lange A
Lanig H
Laufer S
Levine Z
Li X
Lifongo LL
Lin T
Lisurek M
Lokajíček MV
Mackey M
Masek BB
Mathea M
Matter H
Mbah CJ
Mbaze LM
McWilliams L
Mervin L
Mervin LH
Mittal S
Mohamad-Zobir SZ
Montanari F
Moser D
Mrugalla F
Mullen R
Murray DC
Nagy S
Nahum O
Naß A
Nguyen QD
Nogueira MS
Ntie-Kang F
Ntie-Kang F
Ntie-Kang F
Nwodo NJ
Oliveira Santos JS-D
Oliveira TB
Omoto K
Onlia I
Ostroumov D
Owen RM
Panecka J
Patel H
Pervov VS
Petrov A
Pisaková H
Pleik S
Polokoff M
Pongratz T
Pretzel J
Proschak E
Pryde DC
Pöhner IA
Rarey M
Rarey M
Rarey M
Rauh D
Renner G
Renner G
Richmond NJ
Rickmeyer T
Rippmann F
Ross GA
Ruff M
Rupp B
Saladino G
Saleh N
Sandmann A
Sandmann A
Schall C
Schmidt D
Schmidt TC
Schmidt TJ
Schmidtke P
Schneider G
Schomburg KT
Schram J
Schulz R
Schütter C
Segler MHS
Senderowitz H
Shaikh N
Shea J-E
Sherman W
Sievers-Engler A
Simoben CV
Simr P
Sippl W
Smith S
Solovev VP
Soltanshahi F
Sommer K
Sotriffer CA
Spiwok V
Stehle T
Steinbrecher TB
Steudle A
Sticht H
Strohfeldt S
Sánchez-García E
Tautermann CS
Torda AE
Torella R
Truszkowski A
Turk S
Tyrchan C
Tyrchan C
Ulander J
Ulander J
Van den Broek K
Van den Broek K
Van Oeyen A
Volkamer A
Wade RC
Waldman M
Waller MP
Wang L
Warszycki D
Weber J
Wessjohann L
Westerhoff LM
Whitley DC
Wieczorek V
Wolber G
Yosipof A
Zdrazil B
Zielesny A
Zimmermann MO
Zoufir A
Śmieja M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/03/2016
Field of study

Spiral - Imperial College Digital Repository