2,218 research outputs found

Empirical Potential Function for Simplified Protein Models: Combining Contact and Local Sequence-Structure Descriptors

Author: Adamian
Anfinsen
Avbelj
Bahar
Bastolla
Betancourt
Brooks
Buchete
Cannata
Chiu
Cline
Crasto
Dill
Dill
Dima
Dobson
Fain
Fitzkee
Fletcher
Friedrichs
Gan
Goldstein
Guntert
Hao
Head-Gordon
Hou
Hu
Hunter
Joachims
Kolinski
Kolodny
Kuang
Lazaridis
Levinthal
Levitt
Lezon
Li
Li
Liang
Loose
Lu
Maiorov
McConkey
McGuffin
Mirny
Miyazawa
Murphy
Park
Park
Pearlman
Pei
Przytycka
Riddle
Sagot
Samudrala
Samudrala
Schölkopf
Shortle
Shortle
Simons
Simons
Simons
Thomas
Tobi
Tobi
Tsai
Vendruscolo
Vendruscolo
Vriend
Wang
Wang
Xia
Xia
Zhang
Zhang
Zhang
Zhou
Publication venue: 'Wiley'
Publication date: 01/01/2006

An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models, such as those requiring only $C_\alpha$ or backbone atoms, are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete-state models can represent the backbone conformations of proteins with small RMSD values. However, no potential functions exist that are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The potential function is a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The ability of the potential function to discriminate native protein structures from decoys is evaluated on several benchmark decoy sets. Our potential function, which requires only backbone atoms or $C_\alpha$ atoms, performs comparably to or better than several residue-based potential functions that require additional coordinates of side chain centers or of all side chain atoms. By reducing the residue alphabet to size 5 for the local sequence-structure relationship, the performance of the potential function can be further improved. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding. Comment: 20 pages, 5 figures, 4 tables. In press, Protein

arXiv.org e-Print Archive

Crossref
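The abstract describes the potential as a weighted linear sum of descriptors, with weights chosen so that native structures score better than decoys. A minimal sketch of that idea follows, assuming the descriptor vectors (for example, counts of contact types and local sequence-structure states) are already computed. The perceptron update is a swapped-in technique for illustration; the abstract does not say which optimizer the authors used, and the names `energy` and `train_perceptron` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def energy(weights: Sequence[float], descriptors: Sequence[float]) -> float:
    """Potential as a weighted linear sum of descriptors: E = sum_i w_i * x_i."""
    return dot(weights, descriptors)


def train_perceptron(
    native_decoy_pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    n_features: int,
    epochs: int = 100,
) -> List[float]:
    """Search for weights with E(native) < E(decoy) for every (native, decoy) pair.

    A pair is violated when w . (x_decoy - x_native) <= 0; the update
    w += x_decoy - x_native raises the decoy energy relative to the native one.
    (Hypothetical helper; perceptron chosen only for illustration.)
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        violations = 0
        for native, decoy in native_decoy_pairs:
            diff = [d - n for d, n in zip(decoy, native)]
            if dot(w, diff) <= 0.0:
                w = [wi + di for wi, di in zip(w, diff)]
                violations += 1
        if violations == 0:  # every native already scores below its decoy
            break
    return w
```

If the native/decoy inequalities are not linearly separable, the loop simply stops after `epochs` passes; a margin-based or linear-programming formulation is a common alternative.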

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Author: Cook Cory
Publication venue: SJSU ScholarWorks
Publication date: 14/06/2016

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvements in accuracy and performance over the base algorithm. Testing also shows that the languages inferred for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability above 80%. This shows that languages for components of DNA can be inferred and are useful independently of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to hidden Markov model exon prediction, reducing the number of wrongly detected exons and significantly improving the specificity of the model.

SJSU ScholarWorks
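Positive-sample inference of regular grammars usually starts from a prefix tree acceptor (PTA), a deterministic automaton that accepts exactly the training sequences, and then generalizes it by merging states. The sketch below shows only that starting point; the state-merging step and the thesis's subset-search algorithm are not reproduced, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

DFA = Tuple[Dict[Tuple[int, str], int], Set[int]]


def build_pta(samples: Iterable[str]) -> DFA:
    """Build a prefix tree acceptor: a DFA accepting exactly the given samples."""
    transitions: Dict[Tuple[int, str], int] = {}
    accepting: Set[int] = set()
    next_state = 1  # state 0 is the root (empty prefix)
    for sample in samples:
        state = 0
        for symbol in sample:
            key = (state, symbol)
            if key not in transitions:  # new prefix: allocate a fresh state
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the full sample ends in an accepting state
    return transitions, accepting


def accepts(dfa: DFA, word: str) -> bool:
    """Run the DFA on word; reject on any missing transition."""
    transitions, accepting = dfa
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting
```

A generalizing learner would then merge compatible PTA states so the automaton accepts unseen sequences as well.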

Modeling study on the validity of a possibly simplified representation of proteins

Author: A. M. Ferrenberg
A. R. Dinner
A. Sali
C. K. Mathews
D. K. Klimov
D. S. Riddle
D. Thirumalai
E. I. Shakhnovich
E. I. Shakhnovich
E. I. Shakhnovich
E. I. Shakhnovich
H. Li
H. Li
H. S. Chan
H. S. Chan
Jun Wang
Jun Wang
K. A. Dill
K. F. Lau
K. W. Plaxco
K. Yue
L. Regan
M. Vendruscolo
N. D. Socci
N. Socci
P. G. Wolynes
P. G. Wolynes
R. Elber
S. Kamtekar
S. Miyazawa
T. E. Creighton
T. Veitshans
V. Pande
Wei Wang
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2000

The folding characteristics of sequences reduced to a possibly simplified representation with five types of residues are shown to be similar to those of the original sequences built from the natural set of residues (20 types, or 20 letters). The reduced sequences have good foldability and fold to the same native structures as the optimized original sequences. A large ground-state energy gap for the native structure indicates the thermodynamic stability of the reduced sequences. The general validity of such a five-letter reduction is further studied via the correlation between the reduced sequences and the original ones. For comparison, a two-letter reduction is found not to reproduce the native structures of the original sequences because of its homopolymeric features. Comment: 6 pages with 4 figures

arXiv.org e-Print Archive

Crossref

CERN Document Server
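A reduced representation replaces each of the 20 amino acids with the label of the group it belongs to. The sketch below shows the mechanics for a five-letter and a two-letter reduction. Both partitions are hypothetical groupings chosen for illustration; the abstract does not list the groups the authors used.

```python
from typing import Dict

# Hypothetical partitions of the 20 standard amino acids (illustration only).
FIVE_LETTER_GROUPS: Dict[str, str] = {
    "a": "CMFILVWY",  # large / hydrophobic
    "b": "ATH",
    "c": "GP",
    "d": "DE",        # acidic
    "e": "SNQRK",     # polar / basic
}
TWO_LETTER_GROUPS: Dict[str, str] = {
    "h": "CMFILVWYA",    # hydrophobic
    "p": "GPTHSNQRKDE",  # polar
}


def build_map(groups: Dict[str, str]) -> Dict[str, str]:
    """Map each amino acid to its group label, checking the partition is valid."""
    mapping: Dict[str, str] = {}
    for label, members in groups.items():
        for aa in members:
            if aa in mapping:
                raise ValueError(f"{aa} is assigned to more than one group")
            mapping[aa] = label
    if len(mapping) != 20:
        raise ValueError("partition must cover all 20 standard amino acids")
    return mapping


def reduce_sequence(seq: str, mapping: Dict[str, str]) -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq.upper())
```

With only two letters, many distinct sequences collapse to nearly uniform strings, which is one way to picture the homopolymeric behavior the abstract reports.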

Machine learning-guided directed evolution for protein engineering

Author: Arnold Frances H.
Wu Zachary
Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution

arXiv.org e-Print Archive

Caltech Authors
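One round of the loop the review describes (fit a sequence-function model on measured variants, then pick unmeasured sequences predicted to be improved) can be sketched as below. The one-hot encoding, the L2-penalized linear model fit by gradient descent, and all function names are assumptions chosen for a small self-contained example; the review covers many other model classes and selection strategies.

```python
import itertools
from typing import Dict, Iterable, List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def one_hot(seq: str) -> List[float]:
    """Concatenate one 20-dim one-hot vector per position (site-independent)."""
    vec: List[float] = []
    for aa in seq:
        vec.extend(1.0 if aa == a else 0.0 for a in ALPHABET)
    return vec


def fit_linear(X: List[List[float]], y: List[float], lr: float = 0.05,
               epochs: int = 2000, l2: float = 1e-3) -> Tuple[List[float], float]:
    """Linear regression with an L2 penalty on the weights, via batch gradient descent."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, seq: str) -> float:
    return sum(wj * xj for wj, xj in zip(w, one_hot(seq))) + b


def all_sequences(alphabet: str, length: int) -> List[str]:
    """Enumerate a small candidate library (feasible only for tiny spaces)."""
    return ["".join(p) for p in itertools.product(alphabet, repeat=length)]


def rank_candidates(measured: Dict[str, float], candidates: Iterable[str],
                    top_k: int = 1) -> List[str]:
    """Fit on measured variants; return the top_k unmeasured candidates by prediction."""
    seqs = list(measured)
    w, b = fit_linear([one_hot(s) for s in seqs], [measured[s] for s in seqs])
    unmeasured = [c for c in candidates if c not in measured]
    return sorted(unmeasured, key=lambda s: predict(w, b, s), reverse=True)[:top_k]
```

For example, with measured fitness values `{"AA": 0.0, "AC": 1.0, "CA": 2.0}`, an additive model combines the two beneficial single mutations and ranks the untested double mutant `"CC"` first. The selected variants would then be measured and added to the training data for the next round.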

From Theory to Practice: Plug and Play with Succinct Data Structures

Author: F. Claude
G. Navarro
G. Navarro
J.S. Culpepper
K. Sadakane
K. Sadakane
N. Jesper Larsson
R. Grossi
S. Vigna
V. Mäkinen
Publication date: 05/11/2013

Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is difficult, since older baseline implementations may not rely on the same basic components, and reimplementing from scratch can be very time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing succinct solutions for document retrieval. Comment: 10 pages, 4 figures, 3 tables

arXiv.org e-Print Archive

CiteSeerX

Crossref
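Rank and select on bit vectors are among the basic components that succinct structures are composed from. The sketch below shows the standard idea of storing a cumulative 1-count per block so that `rank1` only scans within one block, with `select1` answered by binary search over `rank1`. The class name and block size are hypothetical, and since it stores bits in a Python list it illustrates the logic only, not real space savings or the framework's API.

```python
from typing import List


class RankBitVector:
    """Bit vector with block-sampled rank counts (illustrative, not space-efficient)."""

    def __init__(self, bits: List[int], block_size: int = 8) -> None:
        self.bits = bits
        self.block_size = block_size
        # block_ranks[k] = number of 1 bits in bits[0 : k * block_size]
        self.block_ranks = [0]
        running = 0
        for i, bit in enumerate(bits):
            running += bit
            if (i + 1) % block_size == 0:
                self.block_ranks.append(running)

    def rank1(self, i: int) -> int:
        """Number of 1 bits in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError(i)
        block = i // self.block_size
        start = block * self.block_size
        return self.block_ranks[block] + sum(self.bits[start:i])

    def select1(self, k: int) -> int:
        """0-based position of the k-th 1 bit (k starts at 1)."""
        if not 1 <= k <= self.rank1(len(self.bits)):
            raise ValueError(k)
        lo, hi = 1, len(self.bits)
        while lo < hi:  # smallest j with rank1(j) >= k
            mid = (lo + hi) // 2
            if self.rank1(mid) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo - 1
```

Production libraries typically add a second level of sampled counts and word-level popcount so both queries run in constant or near-constant time.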

Paradigms for computational nucleic acid design

Author: Dirks Robert M.
Lin Milo
Pierce Niles A.
Winfree Erik
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2004

The design of DNA and RNA sequences is critical for many endeavors, from DNA nanotechnology, to PCR‐based applications, to DNA hybridization arrays. Results in the literature rely on a wide variety of design criteria adapted to the particular requirements of each application. Using an extensively studied thermodynamic model, we perform a detailed study of several criteria for designing sequences intended to adopt a target secondary structure. We conclude that superior design methods should explicitly implement both a positive design paradigm (optimize affinity for the target structure) and a negative design paradigm (optimize specificity for the target structure). The commonly used approaches of sequence symmetry minimization and minimum free‐energy satisfaction primarily implement negative design and can be strengthened by introducing a positive design component. Surprisingly, our findings hold for a wide range of secondary structures and are robust to modest perturbation of the thermodynamic parameters used for evaluating sequence quality, suggesting the feasibility and ongoing utility of a unified approach to nucleic acid design as parameter sets are refined further. Finally, we observe that designing for thermodynamic stability does not determine folding kinetics, emphasizing the opportunity for extending design criteria to target kinetic features of the energy landscape.

CiteSeerX

Caltech Authors
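The criteria discussed above are evaluated against a target secondary structure, conventionally written in dot-bracket notation. The sketch below parses that notation and computes two deliberately crude sequence checks: a pairability filter (every target base pair can form a Watson-Crick or G-U wobble pair) and a repeated k-mer count in the spirit of sequence symmetry minimization. Both scores are simplified stand-ins chosen for illustration; they are not the thermodynamic-model criteria the paper evaluates, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs for RNA.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) base pairs from a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def unpairable_count(seq: str, structure: str) -> int:
    """Crude positive-design filter: target pairs that cannot form at all."""
    return sum((seq[i], seq[j]) not in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


def repeated_kmers(seq: str, k: int) -> int:
    """Symmetry-style penalty: extra occurrences of any k-mer seen more than once."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(c - 1 for c in counts.values() if c > 1)
```

A pairability check is necessary but far from sufficient: the paper's point is that affinity and specificity should both be optimized under a thermodynamic model, which these counts do not compute.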

Hybrid modeling, HMM/NN architectures, and protein applications

Author: Baldi Pierre
Chauvin Yves
Publication venue: 'MIT Press - Journals'
Publication date: 01/10/1996

We describe a hybrid modeling approach in which the parameters of a model are calculated and modulated by another model, typically a neural network (NN), to avoid both overfitting and underfitting. We develop the approach for the case of Hidden Markov Models (HMMs) by deriving a class of hybrid HMM/NN architectures. These architectures can be trained with unified algorithms that blend HMM dynamic programming with NN backpropagation. In the case of complex data, mixtures of HMMs or modulated HMMs must be used. NNs can then be applied both to the parameters of each single HMM and to the switching or modulation of the models, as a function of input or context. Hybrid HMM/NN architectures provide a flexible NN parameterization for the control of model structure and complexity. At the same time, they can capture distributions that, in practice, are inaccessible to single HMMs. The HMM/NN hybrid approach is tested, in its simplest form, by constructing a model of the immunoglobulin protein family. A hybrid model is trained, and a multiple alignment derived, with fewer than one fourth as many parameters as previous single HMMs.

Caltech Authors
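In the simplest hybrid described above, a network computes the HMM's emission probabilities instead of storing a free table per state. The sketch below assumes that network is a single shared linear layer followed by a softmax applied to a learned embedding of each state, then scores a sequence with the standard forward algorithm. Parameter shapes and function names are assumptions, and training (backpropagating through the forward pass) is not shown.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def emission_table(state_embeddings: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Emission distribution of each state = softmax(W @ h_state).

    W has one row per output symbol and is shared by all states, so the states
    share parameters rather than each owning an independent emission table.
    """
    table = []
    for h in state_embeddings:
        logits = [sum(w_k * h_k for w_k, h_k in zip(row, h)) for row in weights]
        table.append(softmax(logits))
    return table


def forward_likelihood(obs: Sequence[int], start: Sequence[float],
                       trans: Sequence[Sequence[float]],
                       emit: Sequence[Sequence[float]]) -> float:
    """Standard HMM forward algorithm; returns P(obs) for a non-empty sequence."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)
```

Because every emission row is produced by the same small set of weights, the number of free parameters can be much smaller than with independent per-state tables, which is the kind of saving the abstract reports.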