Learning effective amino acid interactions through iterative stochastic techniques
The prediction of the three-dimensional structures of the native state of
proteins from the sequences of their amino acids is one of the most important
challenges in molecular biology. An essential ingredient to solve this problem
within coarse-grained models is the task of deducing effective interaction
potentials between the amino acids. Over the years several techniques have been
developed to extract potentials that are able to discriminate satisfactorily
between the native and non-native folds of a pre-assigned protein sequence. In
general, when these potentials are used in actual dynamical folding
simulations, they lead to a drift of the native structure outside the
quasi-native basin. In this study, we present and validate an approach to
overcome this difficulty. By exploiting several numerical and analytical tools
we set up a rigorous iterative scheme to extract potentials satisfying a
pre-requisite of any viable potential: the stabilization of proteins within
their native basin (less than 3-4 Å cRMS). The scheme is flexible and is
demonstrated to be applicable to a variety of parametrizations of the energy
function and provides, in each case, the optimal potentials.
Comment: Revtex, 17 pages, 10 eps figures. Proteins: Structure, Function and
Genetics (in press)
Empirical Potential Function for Simplified Protein Models: Combining Contact and Local Sequence-Structure Descriptors
An effective potential function is critical for protein structure prediction
and folding simulation. Simplified protein models such as those requiring only
Cα or backbone atoms are attractive because they enable efficient
search of the conformational space. We show residue specific reduced discrete
state models can represent the backbone conformations of proteins with small
RMSD values. However, no potential functions exist that are designed for such
simplified protein models. In this study, we develop optimal potential
functions by combining contact interaction descriptors and local
sequence-structure descriptors. The form of the potential function is a
weighted linear sum of all descriptors, and the optimal weight coefficients are
obtained through optimization using both native and decoy structures. The
performance of the potential function in discriminating native protein
structures from decoys is evaluated using several benchmark decoy sets. Our
potential function, requiring only backbone atoms or Cα atoms, has
comparable or better performance than several residue-based potential functions
that require additional coordinates of side chain centers or coordinates of all
side chain atoms. By reducing the residue alphabets down to size 5 for local
structure-sequence relationship, the performance of the potential function can
be further improved. Our results also suggest that local sequence-structure
correlation may play an important role in reducing the entropic cost of protein
folding.
Comment: 20 pages, 5 figures, 4 tables. In press, Protein
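The abstract above describes the potential as a weighted linear sum of descriptors, with weights fitted on native and decoy structures. The following is a minimal sketch of that idea only, not the authors' optimization procedure: it assumes a simple perceptron-style update, and the descriptor vectors, the `margin`, and the function names `energy` and `fit_weights` are all hypothetical.

```python
import numpy as np

def energy(weights, descriptors):
    """Weighted linear sum of descriptor values: E = w . d."""
    return float(np.dot(weights, descriptors))

def fit_weights(native, decoys, epochs=100, lr=0.1, margin=1.0):
    """Perceptron-style fit (an assumed stand-in for the real optimizer).

    Adjusts the weights until every decoy scores at least `margin`
    higher in energy than the native structure, or epochs run out.
    """
    w = np.zeros(native.shape[0])
    for _ in range(epochs):
        updated = False
        for d in decoys:
            # Violation: the decoy is not yet clearly worse than the native.
            if energy(w, d) - energy(w, native) < margin:
                w += lr * (d - native)
                updated = True
        if not updated:
            break  # all decoys separated from the native by the margin
    return w

# Made-up descriptor vectors (e.g. contact counts), for illustration only.
native = np.array([5.0, 1.0, 2.0])
decoys = np.array([[3.0, 2.0, 2.0],
                   [4.0, 1.0, 4.0],
                   [2.0, 3.0, 1.0]])

w = fit_weights(native, decoys)
gaps = [energy(w, d) - energy(w, native) for d in decoys]
print("weights:", w)
print("energy gaps (decoy - native):", gaps)
```

With the fitted weights, the native structure has the lowest energy among the toy structures, which is the discrimination criterion the abstract evaluates on benchmark decoy sets.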
AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca for Predicting Antigen-Antibody Interactions
Antibodies have become an important class of therapeutic agents to treat
human diseases. To accelerate therapeutic antibody discovery, computational
methods, especially machine learning, have attracted considerable interest for
predicting specific interactions between antibody candidates and target
antigens such as viruses and bacteria. However, the publicly available datasets
in existing works have notable limitations, such as small sizes and the lack of
non-binding samples and exact amino acid sequences. To overcome these
limitations, we have developed AVIDa-hIL6, a large-scale dataset for predicting
antigen-antibody interactions in the variable domain of the heavy chain of
heavy-chain antibodies (VHHs), produced from an alpaca immunized with the human
interleukin-6 (IL-6) protein as the antigen. By leveraging the simple structure
of VHHs, which facilitates identification of full-length amino acid sequences
by DNA sequencing technology, AVIDa-hIL6 contains 573,891 antigen-VHH pairs
with amino acid sequences. All the antigen-VHH pairs have reliable labels for
binding or non-binding, as generated by a novel labeling method. Furthermore,
via introduction of artificial mutations, AVIDa-hIL6 contains 30 different
mutants in addition to wild-type IL-6 protein. This characteristic provides
opportunities to develop machine learning models for predicting changes in
antibody binding by antigen mutations. We report experimental benchmark results
on AVIDa-hIL6 by using neural network-based baseline models. The results
indicate that the existing models have potential, but further research is
needed to generalize them to predict effective antibodies against unknown
mutants. The dataset is available at https://avida-hil6.cognanous.com
Neural Network and Bioinformatic Methods for Predicting HIV-1 Protease Inhibitor Resistance
This article presents a new method for predicting viral resistance to seven protease inhibitors from the HIV-1 genotype, and for identifying the positions in the protease gene at which the specific nature of the mutation affects resistance. The neural network Analog ARTMAP predicts protease inhibitor resistance from viral genotypes. A feature selection method detects genetic positions that contribute to resistance both alone and through interactions with other positions. This method has identified positions 35, 37, 62, and 77, where traditional feature selection methods have not detected a contribution to resistance.
At several positions in the protease gene, mutations confer differing degrees of resistance, depending on the specific amino acid to which the sequence has mutated. To find these positions, an Amino Acid Space is introduced to represent genes in a vector space that captures the functional similarity between amino acid pairs. Feature selection identifies several new positions, including 36, 37, and 43, with amino acid-specific contributions to resistance. Analog ARTMAP networks applied to inputs that represent specific amino acids at these positions perform better than networks that use only mutation locations.
Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
Site-Directed Mutagenesis and Site-Specific Binding Analysis of Calmodulin (CaM)
Calcium signaling is a major regulatory system in cells and a crucial part of cell biology. An
important element in the decoding of intracellular calcium concentration into downstream
processes is the ubiquitous and highly conserved calcium binding protein calmodulin (CaM)
which can bind to and modulate the function of hundreds of different target proteins,
regulating such processes as synaptic plasticity, gene expression and electrical signaling. The
biophysical characterization of binding affinity and cooperative interactions between each of
calmodulin’s four EF-hand calcium binding sites is essential for understanding calcium
signaling. Highly conserved amino acid sequence differences in the ion binding loops of the
EF-hands give each site unique affinity for calcium. EF-hands are almost always found in
pairs, where binding to one of the sites affects the affinity of the paired site. We have used
spectroscopy to measure site-specific binding in each of the paired binding sites in the CaM
N-lobe, along with site-directed mutagenesis, to study the contributions of individual amino
acids to the ion binding affinity in the mutated site (cis effects) and in the neighboring site
(trans effects). Of the twelve amino acids in the binding loops, five are different between Site
1 and Site 2. We constructed proteins with substituted individual residues from Site 1 to Site
2. CaM with the full Site 1 sequence in both Site 1 and Site 2 shows significant changes in
affinity and binding characteristics in both sites. To investigate the contributions of the
individual amino acid differences, we made intermediate mutants containing individual amino
acid changes in Site 2. The cis-effects of the intermediate mutations on the mutated site, Site
2, seem to be independent and additive, whereas the trans-effects on the non-mutated Site 1
showed unexpected dependence on combinations of amino acid changes in Site 2.
Neuroscienc