Protein secondary structure: Entropy, correlations and prediction
Is protein secondary structure primarily determined by local interactions
between residues closely spaced along the amino acid backbone, or by non-local
tertiary interactions? To answer this question we have measured the entropy
densities of primary structure and secondary structure sequences, and the local
inter-sequence mutual information density. We find that the important
inter-sequence interactions are short ranged, that correlations between
neighboring amino acids are essentially uninformative, and that only 1/4 of the
total information needed to determine the secondary structure is available from
local inter-sequence correlations. Since the remaining information must come
from non-local interactions, this observation supports the view that the
majority of most proteins fold via a cooperative process where secondary and
tertiary structure form concurrently. To provide a more direct comparison to
existing secondary structure prediction methods, we construct a simple hidden
Markov model (HMM) of the sequences. This HMM achieves a prediction accuracy
comparable to other single sequence secondary structure prediction algorithms,
and can extract almost all of the inter-sequence mutual information. This
suggests that these algorithms are almost optimal, and that we should not
expect a dramatic improvement in prediction accuracy. However, local
correlations between secondary and primary structure are probably of
under-appreciated importance in many tertiary structure prediction methods,
such as threading.
Comment: 8 pages, 5 figures
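As a rough illustration of the kind of quantity the abstract above measures, the following sketch computes a plug-in estimate of the mutual information (in bits) between two aligned symbol sequences, such as residues and their secondary-structure labels. It is not the authors' estimator: the function name `mutual_information` and the toy sequences are assumptions made for illustration, and the paper's entropy densities and local inter-sequence correlations would need windowed, bias-corrected estimates over a real data set.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two aligned sequences.

    xs and ys are equal-length sequences of hashable symbols, e.g.
    amino-acid letters and secondary-structure labels (hypothetical data).
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy example: the label is fully determined by the residue symbol.
print(mutual_information("AABB", "HHEE"))
```

With naive frequency counts like these, the estimate is biased upward for short sequences, so any comparison against a figure such as the abstract's "1/4 of the total information" would require a much larger sample and a bias correction.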
Assessing Protein Conformational Sampling Methods Based on Bivariate Lag-Distributions of Backbone Angles
Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? and What is the order of the local sequence–structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/∼madoliat/LagSVD) that can be used to produce informative animations
Recurrent oligomers in proteins - an optimal scheme reconciling accurate and concise backbone representations in automated folding and design studies
A novel scheme is introduced to capture the spatial correlations of
consecutive amino acids in naturally occurring proteins. This knowledge-based
strategy is able to carry out optimally automated subdivisions of protein
fragments into classes of similarity. The goal is to provide the minimal set of
protein oligomers (termed ``oligons'' for brevity) that is able to represent
any other fragment. At variance with previous studies where recurrent local
motifs were classified, our concern is to provide simplified protein
representations that have been optimised for use in automated folding and/or
design attempts. In such contexts it is paramount to limit the number of
degrees of freedom per amino acid without incurring a loss of accuracy in
structural representations. The suggested method finds, by construction, the
optimal compromise between these needs. Several possible oligon lengths are
considered. It is shown that meaningful classifications cannot be done for
lengths greater than 6 or smaller than 4. Different contexts are considered
where oligons of length 5 or 6 are recommended. With only a few dozen
oligons of such length, virtually any protein can be reproduced within typical
experimental uncertainties. Structural data for the oligons is made publicly
available.
Comment: 19 pages, 13 postscript figures
Protein Threading for Genome-Scale Structural Analysis
Protein structure prediction is a necessary tool in the field of bioinformatic analysis. It is a non-trivial process that can add a great deal of information to a genome annotation. This dissertation deals with protein structure prediction through the technique of protein fold recognition and outlines several strategies for the improvement of protein threading techniques. In order to improve protein threading performance, this dissertation begins with an outline of sequence/structure alignment energy functions. A technique called Violated Inequality Minimization is used to quickly adapt to the changing energy landscape as new energy functions are added. To continue the improvement of alignment accuracy and fold recognition, new formulations of energy functions are used for the creation of the sequence/structure alignment. These energies include a formulation of a gap penalty which depends on sequence characteristics, unlike the traditional constant penalty. Another proposed energy is dependent on conserved structural patterns found during threading. These structural patterns have been employed to refine the sequence/structure alignment in my research. The section on Linear Programming Algorithm for protein structure alignment deals with the optimization of an alignment using additional residue-pair energy functions. In the original version of the model, all cores had to be aligned to the target sequence. Our research outlines an expansion of the original threading model which allows for a more flexible alignment by allowing core deletions. Aside from improvements in fold recognition and alignment accuracy, there is also a need to ensure that these techniques can scale for the computational demands of genome-level structure prediction. A heuristic decision-making process has been designed to automate the classification and preparation of proteins for prediction.
A graph analysis has been applied to the integration of different tools involved in the pipeline. Analysis of the data dependency graph allows for automatic parallelization of genome structure prediction. These different contributions help to improve the overall performance of protein threading and help distribute computations across a large set of computers to help make genome scale protein structure prediction practically feasible
Specialized dynamical properties of promiscuous residues revealed by simulated conformational ensembles
The ability to interact with different partners is one of the most important features in proteins. Proteins that bind a large number of partners (hubs) have often been associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the protein isolated state. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and to full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with a different degree of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter both on a local and on a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionarily related proteins can be modulated by the fine-tuning of the interface dynamics.
The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein-protein interaction prediction and design methods. © 2013 American Chemical Society
DISCRETIZED GEOMETRIC APPROACHES TO THE ANALYSIS OF PROTEIN STRUCTURES
Proteins play crucial roles in a variety of biological processes. While we know that their amino acid sequence determines their structure, which in turn determines their function, we do not know why particular sequences fold into particular structures. My work focuses on discretized geometric descriptions of protein structure—conceptualizing native structure space as composed of mostly discrete, geometrically defined fragments—to better understand the patterns underlying why particular sequence elements correspond to particular structure elements. This discretized geometric approach is applied to multiple levels of protein structure, from conceptualizing contacts between residues as interactions between discrete structural elements to treating protein structures as an assembly of discrete fragments. My earlier work focused on better understanding inter-residue contacts and estimating their energies statistically. By scoring structures with energies derived from a stricter notion of contact, I show that native protein structures can be identified out of a set of decoy structures more often than when using energies derived from traditional definitions of contact and how this has implications for the evaluation of predictions that rely on structurally defined contacts for validation. Demonstrating how useful simple geometric descriptors of structure can be, I then show that these energies identify native structures on par with well-validated, detailed, atomistic energy functions. Moving to a higher level of structure, in my later work I demonstrate that discretized, geometrically defined structural fragments make good objects for the interactive assembly of protein backbones and present a software application which lets users do so. 
Finally, I use these fragments to generate structure-conditioned statistical energies, generalizing the classic idea of contact energies by incorporating specific structural context, enabling these energies to reflect the interaction geometries they come from. These structure-conditioned energies contain more information about native sequence preferences, correlate more highly with experimentally determined energies, and show that pairwise sequence preferences are tightly coupled to their structural context. Considered jointly, these projects highlight the degree to which protein structures and the interactions they comprise can be understood as geometric elements coming together in finely tuned ways
Fidelity of eukaryotic and archaeal family-B DNA polymerases
PhD Thesis
DNA polymerases are essential for replication, recombination and repair of DNA.
However, these enzymes also have multiple applications in biotechnology. Since PCR has
been developed, thermostable DNA polymerases have become important for numerous
PCR based applications. Currently these enzymes are used routinely in laboratories all
over the world.
The fidelity of DNA polymerases is a key feature for PCR. We have developed a fidelity
assay, based on a gapped plasmid template containing the lacZα gene reporter, allowing
easy and straightforward measurement of the accuracy of DNA synthesis by DNA
polymerases in vitro.
Previous studies on the family-B DNA polymerase from Pyrococcus furiosus
demonstrated that fidelity is controlled by D473, an amino acid located in the loop of the
fingers domain. It was observed that the mutation D473G had a strong error-prone
phenotype and Pfu-Pol D473G can be successfully used for random mutagenesis. To test
if eukaryotic family-B DNA polymerases use the same aspartic acid residue to control
fidelity we prepared the D799G mutant of a proteolytic fragment of polymerase epsilon
from Saccharomyces cerevisiae. Unfortunately we did not observe the expected
modulation of the fidelity of DNA synthesis for the D799G polymerase epsilon variant.
Overexpression of the multi-subunit family-B DNA polymerases from Saccharomyces
cerevisiae was found to be extremely demanding. Therefore, we decided to modify the
thermostable family-B DNA polymerase from Thermococcus gorgonarius to obtain
variants containing the loop region of the fingers domain from family-B DNA
polymerases of Saccharomyces cerevisiae. We have observed no change in DNA
synthesis accuracy when the loop region was transferred from high fidelity yeast
replicative polymerase delta. However, when the loop region was transferred from
Saccharomyces cerevisiae error-prone family-B DNA polymerase zeta, we observed
a strong error-prone phenotype in some instances. Loop swapping with polymerase zeta
is complicated by alignment ambiguity, so several variants were prepared.
The primary sequence alignment of the fingers domain of eukaryotic polymerases zeta
suggests no strong consensus within the loop region. Therefore, we decided to replace the
major part of the fingers domain of the family-B polymerase from Thermococcus
gorgonarius with the equivalent functional module from Saccharomyces cerevisiae
polymerase zeta. The main aim of such a rearrangement was to test if the module from
the error-prone DNA polymerase zeta has the potential to decrease fidelity. The chimeric
polymerase variant was indeed found to be a very inaccurate DNA polymerase. To our
surprise we also discovered that the polymerase variant possesses reverse transcriptase
activity. Several further modifications allowed us to significantly improve reverse
transcriptase activity of the chimeric polymerase variant
Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin
A growing number of solved protein structures display an elongated structural
domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel
alpha-helices. Alpha-rods are flexible and expose a large surface, which makes
them suitable for protein interaction. Although most likely originating by
tandem duplication of a two-helix unit, their detection using sequence
similarity between repeats is poor. Here, we show that alpha-rod repeats can be
detected using a neural network. The network detects more repeats than are
identified by domain databases using multiple profiles, with a low level of
false positives (<10%). We identify alpha-rod repeats in
approximately 0.4% of proteins in eukaryotic genomes. We then
investigate the results for all human proteins, identifying alpha-rod repeats
for the first time in six protein families, including proteins STAG1-3, SERAC1,
and PSMD1-2 & 5. We also characterize a short version of these repeats
in eight protein families of Archaeal, Bacterial, and Fungal species. Finally,
we demonstrate the utility of these predictions in directing experimental work
to demarcate three alpha-rods in huntingtin, a protein mutated in
Huntington's disease. Using yeast two hybrid analysis and an
immunoprecipitation technique, we show that the huntingtin fragments containing
alpha-rods associate with each other. This is the first definition of domains in
huntingtin and the first validation of predicted interactions between fragments
of huntingtin, which sets up directions toward functional characterization of
this protein. An implementation of the repeat detection algorithm is available
as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. This can be further visualized
using BiasViz, a graphic tool for representation of multiple sequence
alignments
New Methods to Improve Protein Structure Modeling
Proteins are considered the central compound necessary for life, as they play a crucial role in governing several life processes by performing the most essential biological and chemical functions in every living cell. Understanding protein structures and functions will lead to a significant advance in life science and biology. Such knowledge is vital for various fields such as drug development and synthetic biofuels production.
Most proteins have definite shapes that they fold into, which are the most stable state they can adopt. Due to the fact that the protein structure information provides important insight into its functions, many research efforts have been conducted to determine the protein 3-dimensional structure from its sequence.
The experimental methods for protein 3-dimensional structure determination are often time-consuming, costly, and even not feasible for some proteins. Accordingly, recent research efforts focus more and more on computational approaches to predict protein 3-dimensional structures. Template-based modeling is considered one of the most accurate protein structure prediction methods. The success of template-based modeling relies on correctly identifying one or a few experimentally determined protein structures as structural templates that are likely to resemble the structure of the target sequence as well as accurately producing a sequence alignment that maps the residues in the target sequence to those in the template.
In this work, we aim at improving the template-based protein structure modeling by enhancing the correctness of identifying the most appropriate templates and precisely aligning the target and template sequences. Firstly, we investigate employing inter-residue contact score to measure the favorability of a target sequence fitting in the folding topology of a certain template. Secondly, we design a multi-objective alignment algorithm extending the famous Needleman-Wunsch algorithm to obtain a complete set of alignments yielding Pareto optimality. Then, we use protein sequence and structural information as objectives and generate the complete Pareto optimal front of alignments between target sequence and template. The alignments obtained enable one to analyze the trade-offs between the potentially conflicting objectives. These approaches lead to accuracy enhancement in template-based protein structure modeling
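For orientation, the sketch below shows the classic single-objective Needleman-Wunsch recurrence that the abstract above says the thesis extends. It is only the textbook baseline, not the multi-objective Pareto algorithm described there. The function name `needleman_wunsch_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty, keeping only two rows of the score matrix in memory.
    The scoring parameters are illustrative defaults.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Aligning ACGT with AGT: three matches and one gap.
print(needleman_wunsch_score("ACGT", "AGT"))
```

A multi-objective variant would track a set of non-dominated score vectors in each cell instead of a single maximum. That extension is not attempted here.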