Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provide valuable information to tackle these questions. However, conservation is limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions with overlapping sequences, which makes it difficult for conservation-based methods to annotate the exact functions of individual residues. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutation, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of the viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins for which homologous-structure information is available.
Importance: To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and the multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues beyond those of its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and for which homologous-structure information is available.
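The selection step can be sketched in a few lines. The abstract gives no formulas, so everything here is an assumption: relative fitness (RF) is taken as a mutant's enrichment (post-selection read frequency over input frequency) normalised to wild type, mutations are keyed by (position, amino acid) rather than by nucleotide, and a residue is called functional when its mutations are deleterious (median RF below a cutoff) yet not predicted to be destabilising (median ΔΔG below a cutoff). Cutoffs, key formats and function names are hypothetical.

```python
from statistics import median


def relative_fitness(input_counts, selected_counts, wt="WT", pseudocount=0.5):
    """Hypothetical RF: mutant enrichment (selected freq / input freq) divided by WT enrichment.
    The pseudocount guards against zero read counts."""
    n_in = sum(input_counts.values())
    n_sel = sum(selected_counts.values())

    def enrichment(key):
        f_in = (input_counts.get(key, 0) + pseudocount) / n_in
        f_sel = (selected_counts.get(key, 0) + pseudocount) / n_sel
        return f_sel / f_in

    wt_enrichment = enrichment(wt)
    return {k: enrichment(k) / wt_enrichment for k in input_counts if k != wt}


def functional_residues(rf, ddg, rf_cutoff=0.2, ddg_cutoff=1.0):
    """Positions whose mutations are deleterious (median RF < rf_cutoff) but not predicted
    to be destabilising (median ddG < ddg_cutoff; kcal/mol, positive = destabilising).
    Both cutoffs are illustrative assumptions."""
    by_pos = {}
    for (pos, aa), value in rf.items():
        by_pos.setdefault(pos, []).append((value, ddg[(pos, aa)]))
    hits = []
    for pos, pairs in sorted(by_pos.items()):
        rf_med = median([v for v, _ in pairs])
        ddg_med = median([d for _, d in pairs])
        if rf_med < rf_cutoff and ddg_med < ddg_cutoff:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    # Toy counts; keys other than "WT" are (position, mutant amino acid).
    inputs = {"WT": 1000, (1, "A"): 100, (1, "G"): 100, (2, "A"): 100}
    selected = {"WT": 1000, (1, "A"): 5, (1, "G"): 3, (2, "A"): 90}
    rf = relative_fitness(inputs, selected)
    print(functional_residues(rf, {(1, "A"): 0.3, (1, "G"): 0.5, (2, "A"): 0.2}))
```

Separating "deleterious but stable" from "deleterious and destabilising" is what lets the fitness profile point at function rather than fold; the homologous-structure comparison would then be applied to the residues this step returns.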
EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments
BACKGROUND: Structure-dependent substitution matrices increase the accuracy of sequence alignments when the 3D structure of one sequence is known, and are successful, e.g., in fold recognition. We propose a new automated method, EvDTree, based on a decision tree algorithm, for deriving amino acid substitution probabilities from a set of sequence-structure alignments. The main advantage over other approaches is an unbiased, automatic selection of the most informative structural descriptors and associated values or thresholds. This feature allows automatic derivation of structure-dependent substitution scores for any specific set of structures, without the need to empirically determine the best descriptors and parameters. RESULTS: Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean ranking of observed substitutions among all possible substitutions, and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitution scores, providing better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors. CONCLUSIONS: We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications.
The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores, which can be used in threading-based remote homology searches.
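The core steps (pick the most informative structural descriptor and threshold, derive a substitution profile for each resulting environment, then score by the mean rank of observed substitutions) can be sketched as below. Assumptions, none taken from EvDTree itself: information gain (entropy reduction) as the split criterion, a single split instead of a full tree, log-odds scores with a pseudocount against a background distribution, and toy descriptor names (`acc` for relative accessibility, `helix` as a 0/1 flag).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(samples, descriptors):
    """Descriptor/threshold that most reduces the entropy of observed substitutions.
    samples: list of (environment dict, substituted amino acid)."""
    labels = [aa for _, aa in samples]
    base, n = entropy(labels), len(samples)
    best = (None, None, 0.0)
    for d in descriptors:
        values = sorted({env[d] for env, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [aa for env, aa in samples if env[d] <= t]
            right = [aa for env, aa in samples if env[d] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > best[2]:
                best = (d, t, gain)
    return best


def log_odds_profile(labels, background, pseudocount=1.0):
    """log2(P(aa | environment) / P(aa)) for every amino acid."""
    counts = Counter(labels)
    total = len(labels) + pseudocount * len(AMINO_ACIDS)
    return {aa: math.log2((counts[aa] + pseudocount) / total / background[aa]) for aa in AMINO_ACIDS}


def mean_rank(observations, profile_for):
    """Mean rank (1 = best score) of each observed substitution among all 20 amino acids."""
    ranks = []
    for env, aa in observations:
        scores = profile_for(env)
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks.append(ordered.index(aa) + 1)
    return sum(ranks) / len(ranks)
```

A full tree would call `best_split` recursively on each side until a stopping rule (minimum leaf size or minimum gain) is met; a random profile would give a mean rank near 10.5, so lower values indicate more informative environments.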
Structure of the RNA-dependent RNA polymerase of poliovirus
Background: The central player in the replication of RNA viruses is the viral RNA-dependent RNA polymerase. The 53 kDa poliovirus polymerase, together with other viral and possibly host proteins, carries out viral RNA replication in the host cell cytoplasm. RNA-dependent RNA polymerases comprise a distinct category of polymerases that have limited sequence similarity to reverse transcriptases (RNA-dependent DNA polymerases) and perhaps also to DNA-dependent polymerases. Previously reported structures of RNA-dependent DNA polymerases, DNA-dependent DNA polymerases and a DNA-dependent RNA polymerase show that structural and evolutionary relationships exist between the different polymerase categories. Results: We have determined the structure of the RNA-dependent RNA polymerase of poliovirus at 2.6 Å resolution by X-ray crystallography. It has the same overall shape as other polymerases, commonly described by analogy to a right hand. The structures of the ‘fingers’ and ‘thumb’ subdomains of poliovirus polymerase differ from those of other polymerases, but the palm subdomain contains a core structure very similar to that of other polymerases. This conserved core structure is composed of four of the amino acid sequence motifs described for RNA-dependent polymerases. Structure-based alignments of these motifs have enabled us to modify and extend previous sequence and structural alignments so as to relate sequence conservation to function. Extensive regions of polymerase–polymerase interactions observed in the crystals suggest an unusual higher order structure that we believe is important for polymerase function. Conclusions: As the first example of a structure of an RNA-dependent RNA polymerase, the poliovirus polymerase structure provides a better understanding of polymerase structure, function and evolution. In addition, it has yielded insights into an unusual higher order structure that may be critical for poliovirus polymerase function.
On the optimal contact potential of proteins
We analytically derive the lower bound of the total conformational energy of
a protein structure by assuming that the total conformational energy is well
approximated by the sum of sequence-dependent pairwise contact energies. The
condition for the native structure achieving the lower bound leads to the
contact energy matrix that is a scalar multiple of the native contact matrix,
i.e., the so-called Go potential. We also derive spectral relations between
contact matrix and energy matrix, and approximations related to one-dimensional
protein structures. Implications for protein structure prediction are
discussed. Comment: 5 pages, text only
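One elementary way to see why the optimum is a scalar multiple of the native contact matrix is the Cauchy-Schwarz inequality applied to E = Σ_{i<j} C_ij B_ij: E ≥ -‖C‖‖B‖ (norms over the i<j entries), with equality exactly when B = -λC for some λ > 0. This is a sketch under that assumption, not necessarily the authors' derivation; the toy 4-residue matrices and all helper names are hypothetical.

```python
import math


def matrix_from_pairs(n, pairs, value=1):
    """Hypothetical helper: symmetric n x n matrix with `value` at each listed (i, j) pair."""
    m = [[0] * n for _ in range(n)]
    for i, j in pairs:
        m[i][j] = m[j][i] = value
    return m


def upper_entries(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def contact_energy(contacts, energies):
    """Pairwise contact approximation: E = sum_{i<j} C_ij * B_ij."""
    return sum(c * b for c, b in zip(upper_entries(contacts), upper_entries(energies)))


def lower_bound(contacts, energies):
    """Cauchy-Schwarz bound E >= -||C|| * ||B||; tight only when B = -lambda * C, lambda > 0."""
    def norm(m):
        return math.sqrt(sum(x * x for x in upper_entries(m)))
    return -norm(contacts) * norm(energies)


if __name__ == "__main__":
    native_pairs = [(0, 2), (1, 3), (0, 3)]
    native = matrix_from_pairs(4, native_pairs)
    go = matrix_from_pairs(4, native_pairs, value=-1)  # Go-like: B = -C_native
    decoy = matrix_from_pairs(4, [(0, 2), (1, 3), (1, 2)])
    print(contact_energy(native, go), lower_bound(native, go))
    print(contact_energy(decoy, go), lower_bound(decoy, go))
```

With the Go-like matrix the native contact map reaches the bound, while a decoy with the same number of contacts stays strictly above it.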
Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences
Questions in computational molecular biology generate various discrete
optimization problems, such as DNA sequence alignment and RNA secondary
structure prediction. However, the optimal solutions are fundamentally
dependent on the parameters used in the objective functions. The goal of a
parametric analysis is to elucidate such dependencies, especially as they
pertain to the accuracy and robustness of the optimal solutions. Techniques
from geometric combinatorics, including polytopes and their normal fans, have
been used previously to give parametric analyses of simple models for DNA
sequence alignment and RNA branching configurations. Here, we present a new
computational framework, and proof-of-principle results, which give the first
complete parametric analysis of the branching portion of the nearest neighbor
thermodynamic model for secondary structure prediction for real RNA sequences. Comment: 17 pages, 8 figures
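The parametric question can be illustrated on a toy scale. Assume each candidate secondary structure is summarised by a signature (x, y, z, w), where x counts multiloops, y unpaired bases in multiloops, z branches, and w collects the remaining energy; this four-coordinate signature is our assumption for illustration. The multiloop score a·x + b·y + c·z + w is then linear in the parameters (a, b, c), so the optimal structure changes only when the parameters cross a boundary between cones of the normal fan. The sketch below uses hypothetical signatures and names, and exhaustive comparison instead of the authors' polytope computation; it scans one parameter and records where the optimum switches.

```python
def multiloop_score(signature, params):
    """Linear score a*x + b*y + c*z + w (lower is better) for signature (x, y, z, w)."""
    x, y, z, w = signature
    a, b, c = params
    return a * x + b * y + c * z + w


def optimal_structures(signatures, params, tol=1e-9):
    """Names of all candidate structures attaining the minimum score (ties included)."""
    scores = {name: multiloop_score(sig, params) for name, sig in signatures.items()}
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s - best <= tol)


def sweep_a(signatures, a_values, b, c):
    """1-D slice through parameter space: list (a, optimum) each time the optimum changes."""
    changes, previous = [], None
    for a in a_values:
        current = optimal_structures(signatures, (a, b, c))
        if current != previous:
            changes.append((a, current))
            previous = current
    return changes


if __name__ == "__main__":
    # Hypothetical candidates: S1 has no multiloop, S2 has one multiloop with 3 branches.
    candidates = {"S1": (0, 0, 0, -10.0), "S2": (1, 4, 3, -14.0)}
    print(sweep_a(candidates, [0.25 + 0.5 * k for k in range(10)], b=0.0, c=0.4))
```

Each switch point in the sweep marks where the 1-D slice crosses from one normal cone into another; the full analysis does this over the whole parameter space at once.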
Sequence-dependent thermodynamics of a coarse-grained DNA model
We introduce a sequence-dependent parametrization for a coarse-grained DNA
model [T. E. Ouldridge, A. A. Louis, and J. P. K. Doye, J. Chem. Phys. 134,
085101 (2011)] originally designed to reproduce the properties of DNA molecules
with average sequences. The new parametrization introduces sequence-dependent
stacking and base-pairing interaction strengths chosen to reproduce the melting
temperatures of short duplexes. By developing a histogram reweighting
technique, we are able to fit our parameters to the melting temperatures of
thousands of sequences. To demonstrate the flexibility of the model, we study
the effects of sequence on: (a) the heterogeneous stacking transition of single
strands, (b) the tendency of a duplex to fray at its melting point, (c) the
effects of stacking strength in the loop on the melting temperature of
hairpins, (d) the force-extension properties of single strands and (e) the
structure of a kissing-loop complex. Where possible we compare our results with
experimental data and find good agreement. A simulation code called oxDNA,
implementing our model, is available as free software. Comment: 15 pages
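The histogram reweighting idea can be sketched as follows. This is generic single-histogram reweighting, not the oxDNA implementation: configurations sampled once at reference parameters are reused to predict the bound (duplex) fraction under modified parameters by weighting each sample with exp(-β ΔE). The energy change ΔE = -ε·n_stacks, the per-sample (is_bound, n_stacks) record and the bisection fit are hypothetical simplifications, and the fit target (for example, a bound fraction of 0.5 at an experimental melting temperature) is an assumption for illustration.

```python
import math


def reweighted_bound_fraction(samples, beta, delta_energy):
    """Single-histogram reweighting of a bound fraction.

    samples: (is_bound, n_stacks) pairs recorded at the reference parameters.
    delta_energy: function n_stacks -> E_new - E_ref (hypothetical form).
    Each sample is weighted by exp(-beta * delta_energy)."""
    log_w = [-beta * delta_energy(n_stacks) for _, n_stacks in samples]
    shift = max(log_w)  # subtract the largest log-weight to avoid overflow
    weights = [math.exp(lw - shift) for lw in log_w]
    bound = sum(w for (is_bound, _), w in zip(samples, weights) if is_bound)
    return bound / sum(weights)


def fit_stacking_shift(samples, beta, target, lo=-5.0, hi=5.0, iters=60):
    """Bisection for a per-stack energy shift eps (E_new = E_ref - eps * n_stacks) that makes
    the reweighted bound fraction equal `target`. Assumes the fraction increases with eps."""
    def residual(eps):
        return reweighted_bound_fraction(samples, beta, lambda s: -eps * s) - target

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    toy = [(True, 3), (True, 2), (False, 0), (False, 1)]  # hypothetical samples
    print(reweighted_bound_fraction(toy, 1.0, lambda s: -0.2 * s))
    print(fit_stacking_shift(toy, 1.0, target=0.5))
```

The appeal of reweighting is that one simulation serves many trial parameter sets, which is what makes fitting against thousands of sequences tractable; it is reliable only while the new parameters stay close enough to the reference that the samples still cover the relevant states.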
Emergence of stable and fast folding protein structures
The number of protein structures is far less than the number of sequences. By
imposing simple generic features of proteins (low energy and compaction) on all
possible sequences we show that the structure space is sparse compared to the
sequence space. Even though the sequence space grows exponentially with N (the
number of amino acids), we conjecture that the number of low energy compact
structures only scales as ln N. This implies that many sequences must map onto
a countable number of basins in the structure space. The number of sequences for
which a given fold emerges as a native structure is further reduced by the dual
requirements of stability and kinetic accessibility. The factor that determines
the dual requirement is related to the sequence-dependent temperatures,
T_\theta (collapse transition temperature) and T_F (folding transition
temperature). Sequences for which \sigma = (T_\theta - T_F)/T_\theta is small
typically fold fast by generically collapsing to the native-like structures and
then rapidly assembling to the native state. Such sequences satisfy the dual
requirements over a wide temperature range. We also suggest that the functional
requirement may further reduce the number of sequences that are biologically
competent. The scheme developed here for thinning of the sequence space that
leads to foldable structures arises naturally using simple physical
characteristics of proteins. The reduction in sequence space leading to the
emergence of foldable structures is demonstrated using lattice models of
proteins. Comment: latex, 18 pages, 8 figures, to be published in the conference
proceedings "Stochastic Dynamics and Pattern Formation in Biological Systems"
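The thinning of sequence space described above can be illustrated with a toy model. The sketch below uses the two-dimensional HP lattice model, which is an assumption (the abstract does not say which lattice model the authors used): it enumerates every self-avoiding conformation of a short chain, scores each HP sequence by its hydrophobic contacts, and keeps only sequences whose lowest-energy contact map is unique. It also includes the σ parameter defined in the abstract. All function names are hypothetical.

```python
from itertools import product

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def self_avoiding_walks(n):
    """All n-monomer self-avoiding walks on the square lattice, first bond fixed along +x
    (this removes rotated copies; mirror images share a contact map and merge later)."""
    walks = []

    def extend(path, occupied):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                extend(path, occupied)
                occupied.remove(site)
                path.pop()

    extend([(0, 0), (1, 0)], {(0, 0), (1, 0)})
    return walks


def contact_map(walk):
    """Non-bonded nearest-neighbour pairs (i, j) with j > i + 1."""
    index = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for i, (x, y) in enumerate(walk):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)


def designable_structures(n):
    """Return (all distinct contact maps, {sequence: its unique ground-state contact map})."""
    maps = {contact_map(w) for w in self_avoiding_walks(n)}
    designs = {}
    for seq in product("HP", repeat=n):
        # HP energy: -1 for every contact between two H monomers
        energies = {cm: -sum(seq[i] == "H" and seq[j] == "H" for i, j in cm) for cm in maps}
        e_min = min(energies.values())
        ground = [cm for cm, e in energies.items() if e == e_min]
        if len(ground) == 1:
            designs["".join(seq)] = ground[0]
    return maps, designs


def sigma(t_theta, t_f):
    """sigma = (T_theta - T_F) / T_theta; per the abstract, small values typically mean fast folding."""
    return (t_theta - t_f) / t_theta


if __name__ == "__main__":
    maps, designs = designable_structures(8)
    print(2 ** 8, len(maps), len(designs), len(set(designs.values())))
```

The printed counts compare the number of sequences, distinct structures, sequences with a unique ground state, and structures that are actually selected; the last number is the sparse "structure space" the abstract refers to, in miniature.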
The integrated periodogram of a dependent extremal event sequence
We investigate the asymptotic properties of the integrated periodogram
calculated from a sequence of indicator functions of dependent extremal events.
An event in Euclidean space is extreme if it occurs far away from the origin.
We use a regular variation condition on the underlying stationary sequence to
make these notions precise. Our main result is a functional central limit
theorem for the integrated periodogram of the indicator functions of dependent
extremal events. The limiting process is a continuous Gaussian process whose
covariance structure is in general unfamiliar, but in the iid case a Brownian
bridge appears. In the general case, we propose a stationary bootstrap
procedure for approximating the distribution of the limiting process. The
developed theory can be used to construct classical goodness-of-fit tests such
as the Grenander-Rosenblatt and Cramér-von Mises tests, which are based
only on the extremes in the sample. We apply the test statistics to simulated
and real-life data.
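The ingredients of the main result can be sketched numerically. This is a simplified illustration, not the authors' procedure: it uses a fixed threshold instead of a regularly varying threshold sequence, the periodogram of the centred indicators at the Fourier frequencies, and Bartlett's normalised cumulative periodogram as a stand-in for the integrated periodogram. The statistic sqrt(m)·sup|cumulative(k) - k/m| is a Grenander-Rosenblatt-type (Kolmogorov-Smirnov-like) statistic which, for iid data, behaves approximately like the supremum of a Brownian bridge. Function names are hypothetical.

```python
import math
import random


def extremal_indicators(xs, threshold):
    """1.0 where |x| exceeds the (fixed, assumed) threshold, else 0.0."""
    return [1.0 if abs(x) > threshold else 0.0 for x in xs]


def periodogram(ys):
    """Periodogram of the centred sequence at Fourier frequencies 2*pi*k/n, k = 1..(n-1)//2."""
    n = len(ys)
    mean = sum(ys) / n
    centred = [y - mean for y in ys]
    out = []
    for k in range(1, (n - 1) // 2 + 1):
        lam = 2 * math.pi * k / n
        re = sum(c * math.cos(lam * t) for t, c in enumerate(centred))
        im = sum(c * math.sin(lam * t) for t, c in enumerate(centred))
        out.append((re * re + im * im) / n)
    return out


def grenander_rosenblatt_type(ys):
    """sqrt(m) * max_k |normalised cumulative periodogram(k) - k/m|."""
    p = periodogram(ys)
    m, total = len(p), sum(p)
    cum, stat = 0.0, 0.0
    for k, v in enumerate(p, start=1):
        cum += v
        stat = max(stat, abs(cum / total - k / m))
    return math.sqrt(m) * stat


if __name__ == "__main__":
    rng = random.Random(0)
    iid = extremal_indicators([rng.gauss(0.0, 1.0) for _ in range(512)], 1.645)
    periodic = [1.0 if t % 8 == 0 else 0.0 for t in range(512)]
    print(grenander_rosenblatt_type(iid), grenander_rosenblatt_type(periodic))
```

Extremes that recur every 8 steps concentrate the periodogram at a few frequencies and give a large statistic, while iid exceedances give values typical of a Brownian bridge supremum. The direct O(n²) sums keep the sketch dependency-free; an FFT would be the practical choice for long series.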
The Sequence Dependent Nanoscale Structure of CENP-A Nucleosomes
CENP-A is a histone variant found in high abundance at the centromere in humans. At the centromere, this histone variant replaces the histone H3 found throughout the bulk chromatin. Additionally, the centromere comprises tandem repeats of α-satellite DNA, on which CENP-A nucleosomes assemble. However, the effect of the DNA sequence on nucleosome assembly and centromere formation remains poorly understood. Here, we investigated the structure of nucleosomes assembled with the CENP-A variant using atomic force microscopy (AFM). We assembled both CENP-A nucleosomes and H3 nucleosomes on a DNA substrate containing an α-satellite motif and characterized their positioning and wrapping efficiency. We also studied CENP-A nucleosomes on the 601 positioning motif and on non-specific DNA to compare their relative positioning and stability. CENP-A nucleosomes assembled on α-satellite DNA did not show any positional preference along the substrate, similar to both H3 nucleosomes and CENP-A nucleosomes on non-specific DNA. The range of nucleosome wrapping efficiency was narrower on α-satellite DNA than on non-specific DNA, suggesting a more stable complex. These findings indicate that DNA sequence and histone composition may be two of many factors required for accurate centromere assembly.
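Positioning and wrapping from AFM images are often quantified by measuring the two free DNA arms of each nucleosome. The sketch below assumes that approach and a B-DNA rise of 0.34 nm per base pair; the abstract does not state the authors' exact procedure, and the substrate length, arm lengths and function names are hypothetical.

```python
from statistics import pstdev

RISE_NM_PER_BP = 0.34  # assumed B-DNA rise used to convert contour length (nm) to bp


def nucleosome_geometry(total_bp, arm1_nm, arm2_nm):
    """Short arm, long arm and wrapped length (all in bp) from the two free-DNA arm
    lengths measured on an AFM image. The ends are assumed indistinguishable, so the
    arms are sorted rather than assigned to a particular end."""
    short_bp, long_bp = sorted((arm1_nm / RISE_NM_PER_BP, arm2_nm / RISE_NM_PER_BP))
    return short_bp, long_bp, total_bp - short_bp - long_bp


def wrapping_spread(wrapped_bp):
    """Range and population standard deviation of wrapped lengths across nucleosomes."""
    return max(wrapped_bp) - min(wrapped_bp), pstdev(wrapped_bp)


if __name__ == "__main__":
    # Hypothetical 400 bp substrate with free arms of 51 nm and 34 nm.
    print(nucleosome_geometry(400, 51.0, 34.0))
    print(wrapping_spread([140, 150, 160]))
```

The arm positions give the positioning map along the substrate, and the spread of wrapped lengths across many particles is one way to express the "range of wrapping efficiency" compared between substrates.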