Inter-protein sequence co-evolution predicts known physical interactions in bacterial ribosomes and the trp operon
Interaction between proteins is a fundamental mechanism that underlies
virtually all biological processes. Many important interactions are conserved
across a large variety of species. The need to maintain interaction leads to a
high degree of co-evolution between residues in the interface between partner
proteins. The inference of protein-protein interaction networks from the
rapidly growing sequence databases is one of the most formidable tasks in
systems biology today. We propose here a novel approach based on the
Direct-Coupling Analysis of the co-evolution between inter-protein residue
pairs. We use ribosomal and trp operon proteins as test cases. For the small
and large ribosomal subunits, our approach predicts protein-interaction
partners at true-positive rates of 70% and 90%, respectively, within the first
10 predictions, with areas of 0.69 and 0.81, respectively, under the ROC curves
for all predictions. In the trp operon, it assigns the two largest interaction
scores to the only two interactions experimentally known. On the level of
residue interactions, we show that for the small and the large ribosomal
subunit our approach predicts interacting residues with true-positive rates of
60% and 85%, respectively, in the first 20 predictions. We use artificial data to show
that the performance of our approach depends crucially on the size of the joint
multiple sequence alignments and analyze how many sequences would be necessary
for a perfect prediction if the sequences were sampled from the same model that
we use for prediction. Given the performance of our approach on the test data
we speculate that it can be used to detect new interactions, especially in the
light of the rapid growth of available sequence data.
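
The two evaluation figures quoted in the abstract above (true-positive rate within the first N predictions, and area under the ROC curve) can be illustrated with a minimal Python sketch. It assumes that "true-positive rate within the first 10 predictions" means the fraction of the top-ranked predictions that are known interactions; that reading, the function names, and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def top_k_true_positive_rate(scores, labels, k):
    """Fraction of the k highest-scoring predictions that are true interactions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    if not top:
        raise ValueError("no predictions to evaluate")
    return sum(1 for _, label in top if label) / len(top)


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical interaction scores and known-interaction labels (1 = known).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
print(top_k_true_positive_rate(example_scores, example_labels, 3))
print(roc_auc(example_scores, example_labels))
```

The pairwise formulation of the AUC is quadratic in the number of predictions, which is acceptable for a few hundred protein pairs; a rank-based computation would be preferable for larger sets.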
Viral RNAs are unusually compact.
A majority of viruses are composed of long single-stranded genomic RNA molecules encapsulated by protein shells with diameters of just a few tens of nanometers. We examine the extent to which these viral RNAs have evolved to be physically compact molecules to facilitate encapsulation. Measurements of equal-length viral, non-viral, coding and non-coding RNAs show viral RNAs to have among the smallest sizes in solution, i.e., the highest gel-electrophoretic mobilities and the smallest hydrodynamic radii. Using graph-theoretical analyses we demonstrate that their sizes correlate with the compactness of branching patterns in predicted secondary structure ensembles. The density of branching is determined by the number and relative positions of 3-helix junctions, and is highly sensitive to the presence of rare higher-order junctions with 4 or more helices. Compact branching arises from a preponderance of base pairing between nucleotides close to each other in the primary sequence. The density of branching represents a degree of freedom optimized by viral RNA genomes in response to the evolutionary pressure to be packaged reliably. Several families of viruses are analyzed to delineate the effects of capsid geometry, size and charge stabilization on the selective pressure for RNA compactness. Compact branching has important implications for RNA folding and viral assembly.
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes
Empirical substitution matrices represent the average tendencies of
substitutions over various protein families by sacrificing gene-level
resolution. We develop a codon-based model in which the mutational tendencies
of codons, the genetic code, and the strength of selective constraints against
amino acid replacements can be tailored to a given gene. First, selective constraints
averaged over proteins are estimated by maximizing the likelihood of each 1-PAM
matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution
matrices. Then, selective constraints specific to given proteins are
approximated as a linear function of those estimated from the empirical
substitution matrices.
Akaike information criterion (AIC) values indicate that a model allowing
multiple nucleotide changes fits the empirical substitution matrices
significantly better. Also, the ML estimates of transition-transversion bias
obtained from these empirical matrices are not as large as previously
estimated. The selective constraints are characteristic of proteins rather than
species. However, their relative strengths among amino acid pairs can be
approximated as depending mainly on the amino acid pair rather than on the protein family,
because the present model, in which selective constraints are approximated to
be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can
provide a good fit to other empirical substitution matrices including cpREV for
chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
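
The AIC comparison mentioned in this paragraph follows the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood; the model with the lower value is preferred. The short Python sketch below only illustrates that arithmetic. The log-likelihoods and parameter counts are hypothetical placeholders, not values from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: a model restricted to single nucleotide changes versus
# one that also allows multiple nucleotide changes (two extra parameters).
single_change = aic(log_likelihood=-100.0, n_params=3)
multiple_change = aic(log_likelihood=-95.0, n_params=5)

# The richer model is preferred only if its likelihood gain outweighs
# the penalty of 2 per additional parameter.
preferred = "multiple" if multiple_change < single_change else "single"
print(single_change, multiple_change, preferred)
```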
The present codon-based model with the ML estimates of selective constraints
and with adjustable nucleotide mutation rates would be useful as a simple
substitution model in ML and Bayesian inferences of molecular phylogenetic
trees, and enables us to obtain biologically meaningful information at both
nucleotide and amino acid levels from codon and protein sequences.
Comment: Table 9 in this article includes corrections for errata in the Table
9 published in 10.1371/journal.pone.0017244. Supporting information is
attached at the end of the article, and a computer-readable dataset of the ML
estimates of selective constraints is available from
10.1371/journal.pone.001724
Protein sectors: statistical coupling analysis versus conservation
Statistical coupling analysis (SCA) is a method for analyzing multiple
sequence alignments that was used to identify groups of coevolving residues
termed "sectors". The method applies spectral analysis to a matrix obtained by
combining correlation information with sequence conservation. It has been
asserted that the protein sectors identified by SCA are functionally
significant, with different sectors controlling different biochemical
properties of the protein. Here we reconsider the available experimental data
and note that it involves almost exclusively proteins with a single sector. We
show that in this case sequence conservation is the dominating factor in SCA,
and can alone be used to make statistically equivalent functional predictions.
Therefore, we suggest shifting the experimental focus to proteins for which SCA
identifies several sectors. Correlations in protein alignments, which have been
shown to be informative in a number of independent studies, would then be less
dominated by sequence conservation.
Comment: 36 pages, 17 figure
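
The conservation signal that this abstract contrasts with SCA can be illustrated by a per-column conservation score on a multiple sequence alignment. The Python sketch below uses a simple entropy-based score (log2 of the alphabet size minus the Shannon entropy of the column), which is an assumption chosen for brevity; it is not the weighting used by SCA or by the authors, and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment, alphabet_size=20):
    """Per-column conservation in bits: log2(alphabet_size) - Shannon entropy.
    Gap characters ('-') are ignored; an all-gap column scores 0.0."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(alphabet_size)
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = 0.0
        for count in Counter(residues).values():
            freq = count / total
            entropy -= freq * math.log2(freq)
        scores.append(max_entropy - entropy)
    return scores


# Toy alignment: column 0 is fully conserved, column 1 is fully variable.
toy_alignment = ["AC", "AD", "AE", "AF"]
print(column_conservation(toy_alignment))
```

Positions with scores near log2(20) are almost invariant; a ranking by such a score is the kind of conservation-only baseline against which correlation-based predictions can be compared.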
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations of two
proteins that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples.
Comment: The 9th International Workshop on Algorithms in Bioinformatics
(WABI), Philadelphia, United States of America (2009
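
To see why the abstract above stresses memory-efficient representations of "the complete set of putative DNA sequences", it helps to count how many DNA sequences back-translate to a given protein. The Python sketch below does this by naive enumeration under the standard genetic code. It is not the authors' graph-based algorithm; the function names are hypothetical and the enumeration is only practical for very short peptides.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS_FOR = {}
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(amino_acid, []).append("".join(codon))


def count_back_translations(protein):
    """Number of distinct DNA sequences that encode the protein."""
    total = 1
    for amino_acid in protein:
        if amino_acid not in CODONS_FOR:
            raise ValueError(f"unknown amino acid: {amino_acid!r}")
        total *= len(CODONS_FOR[amino_acid])
    return total


def back_translations(protein):
    """Yield every DNA sequence encoding the protein (exponential in length)."""
    for amino_acid in protein:
        if amino_acid not in CODONS_FOR:
            raise ValueError(f"unknown amino acid: {amino_acid!r}")
    for codons in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(codons)


print(count_back_translations("MW"))      # both residues have a single codon
print(count_back_translations("MLSR"))    # 1 * 6 * 6 * 6 putative sequences
print(list(back_translations("MW")))
```

Because the count multiplies by up to six per residue, explicit enumeration quickly becomes infeasible, which motivates a compact graph encoding in which each position stores its alternative codons once.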
A two-phase approach for detecting recombination in nucleotide sequences
Genetic recombination can produce heterogeneous phylogenetic histories within
a set of homologous genes. Delineating recombination events is important in the
study of molecular evolution, as inference of such events provides a clearer
picture of the phylogenetic relationships among different gene sequences or
genomes. Nevertheless, detecting recombination events can be a daunting task,
as the performance of different recombination-detecting approaches can vary,
depending on evolutionary events that take place after recombination. We
recently evaluated the effects of post-recombination events on the prediction
accuracy of recombination-detecting approaches using simulated nucleotide
sequence data. The main conclusion, supported by other studies, is that one
should not depend on a single method when searching for recombination events.
In this paper, we introduce a two-phase strategy, applying three statistical
measures to detect the occurrence of recombination events, and a Bayesian
phylogenetic approach in delineating breakpoints of such events in nucleotide
sequences. We evaluate the performance of these approaches using simulated
data, and demonstrate the applicability of this strategy to empirical data. The
two-phase strategy proves to be time-efficient when applied to large datasets,
and yields high-confidence results.
Comment: 5 pages, 3 figures. Chan CX, Beiko RG and Ragan MA (2007). A
two-phase approach for detecting recombination in nucleotide sequences. In
Hazelhurst S and Ramsay M (Eds) Proceedings of the First Southern African
Bioinformatics Workshop, 28-30 January, Johannesburg, 9-1