Inferring modules of functionally interacting proteins using the Bond Energy Algorithm
Abstract
Background. Non-homology-based methods, such as phylogenetic profiles, are effective for predicting functional relationships between proteins that share no considerable sequence or structural similarity. These methods rely heavily on traditional similarity metrics defined on pairs of phylogenetic patterns. However, proteins do not interact exclusively in pairs: the final biological function of a protein in the cellular context is often carried out by a group of proteins. Accurately inferring modules of functionally interacting proteins therefore requires considering not only direct but also indirect relationships.
In this paper, we used the Bond Energy Algorithm (BEA) to predict functionally related groups of proteins. With BEA, we cluster phylogenetic profiles based on the associations of the surrounding elements in the analyzed data, using a metric that considers linked relationships among elements in the data set.
Results. Using phylogenetic profiles obtained from the Clusters of Orthologous Groups of proteins (COG) database, we conducted a series of clustering experiments using BEA to predict (upper-level) relationships between profiles. We evaluated our results by comparing them with COG's functional categories and, further, with the experimentally determined functional relationships between proteins provided by the DIP and ECOCYC databases. Our results demonstrate that BEA is capable of predicting meaningful modules of functionally related proteins. BEA outperforms traditionally used clustering methods, such as k-means and hierarchical clustering, by predicting functional relationships between proteins with higher accuracy.
Conclusion. This study shows that the linked relationships of phylogenetic profiles obtained by BEA are useful for detecting functional associations between profiles and for extending functional modules not found by traditional methods. BEA is capable of detecting relationships among phylogenetic patterns by linking them through a common element shared in a group. Additionally, we discuss how the proposed method may become more powerful if other criteria for classifying different levels of protein functional interaction, such as gene neighborhood or protein fusion information, are provided.
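The core ordering step of BEA can be sketched as follows. This is a minimal sketch of the classic greedy column-insertion form of the Bond Energy Algorithm (as it is usually presented for attribute clustering), not the authors' implementation. The bond between two columns of an affinity matrix is bond(x, y) = Σ_z aff(z, x)·aff(z, y), and each new column is inserted in the slot that maximizes 2·bond(left, new) + 2·bond(new, right) − 2·bond(left, right). As a stand-in for the paper's metric, it assumes the affinity between two binary phylogenetic profiles is simply the number of genomes in which both are present; all function names are hypothetical.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def profile_affinity(profiles: Sequence[Sequence[int]]) -> List[List[int]]:
    """Assumed metric: number of genomes in which both binary profiles are 1."""
    return [[sum(a * b for a, b in zip(p, q)) for q in profiles] for p in profiles]


def bond(aff: Matrix, x: Optional[int], y: Optional[int]) -> float:
    """bond(x, y) = sum over rows z of aff[z][x] * aff[z][y]; None is the empty boundary column."""
    if x is None or y is None:
        return 0.0
    return sum(row[x] * row[y] for row in aff)


def contribution(aff: Matrix, left: Optional[int], new: int, right: Optional[int]) -> float:
    """Change in bond energy when column `new` is placed between `left` and `right`."""
    return 2 * bond(aff, left, new) + 2 * bond(aff, new, right) - 2 * bond(aff, left, right)


def bea_order(aff: Matrix) -> List[int]:
    """Greedily order the columns of a symmetric affinity matrix to maximize bond energy."""
    n = len(aff)
    order = list(range(min(n, 2)))  # seed with the first two columns
    for k in range(2, n):
        best_pos, best_gain = 0, float("-inf")
        for pos in range(len(order) + 1):  # every slot, including both ends
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = contribution(aff, left, k, right)
            if gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, k)
    return order


# Toy presence/absence profiles over five genomes (illustrative only, not COG data).
profiles = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
aff = profile_affinity(profiles)
print(bea_order(aff))  # [2, 0, 1, 3]: profiles 0/2 and 1/3 form adjacent blocks
```

Profiles with strong mutual affinity end up in contiguous blocks of the ordering; cutting that ordering into discrete clusters is a separate step not shown here.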
Studies on Correlated Mutations Algorithms of Proteins Providing Structural, Spatial, and Allostery Information from Multiple Sequence Alignments
Proteins provide innumerable cellular functions and benefits for organisms across all domains of life. Advances in the high-throughput collection and analysis of proteins have led to an ever-deeper understanding of biological pathways, evolution, and coding biases. Most functional and structural analysis of proteins carried out in vitro is not amenable to high-throughput technologies. With the rapid growth in the number of sequences available for study, we can further refine algorithms that work in silico, using in vitro results as a benchmark. The study of proteins has undergone a renaissance, driven by new approaches that are largely possible because of the amount of data now available for analysis. The research in this dissertation investigates some of the new techniques available in this field, both to find their limitations and to improve upon them.
Chapter 1 presents an overview of general techniques available to researchers looking for links between protein sequence covariance and allostery. The most commonly used methods, including mutual information, chemical similarity matrices, phylogenetic perturbation, and chi-square analysis, are reviewed, along with the limits of these approaches for detecting allostery. Chapter 2 explores applying a recent phylogenetic correction, which has successfully improved the ability of mutual information to predict spatial contacts, to the other algorithm types introduced in Chapter 1. Chapter 3 attempts to detect bias in covariance algorithms with respect to the rigid bodies found in protein structures. Chapter 4 describes a novel algorithm, termed COvariance By Sections (COBS), that in many ways combines the methodologies of Chapter 2 and Chapter 3: it applies a phylogenetic correction to groups of multiple sequence alignment (MSA) columns rather than to individual columns.
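Of the methods reviewed in Chapter 1, chi-square analysis is the simplest to state: treat two alignment columns as categorical variables and measure how far their joint residue counts depart from what independence would predict. A minimal sketch, assuming a plain Pearson statistic with gaps counted as an ordinary symbol and no sequence weighting or phylogenetic correction (the function name is hypothetical, not taken from the dissertation):

```python
from collections import Counter
from typing import Sequence


def chi_square_columns(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """Pearson chi-square statistic for independence of two alignment columns.

    Character k of each column is the residue of aligned sequence k; gaps ('-')
    are treated as an ordinary symbol.
    """
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    joint = Counter(zip(col_i, col_j))
    chi2 = 0.0
    # Sum (observed - expected)^2 / expected over every pair of observed residue types.
    for a, n_a in count_i.items():
        for b, n_b in count_j.items():
            expected = n_a * n_b / n
            observed = joint[(a, b)]  # Counter returns 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected
    return chi2


print(chi_square_columns("DDNN", "NNDD"))  # 4.0 (perfect covariation)
print(chi_square_columns("DDNN", "DNDN"))  # 0.0 (independent)
```

Turning the statistic into a p-value would additionally require the chi-square distribution with (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of residue types observed in each column.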
Evolutionary Covariant Positions within Calmodulin EF-hand Sequences Promote Ligand Binding
Intracellular calcium signaling is an essential regulatory mechanism: calcium-mediated signal transduction pathways are involved in many cell processes, such as exocytosis, motility, apoptosis, excitability, transcription, and muscle contraction. The ubiquitous, highly conserved calcium-binding protein calmodulin (CaM) is an important regulator of hundreds of target proteins involved in cellular calcium signaling. CaM comprises two pairs of EF-hand calcium-binding domains, and these structural regions of the protein are highly conserved. Studying the molecular mechanisms underlying the binding of calcium to the EF-hands of CaM is critical for understanding calcium-mediated cellular processes and how improper calcium binding can lead to various human pathologies. Previous site-specific binding measurements indicate that each of the four EF-hands of CaM has a distinct affinity for calcium. In this study, we used covariance patterns and site-specific mutagenesis to analyze calcium affinity in the two EF-hands of the N-lobe of CaM, in order to determine which amino acids are evolutionarily conserved to coordinate calcium. The amino acids we studied are theorized to coevolve: when a mutation occurs at one of these positions in the protein-coding gene, a compensatory mutation is likely to follow that conserves the structure and function of CaM. Because CaM is a highly conserved protein with a known structure, covariance analyses can help reveal which amino acid contacts are most important for coordinating calcium in the EF-hands of CaM and which amino acids are under evolutionary constraint. Covariance algorithms, multiple sequence alignment analyses, and accompanying protein structure analyses were used to identify the two high-scoring amino acid pairs in the N-lobe EF-hands: positions 22 and 24 in EF-hand site 1 and positions 58 and 60 in EF-hand site 2.
The amino acids at these positions were mutated, and calcium binding was measured to better understand the effects of the mutations. We found that both the D24N mutation in site 1 and the D58N mutation in site 2 disrupt binding, likely because they remove a necessary aspartate from the binding site. However, combining the D58N and N60D mutations restores binding in site 2 by providing the necessary aspartate at the covariant position. The N60D mutation by itself has little impact on calcium binding in site 2. Therefore, evolution evidently conserves at least one aspartate in the covariant positions of the binding site, and the presence of two aspartates in these positions has little effect on calcium binding. We are currently studying the covariant positions in site 1; future work includes structurally analyzing the covariant positions in the C-lobe of CaM and studying covariance patterns in other calcium-binding proteins with EF-hand binding domains.
Protein sectors: statistical coupling analysis versus conservation
Statistical coupling analysis (SCA) is a method for analyzing multiple
sequence alignments that was used to identify groups of coevolving residues
termed "sectors". The method applies spectral analysis to a matrix obtained by
combining correlation information with sequence conservation. It has been
asserted that the protein sectors identified by SCA are functionally
significant, with different sectors controlling different biochemical
properties of the protein. Here we reconsider the available experimental data
and note that it involves almost exclusively proteins with a single sector. We
show that in this case sequence conservation is the dominating factor in SCA,
and can alone be used to make statistically equivalent functional predictions.
Therefore, we suggest shifting the experimental focus to proteins for which SCA
identifies several sectors. Correlations in protein alignments, which have been
shown to be informative in a number of independent studies, would then be less
dominated by sequence conservation. (Comment: 36 pages, 17 figures)
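The argument above turns on how strongly SCA's output tracks per-position conservation. One standard conservation measure, and closely related to the relative-entropy term SCA builds into its weighting, is the Kullback-Leibler divergence of a column's amino-acid frequencies from a background distribution. The minimal sketch below assumes a uniform 1/20 background, applies no pseudocounts or sequence weights, and does not reproduce SCA's correlation weighting or spectral step; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column: str, background: Optional[Dict[str, float]] = None) -> float:
    """Relative entropy D = sum_a f_a * ln(f_a / q_a) of one alignment column.

    f_a is the observed frequency of amino acid a in the column; q_a is the
    background frequency (uniform 1/20 unless another table is supplied).
    """
    q = background if background is not None else {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    residues = [r for r in column if r in q]  # gaps and unknown symbols are skipped
    if not residues:
        return 0.0
    n = len(residues)
    return sum((c / n) * math.log((c / n) / q[a]) for a, c in Counter(residues).items())


print(round(column_conservation("DDDD"), 4))  # 2.9957, i.e. ln 20 for a fully conserved column
print(column_conservation(AMINO_ACIDS))  # 0.0, every residue equally frequent
```

A fully conserved column reaches the maximum ln 20, while a column matching the background scores zero; ranking positions by this single number is the kind of conservation-only predictor the abstract compares against SCA.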
Information Theory in Molecular Evolution: From Models to Structures and Dynamics
This Special Issue collects novel contributions from scientists in the interdisciplinary field of biomolecular evolution. Works listed here use information theoretical concepts as a core but are tightly integrated with the study of molecular processes. Applications include the analysis of phylogenetic signals to elucidate biomolecular structure and function, the study and quantification of structural dynamics and allostery, as well as models of molecular interaction specificity inspired by evolutionary cues.
Computational Molecular Coevolution
A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue contact. Thus, high-quality multiple sequence alignments and robust coevolution-inferring statistics can produce structural information from sequence alone. This work characterizes the relationship between coevolution statistics and sequence alignments and highlights the implicit assumptions and caveats associated with coevolutionary inference. An investigation of sequence alignment quality and coevolutionary-inference methods revealed that such methods are very sensitive to the systematic misalignments discovered in public databases. However, repairing these misalignments restores the predictive power of coevolution statistics. To overcome the sensitivity to misalignments, two novel coevolution-inferring statistics were developed that show increased contact prediction accuracy, especially in alignments that contain misalignments. These new statistics were developed into a suite of coevolution tools, the MIpToolset. Because systematic misalignments produce a distinctive pattern when analyzed by coevolution-inferring statistics, a new method for detecting systematic misalignments was created to exploit this phenomenon.
This new method, called "local covariation", was used to analyze publicly available multiple sequence alignment databases. Local covariation detected putative misalignments in a database designed to benchmark the accuracy of sequence alignment software. Local covariation was incorporated into a new software tool, LoCo, which displays regions of potential misalignment during alignment editing and assists in their correction. This work represents advances in multiple sequence alignment creation and coevolutionary inference.
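MIp, the statistic the MIpToolset is presumably named after, conventionally denotes mutual information minus the average product correction (APC) of Dunn et al.: APC(i, j) = MI(i, x̄)·MI(j, x̄) / MI_avg, where MI(i, x̄) is the mean MI of column i with every other column and MI_avg is the mean over all column pairs. The two new statistics described in the abstract are not reproduced here. This is a minimal sketch of the MIp baseline only, assuming ungapped toy columns, no sequence weighting, and no low-count filtering; function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def mutual_information(col_i: str, col_j: str) -> float:
    """MI = sum_ab p(a,b) ln[p(a,b) / (p(a) p(b))] between two alignment columns."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b)
    return sum((c / n) * math.log(c * n / (f_i[a] * f_j[b])) for (a, b), c in f_ij.items())


def mip_scores(msa: Sequence[str]) -> Dict[Tuple[int, int], float]:
    """MI minus the average product correction (APC) for every column pair i < j."""
    length = len(msa[0])
    if length < 2:
        raise ValueError("need at least two alignment columns")
    columns: List[str] = ["".join(seq[k] for seq in msa) for k in range(length)]
    mi = {(i, j): mutual_information(columns[i], columns[j])
          for i, j in combinations(range(length), 2)}
    # Mean MI of each column with every other column.
    col_mean = [
        sum(mi[(min(k, m), max(k, m))] for m in range(length) if m != k) / (length - 1)
        for k in range(length)
    ]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return dict(mi)  # all MI values are zero; nothing to correct
    return {(i, j): v - col_mean[i] * col_mean[j] / overall for (i, j), v in mi.items()}


msa = ["DNA", "DNC", "NDA", "NDC"]  # toy alignment: columns 0 and 1 covary
scores = mip_scores(msa)
print(round(scores[(0, 1)], 4))  # 0.1733, i.e. ln 2 / 4
```

The correction subtracts the background MI that a pair would be expected to share just because each of its columns is generally "noisy" (for example, through shared phylogeny), leaving the pair-specific signal.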
Protein 3D Structure Computed from Evolutionary Sequence Variation
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.
- …