1,446 research outputs found

    Inferring modules of functionally interacting proteins using the Bond Energy Algorithm

    Abstract. Background: Non-homology based methods such as phylogenetic profiles are effective for predicting functional relationships between proteins with no considerable sequence or structure similarity. Those methods rely heavily on traditional similarity metrics defined on pairs of phylogenetic patterns. Proteins do not interact exclusively in pairs, as the final biological function of a protein in the cellular context is often held by a group of proteins. To accurately infer modules of functionally interacting proteins, both direct and indirect relationships must be considered. In this paper, we used the Bond Energy Algorithm (BEA) to predict functionally related groups of proteins. With BEA we create clusters of phylogenetic profiles based on the associations of the surrounding elements of the analyzed data, using a metric that considers linked relationships among elements in the data set. Results: Using phylogenetic profiles obtained from the Cluster of Orthologous Groups of Proteins (COG) database, we conducted a series of clustering experiments using BEA to predict (upper level) relationships between profiles. We evaluated our results by comparing them with COG's functional categories and, further, with the experimentally determined functional relationships between proteins provided by the DIP and ECOCYC databases. Our results demonstrate that BEA is capable of predicting meaningful modules of functionally related proteins. BEA outperforms traditionally used clustering methods, such as k-means and hierarchical clustering, by predicting functional relationships between proteins with higher accuracy. Conclusion: This study shows that the linked relationships of phylogenetic profiles obtained by BEA are useful for detecting functional associations between profiles and extending functional modules not found by traditional methods. BEA is capable of detecting relationships among phylogenetic patterns by linking them through a common element shared in a group. Additionally, we discuss how the proposed method may become more powerful if other criteria to classify different levels of protein functional interactions, such as gene neighborhood or protein fusion information, are provided.
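    The abstract above contrasts BEA with traditional similarity metrics defined on pairs of phylogenetic patterns. As a minimal sketch of such a pairwise metric (not the BEA method itself), the following computes a Jaccard similarity between binary presence/absence profiles. The protein names, the toy profiles, and the choice of Jaccard similarity are illustrative assumptions, not taken from the paper.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two binary phylogenetic profiles.

    Each profile is a sequence of 0/1 flags, one per genome, marking
    whether the protein has an ortholog in that genome (assumed encoding).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two all-absent profiles share no evidence; report zero similarity.
    return both / either if either else 0.0


# Hypothetical toy profiles over five genomes.
profiles = {
    "protA": [1, 1, 0, 1, 0],
    "protB": [1, 1, 0, 1, 1],
    "protC": [0, 0, 1, 0, 1],
}

sim_ab = jaccard_similarity(profiles["protA"], profiles["protB"])  # 0.75
sim_ac = jaccard_similarity(profiles["protA"], profiles["protC"])  # 0.0
```

    A metric like this only scores direct pairs; the abstract's point is that BEA additionally accounts for linked (indirect) relationships among profiles.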

    STUDIES ON CORRELATED MUTATIONS ALGORITHMS OF PROTEINS PROVIDING STRUCTURAL, SPATIAL, AND ALLOSTERY INFORMATION FROM MULTIPLE SEQUENCE ALIGNMENTS

    Proteins provide innumerable cellular functions and benefits for all kingdoms in the domains of life. Advancements in the high-throughput collection and analysis of proteins have led to an ever-deeper understanding of biological pathways, evolution, and coding biases. Most protein functional and/or structural analysis carried out in vitro is not amenable to high-throughput technologies. With the incredible growth in the number of sequences available for study, we can further refine algorithms that work in silico, using the work done in vitro as a benchmark. There has been a renaissance in the study of proteins using new approaches that are largely possible because of the amount of data now available for analysis. The research in this dissertation investigates some of the new techniques available in this field, both to find their limitations and to improve upon them. Chapter 1 presents an overview of generalized techniques at the disposal of researchers looking for links between protein sequence covariance and allostery. The most commonly used methods, including mutual information, chemical similarity matrices, phylogenetic perturbation, and chi-square analysis, are reviewed, as are the limits of such approaches in detecting allostery. Chapter 2 explores applying a recent phylogenetic correction, which has been successful in improving the efficacy of mutual information for predicting spatial contacts, to the other algorithm types introduced in the first chapter. Chapter 3 is an attempt to detect bias of covariance algorithms on the rigid bodies found in protein structures. Chapter 4 describes a novel algorithm, termed COvariance By Sections (COBS), that in many ways combines the methodologies used in Chapters 2 and 3, whereby we leverage a phylogenetic correction on groups of MSA columns rather than individual columns.
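    Of the covariance measures named in the abstract above, mutual information between two alignment columns is the simplest to illustrate. The following is a minimal sketch under stated assumptions: the function name, the toy alignment, and the plain frequency-based estimate (no phylogenetic correction, no gap handling, no pseudocounts) are illustrative and are not the dissertation's implementation.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. Probabilities are
    estimated from raw residue counts (an assumed, uncorrected estimator).
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written in terms of counts.
        ratio = p_ab * n * n / (count_i[a] * count_j[b])
        mi += p_ab * log2(ratio)
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# while column 2 varies independently of column 0.
toy_msa = ["ACD", "ACE", "GTD", "GTE"]

mi_covarying = column_mutual_information(toy_msa, 0, 1)    # 1.0 bit
mi_independent = column_mutual_information(toy_msa, 0, 2)  # 0.0 bits
```

    On real alignments, raw mutual information is inflated by shared ancestry and small sample sizes, which is the motivation for the phylogenetic correction the abstract refers to.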

    Protein sectors: statistical coupling analysis versus conservation

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that they involve almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. Comment: 36 pages, 17 figures

    Information Theory in Molecular Evolution: From Models to Structures and Dynamics

    This Special Issue collects novel contributions from scientists in the interdisciplinary field of biomolecular evolution. The works listed here use information-theoretical concepts at their core but are tightly integrated with the study of molecular processes. Applications include the analysis of phylogenetic signals to elucidate biomolecular structure and function, the study and quantification of structural dynamics and allostery, and models of molecular interaction specificity inspired by evolutionary cues.

    Computational Molecular Coevolution

    A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue contact. Thus, high-quality multiple sequence alignments and robust coevolution-inferring statistics can produce structural information from sequence alone. This work characterizes the relationship between coevolution statistics and sequence alignments and highlights the implicit assumptions and caveats associated with coevolutionary inference. An investigation of sequence alignment quality and coevolutionary-inference methods revealed that such methods are very sensitive to the systematic misalignments discovered in public databases. However, repairing the misalignments in such alignments restores the predictive power of coevolution statistics. To overcome the sensitivity to misalignments, two novel coevolution-inferring statistics were developed that show increased contact prediction accuracy, especially in alignments that contain misalignments. These new statistics were developed into a suite of coevolution tools, the MIpToolset. Because systematic misalignments produce a distinctive pattern when analyzed by coevolution-inferring statistics, a new method for detecting systematic misalignments was created to exploit this phenomenon. This new method, called "local covariation", was used to analyze publicly available multiple sequence alignment databases. Local covariation detected putative misalignments in a database designed to benchmark sequence alignment software accuracy. Local covariation was incorporated into a new software tool, LoCo, which displays regions of potential misalignment during alignment editing and assists in their correction. This work represents advances in multiple sequence alignment creation and coevolutionary inference.

    Protein 3D Structure Computed from Evolutionary Sequence Variation

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.