
    Information Theory and Multivariate Techniques for Analyzing DNA Sequence Data: An Example from Tomato Genes

    DNA and amino acid sequences are strings of alphabetic symbols with no underlying metric, and information theory offers one solution to this sequence-metric problem. The way DNA sequence complexity is reflected in phenotypic stability might be useful for crop improvement. The Shannon-Weaver index (Shannon entropy, H') and the mutual information (MI) index were estimated from the DNA sequences of 22 genes belonging to two tomato gene families, disease resistance and fruit quality. The main objective was to use information theory and multivariate techniques to understand diversity among the genes and to relate sequence complexity to phenotype. Normalized H' values ranged from 0.429 to 0.461, with the highest diversity in the gene Crtr-B (beta-carotene hydroxylase). Two principal components, together accounting for 36.65% of the variation, placed the genes into four groups. Groupings from both principal component and cluster analyses clearly showed similarity at the phenotype level within clusters, and sequence similarity was observed among genes within a family. Assessing gene diversity with information theory should help link sequence complexity to gene stability, for example the stability of resistance genes.
    Key words: Diversity analysis; DNA sequences; principal component analysis; tomato genes
    Nepal Journal of Biotechnology, 2011, Vol. 1, No. 1 pp.1-
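As a minimal sketch of the entropy measure named above, the Python below computes H' from nucleotide frequencies and divides it by its maximum, log2 of the alphabet size. That normalization is an assumption for illustration only: the abstract does not say how H' was normalized, and the reported range (0.429 to 0.461) suggests the study used a different scheme or unit of analysis. The helper names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, base: float = 2.0) -> float:
    """Shannon entropy H' of the symbol frequencies in `seq` (bits by default)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def normalized_entropy(seq: str, alphabet_size: int = 4) -> float:
    """H' divided by its maximum possible value, log2(alphabet_size).

    Assumption: max-entropy normalization; the study's own normalization is not
    stated in the abstract. 1.0 means all symbols are equally frequent.
    """
    return shannon_entropy(seq) / math.log2(alphabet_size)


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))          # all four bases equally frequent
    print(normalized_entropy("AACGTTTT"))   # skewed composition, below 1.0
```

A uniform base composition gives the maximum H' (2 bits for DNA), so a gene whose bases are more evenly used scores as more "diverse" under this measure.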

    Identification of direct residue contacts in protein-protein interaction by message passing

    Understanding the molecular determinants of specificity in protein-protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from the sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context, covariance-based methods have been used to identify correlations between amino acid positions in interacting proteins. However, these methods have an important shortcoming: they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis, an approach borrowed from statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space, without resorting to ad hoc tuning parameters, both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The spectacular success of this approach illustrates the effectiveness of global inference in identifying direct interactions from sequence information alone. We expect this method to be applicable soon to interaction surfaces between proteins present in only one copy per genome, as the number of sequenced genomes continues to grow. Use of this method could significantly increase the number of potential targets for therapeutic intervention, shed light on the mechanism of protein-protein interaction, and lay the foundation for accurate prediction of interacting protein partners.
    Comment: Supplementary information available at http://www.pnas.org/content/106/1/67.abstrac
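The shortcoming described above, that covariance-style scores cannot separate direct from indirect correlation, can be seen in a toy example. The sketch below builds a hypothetical binary "alignment" with a chain A → B → C (each copy flips with probability 0.1), so A and C are coupled only through B. Mutual information still reports a sizeable A–C signal. Conditioning on B with a first-order partial correlation removes it; that partial correlation is a simple linear stand-in for "global" inference, not the message-passing method used in the paper.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(x, y):
    """Plug-in mutual information (bits) between two equal-length columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def correlation(x, y):
    """Pearson correlation of two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation of x and y after conditioning on z (first-order partial correlation)."""
    rxy, rxz, ryz = correlation(x, y), correlation(x, z), correlation(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


def chain_alignment():
    """Hypothetical toy data: three binary columns A -> B -> C, 1000 sequences.

    Counts are the exact expected counts, so A and C are conditionally
    independent given B in the empirical distribution.
    """
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        p = 0.5 * (0.9 if b == a else 0.1) * (0.9 if c == b else 0.1)
        rows += [(a, b, c)] * round(1000 * p)
    return [list(col) for col in zip(*rows)]


if __name__ == "__main__":
    a, b, c = chain_alignment()
    print("MI(A,B) =", mutual_information(a, b))   # direct coupling
    print("MI(A,C) =", mutual_information(a, c))   # indirect, yet clearly nonzero
    print("partial r(A,C | B) =", partial_correlation(a, c, b))  # ~0
```

On real alignments with 20 amino acid states and hundreds of positions, conditioning on every other position at once requires a global model of the whole alignment, which is the role the paper's global inference plays.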

    Finite Sample Sizes and Phylogeny Do Not Account for the Mutual Information Observed for Most Site-Pairs in Multiple Sequence Alignment

    Mutual information (MI) is a measure frequently used to find co-evolving sites in protein families. However, factors unrelated to protein structure and function, in particular sampling variance in amino acid counts and complex evolutionary relationships among sequences, contribute to MI. Understanding the contribution of these components is essential for isolating the MI associated with structural or functional co-evolution. To date, the contributions of these factors to mutual information have not been fully elucidated. We find that stochastic variation in amino acid counts and shared phylogeny each contribute substantially to measured MI. Nonetheless, the mutual information observed in real-world protein families is consistently higher than the expected contribution of these two factors. In contrast, with synthetic data generated with realistic substitution rates and phylogenies but without structural or functional constraints, the observed levels of MI match those expected from the stochastic and phylogenetic background. Our results suggest that either low levels of co-evolution are ubiquitous across positions in protein families, or some unknown factor exists beyond the currently hypothesized components of intra-protein mutual information: sampling variance, phylogenetics, and structural/functional co-evolution.
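The finite-sample component described above can be estimated with a permutation null: shuffle one column, recompute MI, and repeat. The sketch below is a minimal version under that assumption. Shuffling keeps each column's residue composition but destroys any pairing, so it captures sampling variance only; it does not model shared phylogeny, which the study treats as a separate contribution. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (bits) between two equal-length alignment columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def shuffled_null(x, y, n_shuffles=200, seed=0):
    """MI values after randomly permuting y against x.

    Preserves both columns' compositions, so the mean of these values approximates
    the MI expected from finite sampling alone (phylogeny is NOT accounted for).
    """
    rng = random.Random(seed)
    y_perm = list(y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)
        null.append(mutual_information(x, y_perm))
    return null


if __name__ == "__main__":
    rng = random.Random(1)
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    x = [rng.choice(amino_acids) for _ in range(100)]
    y = [rng.choice(amino_acids) for _ in range(100)]  # independent of x
    null = shuffled_null(x, y, n_shuffles=100, seed=2)
    print("observed MI:", mutual_information(x, y))
    print("mean null MI:", sum(null) / len(null))
```

Even for two independent columns, the observed MI with 100 sequences and 20 residue types is well above zero, which is why raw MI values need a background correction before being read as co-evolution.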

    The use of information theory in evolutionary biology

    Information is a key concept in evolutionary biology. Information is stored in organisms' genomes and is used to generate the organism as well as to maintain and control it. Information is also "that which evolves". When a population adapts to a local environment, information about this environment is fixed in a representative genome. However, when an environment changes, information can be lost. At the same time, animal brains process information in order to survive in complex environments, and the capacity for information processing also evolves. Here I review applications of information theory to the evolution of proteins as well as to the evolution of information processing in simulated agents that adapt to perform a complex task.
    Comment: 25 pages, 7 figures. To appear in "The Year in Evolutionary Biology", of the Annals of the NY Academy of Science
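One common way to make "information stored in a genome" quantitative is to measure each aligned site's information as the drop in entropy from its maximum, log2 of the alphabet size, and to sum over sites. The sketch below assumes that formalization; the review may use refinements (for example small-sample corrections) that are not shown here, and the function names are hypothetical.

```python
import math
from collections import Counter


def site_information(column, alphabet_size=20):
    """Information (bits) at one alignment site: log2(alphabet_size) minus the site's entropy.

    A fully conserved site carries log2(alphabet_size) bits; a site where every
    residue is equally frequent carries 0 bits. No small-sample correction is applied.
    """
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences, alphabet_size=20):
    """Sum of per-site information over an alignment of equal-length sequences."""
    return sum(site_information(col, alphabet_size) for col in zip(*sequences))


if __name__ == "__main__":
    toy_alignment = ["MKVL", "MKIL", "MRVL"]  # hypothetical protein fragments
    print(alignment_information(toy_alignment))
```

Under this measure, adaptation that fixes particular residues at particular sites raises the total, while relaxed selection that lets sites drift toward uniform usage lowers it.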

    Residue coevolution: modeling and interpretation

    Coevolution between amino acid residues, and its context dependence, are important for exploring protein structure and function and critical for understanding protein structural and functional evolution. Coevolution has long been ignored because of its complexity and the lack of computing power. In the research presented here, I developed an efficient coevolution analysis methodology based on likelihood comparisons of statistical models. Likelihood ratios and Bayes factors, calculated using the Markov chain Monte Carlo algorithm, were employed as the statistics. Two types of models, 2-state and 3-state, were developed to allow for the context dependence of coevolution. Computer programs implementing this methodology were written in C/C++ and run on the Beowulf clusters of our laboratory and the supercomputers of LSU. Using these programs and custom Perl scripts, residue coevolution in cytochrome c oxidase and in the photolyase/cryptochrome protein superfamily was analyzed. I found that pairwise coevolution between residues is highly dependent on protein tertiary structure and function. I detected extensive coevolving pairs in all analyses, and these pairs were primarily localized in regions of known structural and/or functional importance. I also found that coevolution is related to evolutionary rate and concentrated in moderately conserved sites. Supporting the importance of functional constraints, I detected a non-negligible coevolutionary signal between complex subunits and stronger coevolution in proteins of functional importance. I also found that the interaction between subunits can serve as a local coevolutionary constraint on one subunit rather than driving coevolution between two subunits. Based on coevolutionary patterns, I suggest that a domain with no previously proposed function actually operates as a folding core in proteins of the photolyase/cryptochrome superfamily.
The coevolutionary patterns also provided clues regarding the functional evolution of electron transfer in this superfamily. I also found that coevolving sites with double substitutions along a branch tend to occur only at physically contacting sites, and that salt-bridge stabilization and secondary-structure stabilization are important forces of residue coevolution. The methodology and programs developed in this research are powerful tools for coevolutionary analysis, which can provide valuable information for characterizing protein structural/functional domains and exploring protein structural/functional evolution.
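The thesis compares statistical models by likelihood ratios and Bayes factors computed with MCMC along a phylogeny. A much simpler, non-phylogenetic analogue of "comparing an independent model with a dependent model" is the likelihood-ratio (G) test on the joint residue counts of two alignment columns, sketched below. This is a swapped-in illustration, not the thesis's 2-state or 3-state models, and it ignores phylogeny entirely; function names are hypothetical.

```python
import math
from collections import Counter


def likelihood_ratio_statistic(x, y):
    """G statistic for two paired alignment columns, in nats.

    G = 2 * (log-likelihood under the saturated joint model
             - log-likelihood under the independence model),
    which equals 2 * N * MI(x; y) with MI in nats. Ignores phylogeny.
    """
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    loglik_joint = sum(c * math.log(c / n) for c in pxy.values())
    loglik_indep = sum(
        c * math.log((px[a] / n) * (py[b] / n)) for (a, b), c in pxy.items()
    )
    return 2.0 * (loglik_joint - loglik_indep)


def degrees_of_freedom(x, y):
    """(r_x - 1) * (r_y - 1) for the numbers of distinct residues observed in each column."""
    return (len(set(x)) - 1) * (len(set(y)) - 1)


if __name__ == "__main__":
    # Hypothetical columns: a K/E swap that tracks an E/D swap perfectly.
    col_i = ["K"] * 10 + ["E"] * 10
    col_j = ["E"] * 10 + ["D"] * 10
    print(likelihood_ratio_statistic(col_i, col_j), degrees_of_freedom(col_i, col_j))
```

If SciPy is available, `scipy.stats.chi2.sf(g, df)` gives the asymptotic p-value. With few sequences and many residue types that chi-square approximation is poor, which is one reason model-based and simulation-based approaches like those in the thesis are preferred.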

    Why Should We Care About Molecular Coevolution?

    The non-independent evolution of amino acid sites has become a notable limitation of most methods aimed at identifying selective constraints at functionally important amino acid sites or protein regions. The need for a generalised framework that accounts for non-independence among amino acid sites has fuelled the design and development of new mathematical models and computational tools centred on this problem. Molecular coevolution is one of the most active areas of research, with new models and methods being developed every day. Both parametric and non-parametric methods have been developed to account for correlated variability among amino acid sites. These methods have been used to detect phylogenetic, functional and structural coevolution, as well as to identify the surfaces of amino acid sites involved in protein-protein interactions. Here we briefly describe these methods and identify their advantages and limitations.

    Computing Highly Correlated Positions Using Mutual Information and Graph Theory for G Protein-Coupled Receptors

    G protein-coupled receptors (GPCRs) are a superfamily of seven-transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort, or clique, of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed with vertices corresponding to TM positions, and an edge was drawn between two vertices if their mutual information exceeded a threshold of statistical significance. Positions with high degree (i.e. with significant mutual information with a large number of other positions) were found to line a well-defined inter-TM ligand-binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are no known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families.
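The graph construction described above can be sketched in a few lines: positions are vertices, an edge joins a pair whose MI exceeds a threshold, high-degree vertices are flagged, and the flagged set is checked for being a clique. In this minimal sketch the threshold and degree cutoff are plain parameters standing in for the study's statistical-significance criterion, which is not specified here; the position labels and MI values in the usage example are hypothetical.

```python
from itertools import combinations


def build_mi_graph(mi_scores, threshold):
    """Adjacency sets for a graph joining position pairs whose MI exceeds `threshold`.

    mi_scores: dict mapping (pos_i, pos_j) -> MI value. Every position that appears
    in mi_scores becomes a vertex, even if it ends up with no edges.
    """
    graph = {}
    for (i, j), mi in mi_scores.items():
        graph.setdefault(i, set())
        graph.setdefault(j, set())
        if mi > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def high_degree_positions(graph, min_degree):
    """Vertices with at least `min_degree` significant MI partners."""
    return sorted(v for v, nbrs in graph.items() if len(nbrs) >= min_degree)


def is_clique(graph, vertices):
    """True if every pair of the given vertices is joined by an edge."""
    return all(b in graph[a] for a, b in combinations(vertices, 2))


if __name__ == "__main__":
    scores = {(1, 2): 0.9, (1, 3): 0.8, (2, 3): 0.85, (1, 4): 0.1, (3, 4): 0.7, (2, 5): 0.05}
    g = build_mi_graph(scores, threshold=0.5)
    key = high_degree_positions(g, min_degree=2)
    print(key, is_clique(g, key))
```

Checking the clique property on the high-degree set mirrors the abstract's observation that all key positions were mutually connected; finding a maximum clique in a general graph is NP-hard, so this sketch only verifies a candidate set.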

    The Coevolution of Phycobilisomes: Molecular Structure Adapting to Functional Evolution

    The phycobilisome (PBS) is the major light-harvesting complex in cyanobacteria and red algae. It consists of phycobiliproteins (PBPs) and their associated linker peptides, which play key roles in, respectively, the absorption and unidirectional transfer of light energy and the stability of the whole complex. Previous research on the evolution of PBPs and linker peptides focused mainly on phylogenetic analysis and selection. Coevolution is the process in which a mutation disrupts the conformation of one residue and a compensatory change is selected for in its interacting partner. Here, coevolutionary analyses of allophycocyanin, phycocyanin, and phycoerythrin and a covariation analysis of the linker peptides were performed. The coevolution analyses revealed significantly correlated sites, providing strong evidence of the functional and structural importance of interactions among these residues. Interprotein coevolution analysis found less interaction between PBPs and linker peptides. Our results also indicate that coevolution and adaptive selection in the PBS are not directly related, and that their correlation probably arises through sites coupled by physicochemical interactions.

    Structural and Functional Roles of Coevolved Sites in Proteins

    Understanding residue covariation between multiple positions in protein families is crucial and can help in designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural and functional integrity while acquiring specific functional modifications. Despite significant effort in the field, there is still controversy over the preferred locations of coevolved residues in different regions of protein molecules, the strength of the coevolutionary signal, and the role of coevolution in functional diversification. In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We carried out a large-scale study of the structural and functional aspects of coevolved residues. We found that the networks of coevolutionary residue connections in our dataset are in general of the 'small-world' type: their clustering coefficients are higher than those of random networks, and their mean shortest path lengths are similar to or lower than those of random and regular networks. Altogether, 11% of functionally important sites coevolve with at least one other site. Active sites coevolve with other sites more frequently (15%) than protein-binding (11%) and ligand-binding (9%) sites do. Metal-binding and active sites also coevolve more frequently with other metal-binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other.
Moreover, approximately 80% of charge-compensatory substitutions within coevolved sites occur in very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges during evolution.
Our findings show that a noticeable fraction of functionally important sites undergo coevolution, and they point to compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
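The small-world comparison described above rests on two network statistics: the average clustering coefficient and the mean shortest path length, each compared against a random graph with the same number of nodes and edges. The sketch below computes both with the standard library, assuming an undirected graph stored as adjacency sets; nodes of degree below 2 count as clustering 0, and path lengths are averaged only over connected pairs. These conventions, and the helper names, are assumptions rather than the paper's stated procedure.

```python
import random
from collections import deque
from itertools import combinations


def clustering_coefficient(graph):
    """Average local clustering coefficient of a non-empty graph (degree < 2 counts as 0)."""
    total = 0.0
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def mean_shortest_path(graph):
    """Mean breadth-first-search distance over all connected pairs of distinct nodes."""
    dist_sum, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else float("inf")


def random_graph(nodes, n_edges, seed=0):
    """Erdős–Rényi G(n, m) graph on the same nodes with the same edge count, for comparison."""
    rng = random.Random(seed)
    nodes = list(nodes)
    graph = {v: set() for v in nodes}
    for a, b in rng.sample(list(combinations(nodes, 2)), n_edges):
        graph[a].add(b)
        graph[b].add(a)
    return graph
```

A coevolution network would be called small-world if its clustering coefficient is clearly higher than that of `random_graph(graph, n_edges)` while its mean shortest path length is about the same or lower; averaging over several random seeds gives a steadier baseline.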