Abstract

Background: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance.

Results: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding, however, indicating that the choice of alphabet is critical.

Conclusion: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.
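As an illustration of the kind of tree-ignorant metric named in the abstract, the following minimal sketch computes Mutual Information between two alignment columns, optionally after recoding residues with a reduced charge-based alphabet. The function names (`recode`, `entropy`, `mutual_information`) and the three-state alphabet are assumptions made for illustration, not the alphabets or implementation used in the study, and the t-test transformation is not reproduced here because the abstract does not specify it.

```python
from collections import Counter
from math import log2

# Hypothetical three-state charge alphabet (an assumption for illustration,
# not necessarily one of the alphabets evaluated in the paper).
CHARGE_ALPHABET = {"K": "+", "R": "+", "D": "-", "E": "-"}


def recode(column, mapping, default="0"):
    """Map each residue to its reduced-alphabet state; unlisted residues get `default`."""
    return [mapping.get(residue, default) for residue in column]


def entropy(symbols):
    """Shannon entropy (in bits) of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def mutual_information(col_a, col_b):
    """MI(A; B) = H(A) + H(B) - H(A, B) for two equal-length alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")
    joint = list(zip(col_a, col_b))
    return entropy(col_a) + entropy(col_b) - entropy(joint)


if __name__ == "__main__":
    # Toy columns: one residue per sequence in a hypothetical alignment.
    col_a = list("KRDE")
    col_b = list("DEKR")
    print(mutual_information(col_a, col_b))
    print(mutual_information(recode(col_a, CHARGE_ALPHABET),
                             recode(col_b, CHARGE_ALPHABET)))
```

Recoding collapses residues with the same charge into one state, so fewer joint states need to be estimated from the same number of sequences. A raw score like this ignores phylogeny entirely, which is why the study pairs such metrics with a null distribution that controls for shared ancestry.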

Caporaso, J Gregory

Easton, Brett C

Hunter, Lawrence

Huttley, Gavin A

Knight, Rob

Smit, Sandra

English

PubMed

Crossref

Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics

eScholarship - University of California

Directory of Open Access Journals

BMC Evolutionary Biology

Springer - Publisher Connector

The Australian National University
