Search CORE

18 research outputs found

Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

Author: Ben Murrell
Daniel Kaliski
Gerdus Benade
Jan Buys
Konrad Scheffler
Lise du Buisson
Robert Ketteringham
Sasha Moola
Thomas Weighill
Tristan Hands
Publication venue: Public Library of Science
Publication date: 01/01/2011

Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data is available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
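
The two-stage idea in the abstract above (learn basis matrices from a large general dataset, then fit only a few weights for a small dataset) can be illustrated with a minimal sketch. It assumes a generic squared-error NMF with Lee-Seung multiplicative updates on synthetic data, which is not necessarily the objective or fitting procedure used in the paper. The function names `nmf` and `fit_weights`, the matrix sizes, and the random data are all hypothetical.

```python
import numpy as np

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W @ H.

    Each row of V stands for one flattened rate matrix (one alignment),
    each row of H for one basis matrix, each row of W for the weights.
    Uses squared-error multiplicative updates (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_weights(v, H, iters=500, eps=1e-9):
    """Fit non-negative weights w so that w @ H approximates v, with H fixed."""
    w = np.ones(H.shape[0])
    for _ in range(iters):
        w *= (v @ H.T) / (w @ H @ H.T + eps)
    return w

# Synthetic "general dataset": 30 alignments, 20 rate entries, true rank 2.
rng = np.random.default_rng(1)
V = rng.random((30, 2)) @ rng.random((2, 20))
W, H = nmf(V, rank=2)

# A new alignment is described by only `rank` weights on the learned basis.
v_new = rng.random(2) @ H
w_new = fit_weights(v_new, H)
rel_err = np.linalg.norm(v_new - w_new @ H) / np.linalg.norm(v_new)
print("weights:", w_new, "relative error:", rel_err)
```

The point of the design is the parameter count: once `H` is fixed, the new alignment needs only `rank` numbers rather than a full matrix of rates.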

Public Library of Science (PLOS)

Cape Town University OpenUCT

Crossref

Directory of Open Access Journals

PubMed Central

Stellenbosch University SUNScholar Repository

Detecting individual sites subject to episodic diversifying selection

Author: Sasha Moola
Ben Murrell
Sergei L. Kosakovsky Pond
Konrad Scheffler
Thomas Weighill
Joel O. Wertheim
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

CITATION: Murrell, B. et al. 2012. Detecting individual sites subject to episodic diversifying selection. PLoS Genetics, 8(7): e1002764, doi:10.1371/journal.pgen.1002764.
The original publication is available at http://journals.plos.org/plosgenetics

The imprint of natural selection on protein-coding genes is often difficult to identify because selection is frequently transient or episodic, i.e. it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic: a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.

http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002764
Publisher's version

Directory of Open Access Journals

PubMed Central

Stellenbosch University SUNScholar Repository


FUBAR: A Fast, Unconstrained Bayesian AppRoximation for Inferring Selection

Author: Sergei L. Kosakovsky Pond
Amandla Mabona
Sasha Moola
Ben Murrell
Konrad Scheffler
Daniel Sheward
Thomas Weighill
Publication venue: eScholarship, University of California
Publication date: 18/02/2013

Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection, an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: we illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).
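
Averaging over many predefined site classes, as the abstract above describes, can be sketched as a per-site posterior computed on a fixed grid. This is a minimal sketch, not the FUBAR implementation in HyPhy. It assumes each site class is a pair of synonymous and non-synonymous rates (called `alpha` and `beta` here), that the per-site likelihood at every grid point is already computed, and that the class weights are given (in the method described they would come from the MCMC routine). The function name and toy numbers are hypothetical.

```python
import numpy as np

def posterior_positive_selection(site_lik, class_weights, alpha, beta):
    """Posterior probability, per site, that beta > alpha.

    site_lik:      (n_sites, n_classes) likelihood of each site under each class
    class_weights: (n_classes,) weights over the predefined site classes
    alpha, beta:   (n_classes,) rates that define each class (assumed names)
    """
    site_lik = np.asarray(site_lik, dtype=float)
    class_weights = np.asarray(class_weights, dtype=float)
    joint = site_lik * class_weights                  # weight times likelihood
    posterior = joint / joint.sum(axis=1, keepdims=True)
    positive = np.asarray(beta) > np.asarray(alpha)   # classes with beta > alpha
    return posterior[:, positive].sum(axis=1)

# Toy grid with three classes: purifying, neutral, positive.
alpha = np.array([1.0, 1.0, 1.0])
beta = np.array([0.2, 1.0, 3.0])
class_weights = np.array([0.6, 0.3, 0.1])
site_lik = np.array([[0.010, 0.004, 0.001],
                     [0.001, 0.002, 0.020]])
print(posterior_positive_selection(site_lik, class_weights, alpha, beta))
```

Because the likelihoods on the grid are computed once and then reused, the per-site step reduces to a weighted sum, which is one plausible reading of where the speed advantage comes from.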

eScholarship - University of California

FUBAR: A Fast, Unconstrained Bayesian AppRoximation for Inferring Selection

Author: Sergei L. Kosakovsky Pond
Amandla Mabona
Sasha Moola
Ben Murrell
Konrad Scheffler
Daniel Sheward
Thomas Weighill
Publication venue: eScholarship, University of California
Publication date: 18/02/2013

Crossref

PubMed Central

eScholarship - University of California

Comparative performance of MEME and FEL on 16 empirical alignments (see Results and Text S1 for an extended discussion of each individual case).

Author: Ben Murrell (151760)
Joel O. Wertheim (151765)
Konrad Scheffler (49776)
Sasha Moola (151768)
Sergei L. Kosakovsky Pond (25959)
Thomas Weighill (151773)

() reports the number of sequences (codons) in the alignment. () refers to sites found by MEME to be positively (negatively) selected (). () denotes sites found by FEL to be positively (negatively) selected (). references sites that are classified as neutrally evolving by FEL. Values in parentheses for the column show the mean p-values for FEL and MEME on this set of sites, respectively. Values reported in the rightmost column count the number of sites where MEME fits significantly better than FEL, based on an LRT with 2 degrees of freedom (). Abbreviations: IAV = Influenza A virus, JEV = Japanese encephalitis virus.
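
The likelihood ratio test (LRT) with 2 degrees of freedom mentioned in this caption can be illustrated with a minimal sketch. It assumes the test statistic is compared against a plain chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-x/2); the reference distribution actually used for the table may differ. The function name and the log-likelihood values are hypothetical.

```python
import math

def lrt_pvalue_2df(loglik_null, loglik_alt):
    """P-value of an LRT against a chi-square distribution with 2 df.

    loglik_null: maximized log-likelihood of the simpler (nested) model
    loglik_alt:  maximized log-likelihood of the richer model
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer noise
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return math.exp(-stat / 2.0)

# Hypothetical log-likelihoods for one site under the two models.
p = lrt_pvalue_2df(loglik_null=-105.0, loglik_alt=-101.0)
print(f"LRT p-value: {p:.4f}")
```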

FigShare

Comparative performance of FEL and MEME on simulated data where varies along phylogenetic lineages.

Author: Ben Murrell (151760)
Joel O. Wertheim (151765)
Konrad Scheffler (49776)
Sasha Moola (151768)
Sergei L. Kosakovsky Pond (25959)
Thomas Weighill (151773)

Power to detect sites under selection () is reported for FEL and MEME (in boldface) for each unique combination of negative selection (), positive selection (), and proportion of branches under positive selection () parameters.

FigShare

Individual sites of the vertebrate rhodopsin alignment used to illustrate similarities and differences between FEL and MEME.

Author: Ben Murrell (151760)
Joel O. Wertheim (151765)
Konrad Scheffler (49776)
Sasha Moola (151768)
Sergei L. Kosakovsky Pond (25959)
Thomas Weighill (151773)

Branches that have experienced substitutions, based on most likely joint maximum likelihood ancestral reconstructions at a given site, are labeled as count of synonymous substitutions:count of non-synonymous substitutions. The thickness of each branch is proportional to the minimal number of single nucleotide substitutions mapped to the branch. Branches are colored according to the magnitude of the empirical Bayes factor (EBF) for the event of positive selection: red – evidence for positive selection, teal – evidence for neutral evolution or negative selection, black – no information. See Methods (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1002764#s2) for more detail. All three sites were identified as experiencing positive diversifying selection by MEME. FEL reported site 54 as positively selected, site 273 as neutral, and site 210 as negatively selected.

FigShare

Non-negative matrix factorization.

Author: Ben Murrell (151760)
Daniel Kaliski (336793)
Gerdus Benade (336791)
Jan Buys (336786)
Konrad Scheffler (49776)
Lise du Buisson (336792)
Robert Ketteringham (336789)
Sasha Moola (151768)
Thomas Weighill (151773)
Tristan Hands (336794)
Publication date: 20/02/2013

Non-negative matrix factorization.

FigShare

NNMF basis matrices.

Author: Ben Murrell (151760)
Daniel Kaliski (336793)
Gerdus Benade (336791)
Jan Buys (336786)
Konrad Scheffler (49776)
Lise du Buisson (336792)
Robert Ketteringham (336789)
Sasha Moola (151768)
Thomas Weighill (151773)
Tristan Hands (336794)

The set of NNMF basis matrices obtained for ranks ranging from 1 to 5. Amino acids are ordered according to their Stanfel classification [25]. Rates are indicated in grayscale, with pure white being a rate of zero and pure black being the maximum rate in the matrix.
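
The grayscale encoding in this caption amounts to a linear map from rate to intensity, scaled separately for each matrix. The sketch below assumes intensities in the range 0 to 1, with 1.0 as pure white and 0.0 as pure black. The function name and the example matrix are hypothetical and not taken from the figure.

```python
import numpy as np

def rates_to_grayscale(rates):
    """Map non-negative rates to intensities: 0 -> white (1.0), max -> black (0.0)."""
    rates = np.asarray(rates, dtype=float)
    peak = rates.max()
    if peak == 0:
        # An all-zero matrix has no maximum to scale by; render it pure white.
        return np.ones_like(rates)
    return 1.0 - rates / peak

# Hypothetical 2 x 2 block of rates.
print(rates_to_grayscale([[0.0, 2.0], [1.0, 4.0]]))
```

Because each matrix is scaled by its own maximum, shades are comparable within one matrix but not between matrices.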

FigShare