Scalable relaxed clock phylogenetic dating

Abstract

Molecular clock models relate observed genetic diversity to calendar time, enabling estimation of times of common ancestry. Many large datasets of fast-evolving viruses are not well fitted by molecular clock models that assume a constant substitution rate through time, and more flexible relaxed clock models are required for robust inference of rates and dates. Estimation of relaxed molecular clocks using Bayesian Markov chain Monte Carlo is computationally expensive and may not scale well to large datasets. We build on recent advances in maximum likelihood and least-squares phylogenetic and molecular clock dating methods to develop a fast relaxed-clock method based on a Gamma-Poisson mixture model of substitution rates. This method estimates a distinct substitution rate for every lineage in the phylogeny while remaining scalable to large phylogenies. The method can also estimate unknown sample dates of lineages and an unknown root position. We estimate confidence intervals for rates, dates, and tip dates using parametric and non-parametric bootstrap approaches. This method is implemented as an open-source R package, treedater.
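To make the Gamma-Poisson mixture concrete, here is a minimal Python sketch under stated assumptions. It uses a generic textbook formulation: each branch has its own rate r drawn from a Gamma distribution with mean `mean_rate` and shape `shape`, and the substitution count on a branch of duration t is Poisson with mean r·t. Integrating out r gives a negative binomial likelihood for each branch. This parameterization and the function names (`gamma_poisson_logpmf`, `relaxed_clock_loglik`) are illustrative assumptions. They are not treedater's R interface and may not match its exact model.

```python
import math


def gamma_poisson_logpmf(n: int, duration: float, mean_rate: float, shape: float) -> float:
    """Log P(N = n) for N | r ~ Poisson(r * duration), r ~ Gamma(shape, rate=shape/mean_rate).

    Assumed generic parameterization (not necessarily treedater's): the Gamma
    has mean `mean_rate`, so the marginal of N is negative binomial with mean
    mean_rate * duration and variance mean + mean**2 / shape.
    Requires n >= 0 (integer count) and duration, mean_rate, shape > 0.
    """
    beta = shape / mean_rate  # Gamma rate parameter, chosen so E[r] = mean_rate
    log_p = math.log(beta / (beta + duration))
    log_q = math.log(duration / (beta + duration))
    return (math.lgamma(n + shape) - math.lgamma(shape) - math.lgamma(n + 1)
            + shape * log_p + n * log_q)


def relaxed_clock_loglik(branches, mean_rate: float, shape: float) -> float:
    """Sum per-branch marginal log-likelihoods.

    `branches` is a hypothetical list of (substitution_count, duration) pairs,
    one per branch, with durations measured in calendar time.
    """
    return sum(gamma_poisson_logpmf(n, d, mean_rate, shape) for n, d in branches)


# Hypothetical example: (substitution count, branch duration in years) per branch.
branches = [(3, 1.0), (10, 2.5), (0, 0.4)]
print(relaxed_clock_loglik(branches, mean_rate=2.0, shape=5.0))
```

As `shape` grows, the Gamma distribution concentrates on `mean_rate` and each branch likelihood approaches a Poisson (strict clock) likelihood, so in this sketch the shape parameter controls how much rate variation among lineages the model tolerates.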
