Random-effects substitution models for phylogenetics via scalable gradient approximations
Phylogenetic and discrete-trait evolutionary inference depend heavily on an
appropriate characterization of the underlying character substitution process.
In this paper, we present random-effects substitution models that extend common
continuous-time Markov chain models into a richer class of processes capable of
capturing a wider variety of substitution dynamics. As these random-effects
substitution models often require many more parameters than their usual
counterparts, inference can be both statistically and computationally
challenging. Thus, we also propose an efficient approach to compute an
approximation to the gradient of the data likelihood with respect to all
unknown substitution model parameters. We demonstrate that this approximate
gradient enables scaling of sampling-based inference, namely Bayesian inference
via Hamiltonian Monte Carlo, under random-effects substitution models across
large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences,
an HKY model with random effects shows strong signals of nonreversibility in
the substitution process, and posterior predictive model checks clearly show
that it is a more adequate model than a reversible model. When analyzing the
pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences
between 14 regions, a random-effects phylogeographic substitution model infers
that air travel volume adequately predicts almost all dispersal rates. A
random-effects state-dependent substitution model reveals no evidence for an
effect of arboreality on the swimming mode in the tree frog subfamily Hylinae.
Simulations reveal that random-effects substitution models can accommodate both
negligible and radical departures from the underlying base substitution model.
We show that our gradient-based inference approach is over an order of
magnitude more time-efficient than conventional approaches.
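
To make the idea of a random-effects extension concrete, here is a minimal sketch in Python. It rests on an assumption the abstract does not state: that each off-diagonal rate of a base HKY model is multiplied by exp(eps_ij), where eps_ij is a pairwise random effect. The function names (hky_rates, add_random_effects, stationary, max_detailed_balance_gap) and all parameter values are hypothetical and are not taken from the paper. The sketch only illustrates how a single nonzero effect can break the reversibility of the base model; it does not implement the gradient approximation or the Hamiltonian Monte Carlo sampler.

```python
import math

STATES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rates(kappa, freqs):
    """Off-diagonal HKY rates: q_ij = kappa * pi_j for transitions, pi_j otherwise."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            if i != j:
                scale = kappa if (a, b) in TRANSITIONS else 1.0
                q[i][j] = scale * freqs[j]
    return q


def add_random_effects(base, eps):
    """Assumed form: scale each off-diagonal rate by exp(eps[i][j]).

    The diagonal is then set so that every row sums to zero, as required
    for a continuous-time Markov chain rate matrix.
    """
    n = len(base)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = base[i][j] * math.exp(eps[i][j])
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    return q


def stationary(q, iters=5000):
    """Stationary distribution via power iteration on the uniformized chain."""
    n = len(q)
    mu = 1.1 * max(-q[i][i] for i in range(n))  # uniformization rate
    p = [[(1.0 if i == j else 0.0) + q[i][j] / mu for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def max_detailed_balance_gap(q):
    """Largest |pi_i q_ij - pi_j q_ji|; zero (up to rounding) iff reversible."""
    pi = stationary(q)
    n = len(q)
    return max(abs(pi[i] * q[i][j] - pi[j] * q[j][i])
               for i in range(n) for j in range(n))


# Illustrative values only.
freqs = [0.3, 0.2, 0.2, 0.3]
base = hky_rates(kappa=2.0, freqs=freqs)

no_effects = [[0.0] * 4 for _ in range(4)]
one_effect = [[0.0] * 4 for _ in range(4)]
one_effect[0][1] = 1.0  # inflate the A -> C rate only

print(max_detailed_balance_gap(add_random_effects(base, no_effects)))
print(max_detailed_balance_gap(add_random_effects(base, one_effect)))
```

With all effects at zero the sketch recovers the reversible HKY base model, which is in the spirit of the abstract's remark that such models can accommodate negligible departures from the base model. A nonzero effect on one direction of a pair yields a rate matrix that no longer satisfies detailed balance.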