Why do phylogenetic algorithms fail when they return incorrect answers? This
simple question has not been answered in detail, even for maximum parsimony
(MP), the simplest phylogenetic criterion. Understanding MP has recently gained
relevance in the regime of extremely dense sampling, where each virus sample
commonly differs by zero or one mutation from another previously sampled virus.
Although recent research shows that evolutionary histories in this regime are
close to being maximally parsimonious, the structure of their deviations from
MP is not yet understood. In this paper, we develop algorithms to understand
how the correct tree deviates from being MP in the densely sampled case. By
applying these algorithms to simulations that realistically mimic the evolution
of SARS-CoV-2, we find that simulated trees frequently deviate from
maximally parsimonious trees only locally, through simple structures consisting of
the same mutation appearing independently on sister branches.

Comment: 18 pages, 7 figures, submitted to RECOMB 202