Recent work has attempted to use whole-genome sequence data from pathogens to
reconstruct the transmission trees linking infectors and infectees in
outbreaks. However, transmission trees from one outbreak do not generalize to
future outbreaks. Reconstruction of transmission trees is most useful to public
health if it leads to generalizable scientific insights about disease
transmission. In a survival analysis framework, estimation of transmission
parameters is based on sums or averages over the possible transmission trees. A
phylogeny can increase the precision of these estimates by providing partial
information about who infected whom. The leaves of the phylogeny represent
sampled pathogens, which have known hosts. The interior nodes represent common
ancestors of sampled pathogens, which have unknown hosts. Starting from
assumptions about disease biology and epidemiologic study design, we prove that
there is a one-to-one correspondence between the possible assignments of
interior node hosts and the transmission trees simultaneously consistent with
the phylogeny and the epidemiologic data on person, place, and time. We develop
algorithms to enumerate these transmission trees and show that they can be used to
calculate likelihoods that incorporate both epidemiologic data and a phylogeny.
A simulation study confirms that incorporating a phylogeny leads to more efficient estimates of
hazard ratios for infectiousness and baseline hazards of infectious contact,
and we use these methods to analyze data from a foot-and-mouth disease virus
outbreak in the United Kingdom in 2001. These results demonstrate the
importance of data on individuals who escape infection, which is often
overlooked. The combination of survival analysis and algorithms linking
phylogenies to transmission trees is a rigorous but flexible statistical
foundation for molecular infectious disease epidemiology.

Comment: 28 pages, 11 figures, 3 tables
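
The correspondence between interior node host assignments and transmission trees can be illustrated with a toy sketch. It is not the algorithm developed in the paper. It assumes, purely for illustration, that every infected host is sampled exactly once, that each interior node's host is the host of one of its two children, and that the epidemiologic data on person, place, and time impose no further constraints. The three-leaf phylogeny and all function names are hypothetical.

```python
# Toy phylogeny (hypothetical): interior node -> (left child, right child).
# Leaves are named after the host from which the pathogen was sampled.
PHYLOGENY = {"x": ("A", "B"), "root": ("x", "C")}
ROOT = "root"


def host_assignments(tree, node):
    """Yield (host of node, {interior node: host}) for the subtree at node.

    Simplifying assumption: an interior node's host must be the host of one
    of its children, so each host's nodes form a path up from its leaf.
    """
    if node not in tree:  # a leaf: its host is known
        yield node, {}
        return
    left, right = tree[node]
    for left_host, left_map in host_assignments(tree, left):
        for right_host, right_map in host_assignments(tree, right):
            for host in (left_host, right_host):
                yield host, {**left_map, **right_map, node: host}


def transmission_tree(tree, assignment):
    """Return the set of (infector, infectee) pairs implied by an assignment.

    A change of host along a branch of the phylogeny is read as a
    transmission from the parent's host to the child's host.
    """
    edges = set()
    for node, children in tree.items():
        for child in children:
            child_host = assignment.get(child, child)  # leaves map to themselves
            if child_host != assignment[node]:
                edges.add((assignment[node], child_host))
    return frozenset(edges)


if __name__ == "__main__":
    for _, assignment in host_assignments(PHYLOGENY, ROOT):
        print(assignment, sorted(transmission_tree(PHYLOGENY, assignment)))
```

In this toy case each host assignment yields a different transmission tree, and the host assigned to the root plays the role of the index case. A likelihood that incorporates the phylogeny would then sum over only these trees rather than over every tree compatible with the epidemiologic data alone.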