Verbal autopsy procedures are widely used for estimating cause-specific
mortality in areas without medical death certification. Data on symptoms
reported by caregivers, along with the cause of death, are collected at a
medical facility, and the cause-of-death distribution is then estimated in a
population where only symptom data are available. Current approaches analyze
only one cause at a time, involve assumptions judged difficult or impossible to
satisfy, and require expensive, time-consuming, or unreliable physician
reviews, expert algorithms, or parametric statistical models. By generalizing
current approaches to analyze multiple causes, we show how most of the
difficult assumptions underlying existing methods can be dropped. These
generalizations also make physician review, expert algorithms and parametric
statistical assumptions unnecessary. Using theoretical results and empirical
analyses of data from China and Tanzania, we illustrate the accuracy of this
approach. While no method of analyzing verbal autopsy data, including the more
computationally intensive approach offered here, can give accurate estimates in
all circumstances, the procedure offered is conceptually simpler, less
expensive, more general, at least as replicable, and easier to use in practice
than existing approaches. We also show how our focus on estimating aggregate
proportions, which are the quantities of primary interest in verbal autopsy
studies, may greatly reduce the assumptions necessary for, and thus
improve the performance of, many individual classifiers in this and other
areas. As a companion to this paper, we also offer easy-to-use software that
implements the methods discussed herein.

Comment: Published at http://dx.doi.org/10.1214/07-STS247 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
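To make the data flow in the abstract concrete (symptoms plus cause from a medical facility, symptoms only from the population, all causes estimated at once), here is a minimal sketch. It assumes the relation P(S) = sum over d of P(S | D = d) P(D = d), where S is a symptom profile (the full pattern of yes/no answers) and D the cause of death, and it assumes P(S | D) is the same in the facility and in the population. The function names `tabulate` and `estimate_cause_distribution` are hypothetical, and the estimator (an EM fixed-point iteration for mixture weights) is an illustrative choice made for brevity. It is not the paper's estimator or the companion software's implementation.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# Hypothetical illustration, not the companion software. A "symptom profile" is
# a tuple of 0/1 answers to the verbal autopsy symptom questions.
Profile = Tuple[int, ...]


def tabulate(
    hospital: Sequence[Tuple[Profile, Hashable]],
    community: Sequence[Profile],
) -> Tuple[List[Profile], List[Hashable], List[List[float]], List[float]]:
    """Empirical P(S|D) from facility deaths and P(S) from community deaths."""
    profiles = sorted({s for s, _ in hospital} | set(community))
    causes = sorted({c for _, c in hospital}, key=str)
    deaths_per_cause = Counter(c for _, c in hospital)
    joint = Counter(hospital)  # counts of (profile, cause) pairs
    # Rows are symptom profiles, columns are causes; each column sums to 1.
    p_s_given_d = [
        [joint[(s, c)] / deaths_per_cause[c] for c in causes] for s in profiles
    ]
    community_counts = Counter(community)
    p_s = [community_counts[s] / len(community) for s in profiles]
    return profiles, causes, p_s_given_d, p_s


def estimate_cause_distribution(
    p_s_given_d: Sequence[Sequence[float]],
    p_s: Sequence[float],
    iterations: int = 500,
) -> List[float]:
    """Estimate P(D) for all causes jointly from P(S) = sum_d P(S|d) P(d).

    Assumes P(S|D) is the same in the facility and the population. Uses an EM
    fixed-point iteration for mixture weights (an illustrative choice).
    """
    n_causes = len(p_s_given_d[0])
    p_d = [1.0 / n_causes] * n_causes  # start from a uniform distribution
    for _ in range(iterations):
        updated = [0.0] * n_causes
        for row, target in zip(p_s_given_d, p_s):
            # Model-implied share of this symptom profile under current P(D).
            mix = sum(q * p for q, p in zip(row, p_d))
            if mix == 0.0 or target == 0.0:
                continue  # profile unseen in facility data, or absent in population
            for d in range(n_causes):
                # Apportion the profile's population share across causes.
                updated[d] += target * row[d] * p_d[d] / mix
        total = sum(updated)
        p_d = [x / total for x in updated]  # renormalize to a distribution
    return p_d


if __name__ == "__main__":
    # Toy data: two symptom questions, two causes "A" and "B".
    hospital = (
        [((1, 0), "A")] * 7 + [((0, 1), "A")] * 3
        + [((1, 0), "B")] * 2 + [((0, 1), "B")] * 8
    )
    community = [(1, 0)] * 5 + [(0, 1)] * 5
    _, causes, p_s_given_d, p_s = tabulate(hospital, community)
    print(dict(zip(causes, estimate_cause_distribution(p_s_given_d, p_s))))
```

In this toy example the iteration should recover P(D) close to (0.6, 0.4) for causes A and B, because 0.3(0.6) + 0.8(0.4) = 0.5 and 0.7(0.6) + 0.2(0.4) = 0.5 reproduce the community profile shares exactly. Note that no individual death is ever assigned a cause; only the aggregate proportions are estimated.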