Tuberculosis disease is a major global public health concern, and the growing prevalence
of drug-resistant Mycobacterium tuberculosis is making disease control more difficult.
However, the increasing application of whole-genome sequencing as a diagnostic tool is
enabling drug-resistance profiling that informs clinical practice and treatment
decision-making. Computational approaches for identifying established and novel
resistance-conferring mutations in genomic data include genome-wide association study
(GWAS) methodologies, tests for convergent evolution, and machine learning techniques.
These methods may be confounded by extensive co-occurrent resistance, where
statistical models for a drug include unrelated mutations known to cause resistance
to other drugs. Here, we introduce a novel ‘cannibalistic’ elimination algorithm
(“Hungry, Hungry SNPos”) that attempts to remove these co-occurrent resistant
variants. Using an M. tuberculosis genomic dataset for the virulent Beijing strain-type
(n=3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin,
ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach
is considerably more robust than traditional methods and detects resistance-associated
variants likely too rare to be picked up by correlation-based techniques like GWA