Modern scientific studies often require the identification of a subset of
relevant explanatory variables, in an attempt to understand an interesting
phenomenon. Several statistical methods have been developed to automate this
task, but only recently has the framework of model-free knockoffs offered a
general solution that can perform variable selection under rigorous type-I
error control, without relying on strong modeling assumptions. In this paper,
we extend the methodology of model-free knockoffs to a rich family of problems
where the distribution of the covariates can be described by a hidden Markov
model (HMM). We develop an exact and efficient algorithm to sample knockoff
copies of an HMM. We then argue that, combined with the knockoffs selective
framework, these copies provide a natural and powerful tool for performing
principled inference in genome-wide association studies with guaranteed
false discovery rate (FDR) control.
Finally, we apply our methodology to several datasets aimed at studying
Crohn's disease and several continuous phenotypes, e.g. levels of cholesterol.

Comment: 35 pages, 13 figures, 9 tables