
    Post hoc false discovery proportion inference under a Hidden Markov Model

    We address the multiple testing problem under the assumption that the true/false status of the hypotheses is driven by a Hidden Markov Model (HMM), a setting recognized as fundamental for modelling multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures that control the False Discovery Rate (FDR) under this model, we follow a recent trend in selective inference and consider the problem of establishing confidence bounds on the false discovery proportion (FDP) for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We develop a methodology to construct such confidence bounds, first when the HMM is known, then when its parameters are unknown and must be estimated, including the data distributions under the null and the alternative, which are estimated nonparametrically. In the latter case, we propose a bootstrap-based methodology to account for parameter estimation error. We show that exploiting the assumed HMM structure substantially sharpens the confidence bounds compared with existing agnostic (structure-free) methods, as demonstrated by both numerical experiments and real data examples.
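
    To make the quantities in this abstract concrete, here is a minimal Python sketch, not the authors' procedure. It assumes a fully known two-state HMM (state 0 = null, state 1 = alternative) with Gaussian emissions, computes the posterior null probability of each hypothesis by the standard scaled forward-backward recursion, and averages those probabilities over a user-selected set to obtain the posterior expected FDP. It yields a point summary only, not a confidence bound, and it does not cover the unknown-parameter or bootstrap case. All function names, parameter values and data below are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution (assumed emission family)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_null_probs(obs, init, trans, alt_mean):
    """P(hypothesis i is null | all observations) for a known 2-state HMM.

    State 0 = null with N(0, 1) emissions, state 1 = alternative with
    N(alt_mean, 1) emissions. Uses scaled forward-backward recursions.
    """
    n = len(obs)
    emis = [(normal_pdf(x), normal_pdf(x, alt_mean)) for x in obs]

    # Forward pass, renormalised at each step to avoid underflow.
    alpha = []
    prev = None
    for i in range(n):
        if i == 0:
            a = [init[s] * emis[0][s] for s in (0, 1)]
        else:
            a = [sum(prev[r] * trans[r][s] for r in (0, 1)) * emis[i][s]
                 for s in (0, 1)]
        total = a[0] + a[1]
        prev = [a[0] / total, a[1] / total]
        alpha.append(prev)

    # Backward pass, also renormalised (constants cancel in the posterior).
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emis[i + 1][r] * beta[i + 1][r] for r in (0, 1))
             for s in (0, 1)]
        total = b[0] + b[1]
        beta[i] = [b[0] / total, b[1] / total]

    # Combine both passes into the posterior probability of the null state.
    post = []
    for i in range(n):
        w0 = alpha[i][0] * beta[i][0]
        w1 = alpha[i][1] * beta[i][1]
        post.append(w0 / (w0 + w1))
    return post


def expected_fdp(post_null, selected):
    """Posterior expected FDP of a selected index set (0.0 if it is empty)."""
    if not selected:
        return 0.0
    return sum(post_null[i] for i in selected) / len(selected)


# Hypothetical toy data: three large statistics in the middle of the sequence.
obs = [0.1, -0.3, 2.8, 3.1, 2.5, 0.2]
init = [0.8, 0.2]                  # assumed initial state distribution
trans = [[0.9, 0.1], [0.2, 0.8]]   # assumed transition matrix
post = posterior_null_probs(obs, init, trans, alt_mean=3.0)
print(expected_fdp(post, [2, 3, 4]))
```

    Because the selected set enters only through `expected_fdp`, it can be chosen after looking at the data, which mirrors the post hoc setting described above. Turning the posterior distribution of the hidden states into a confidence bound would require more than this average.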

    Multivariate statistical modelling for QTL detection and marker selection in a bi-parental grapevine population

    Genetic selection in grapevine and many similar species is a challenge due to their perennial status and outbred nature. Marker-assisted selection and genomic selection can hence be useful methods to ease and speed up breeding. In Coupel-Ledru et al. (2014), a Syrah x Grenache progeny of 191 individuals was phenotyped in two successive years for several ecophysiological traits under two conditions, well-watered and water deficit, on a high-throughput phenotyping platform coupled to a controlled-environment chamber. As the offspring had previously been genotyped at 153 SSR markers, several QTLs were found for each trait separately, differing across years and conditions. But do these differences reflect biological processes or contrasting statistical power between conditions? And how accurately do the detected QTLs predict phenotypes, depending on the conditions? To answer such questions, our aim is to explore the ability of sparse and regularized multivariate regression models and algorithms to select QTLs based on their predictive properties. In the present study, we perform variable selection with various flavours of the LASSO method (group Lasso, fused Lasso) adapted for multiple responses, extending the model and algorithm from Chiquet et al. (2017). We apply these methods to simulated data and to real data from Coupel-Ledru et al. (2014).
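
    For orientation, a generic multi-response group-Lasso criterion can be written as below. This is a textbook form given as an assumption, not the exact model of Chiquet et al. (2017) or of this study. The symbols are introduced here for illustration only: Y is the n x q matrix of responses (trait measurements), X the n x p marker matrix (n = 191 and p = 153 in the data described above), B the p x q coefficient matrix, and lambda >= 0 a tuning parameter.

```latex
\[
\hat{B} \;=\; \arg\min_{B \in \mathbb{R}^{p \times q}}
\; \frac{1}{2n} \,\lVert Y - XB \rVert_F^2
\;+\; \lambda \sum_{j=1}^{p} \lVert B_{j\cdot} \rVert_2
\]
```

    The penalty acts on whole rows of B, so a marker j is either kept for all responses or dropped for all of them, which is what makes this form a natural candidate for selecting QTLs jointly across traits, years and conditions.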