    Familywise Error Rate Control via Knockoffs

    We present a novel method for controlling the $k$-familywise error rate ($k$-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures which act directly on $p$-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the $k$-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates such as the false exceedance rate, and use it to identify candidates for mutations conferring drug-resistance in HIV. Comment: 15 pages, 3 figures. Updated reference.
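
    For readers unfamiliar with the error rates named in this abstract, the standard textbook definitions can be written as follows. The symbols $V$ (number of false rejections), $R$ (total number of rejections), $\alpha$ (target level), and $\gamma$ (exceedance threshold) are notation assumed here for illustration; the abstract itself does not fix any notation.

```latex
% Standard definitions; V, R, alpha, gamma are assumed notation.
% k-familywise error rate: probability of at least k false rejections.
\[
  k\text{-FWER} \;=\; \mathbb{P}(V \ge k) \;\le\; \alpha ,
  \qquad k = 1 \text{ recovers the usual FWER.}
\]
% False exceedance rate: probability that the false discovery
% proportion exceeds a chosen threshold gamma in (0, 1).
\[
  \mathrm{FDP} \;=\; \frac{V}{\max(R, 1)} ,
  \qquad
  \mathbb{P}(\mathrm{FDP} > \gamma) \;\le\; \alpha .
\]
```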

    A Methodology for Robust Multiproxy Paleoclimate Reconstructions and Modeling of Temperature Conditional Quantiles

    Great strides have been made in reconstructing past temperatures from models relating temperature to temperature-sensitive paleoclimate proxies. One goal of such reconstructions is to assess whether the current climate is anomalous in a millennial context. These regression-based approaches model the conditional mean of the temperature distribution as a function of paleoclimate proxies (or vice versa). Recent work in the area has focused on methods that help reduce the uncertainty inherent in such statistical paleoclimate reconstructions, with the ultimate goal of improving the confidence that can be attached to such endeavors. A second important scientific focus is forward models for proxies, whose goal is to understand how paleoclimate proxies are driven by temperature and other environmental variables. In this paper we introduce novel statistical methodology for (1) quantile regression with autoregressive residual structure, (2) estimation of the corresponding model parameters, and (3) development of a rigorous framework for specifying uncertainty estimates of quantities of interest, yielding (4) statistical byproducts that address the two scientific foci discussed above. Our statistical methodology demonstrably produces a more robust reconstruction than is possible with conditional-mean-fitting methods. Our reconstruction shares some common features with past reconstructions, but also yields useful insights. More importantly, we are able to demonstrate a significantly smaller uncertainty than that from previous regression methods. In addition, the quantile regression component allows us to model the conditional distribution of temperature given proxies in a more complete and flexible way than least squares. This relationship can be used to inform forward models describing how proxies are driven by temperature.
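
    The contrast between conditional-mean fitting and quantile regression comes down to the loss being minimized. The sketch below is not the paper's method (it has no proxies and no autoregressive residual structure); it only illustrates, under those simplifying assumptions, that minimizing the standard check (pinball) loss over a constant recovers a quantile rather than the mean. The function names and the toy data are hypothetical.

```python
import numpy as np


def pinball_loss(residuals, tau):
    """Average check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    residuals = np.asarray(residuals, dtype=float)
    return np.mean(residuals * (tau - (residuals < 0)))


def fit_constant_quantile(y, tau, grid_size=2001):
    """Grid search for the constant c that minimizes the pinball loss of y - c.

    This is an intercept-only stand-in for quantile regression: with
    covariates, c would be replaced by a function of the predictors.
    """
    y = np.asarray(y, dtype=float)
    candidates = np.linspace(y.min(), y.max(), grid_size)
    losses = [pinball_loss(y - c, tau) for c in candidates]
    return candidates[int(np.argmin(losses))]


# Toy data (hypothetical): the integers 1..100.
y = np.arange(1, 101)
q90 = fit_constant_quantile(y, tau=0.9)   # lands in the 90th-percentile gap
q50 = fit_constant_quantile(y, tau=0.5)   # lands near the median
```

    Changing `tau` moves the fitted value through the distribution, which is what lets a quantile model describe more of the conditional distribution than a least-squares fit does.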

    Optimal Sampling-Based Motion Planning under Differential Constraints: the Driftless Case

    Motion planning under differential constraints is a classic problem in robotics. To date, the state of the art is represented by sampling-based techniques, with the Rapidly-exploring Random Tree algorithm as a leading example. Yet the problem remains open in many respects, including guarantees on the quality of the obtained solution. In this paper we provide a thorough theoretical framework to assess optimality guarantees of sampling-based algorithms for planning under differential constraints. We exploit this framework to design and analyze two novel sampling-based algorithms (namely, the Differential Probabilistic RoadMap algorithm and the Differential Fast Marching Tree algorithm) that are guaranteed to converge, as the number of samples increases, to an optimal solution. Our focus is on driftless control-affine dynamical models, which accurately model a large class of robotic systems. In this paper we use the notion of convergence in probability (as opposed to convergence almost surely): the extra mathematical flexibility of this approach yields convergence rate bounds, a first in the field of optimal sampling-based motion planning under differential constraints. Numerical experiments corroborating our theoretical results are presented and discussed.
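
    As a concrete picture of a driftless control-affine model, the sketch below simulates a kinematic unicycle, a common textbook example of the form $\dot{x} = g_1(x)u_1 + g_2(x)u_2$. The choice of system, the forward-Euler integrator, and all function names are assumptions made for illustration; the abstract does not say which systems or integrators the paper uses.

```python
import numpy as np


def unicycle_step(state, control, dt):
    """One forward-Euler step of a kinematic unicycle.

    state   = (x, y, theta)
    control = (v, omega): forward speed and turn rate.
    The dynamics are x_dot = g1(x) * v + g2(x) * omega with no drift term,
    so a zero control leaves the state unchanged.
    """
    state = np.asarray(state, dtype=float)
    v, omega = control
    theta = state[2]
    g1 = np.array([np.cos(theta), np.sin(theta), 0.0])  # forward direction
    g2 = np.array([0.0, 0.0, 1.0])                      # rotation in place
    return state + dt * (v * g1 + omega * g2)


def rollout(state, controls, dt):
    """Apply a sequence of controls and return the final state."""
    state = np.asarray(state, dtype=float)
    for control in controls:
        state = unicycle_step(state, control, dt)
    return state


start = np.array([0.0, 0.0, 0.0])
stays_put = rollout(start, [(0.0, 0.0)] * 10, dt=0.1)   # zero control
drives_fwd = rollout(start, [(1.0, 0.0)] * 10, dt=0.1)  # unit speed for 1 s
```

    The zero-control rollout returning the start state is the defining property of a driftless system: the robot can stop instantaneously.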

    Optimal Sampling-Based Motion Planning under Differential Constraints: the Drift Case with Linear Affine Dynamics

    In this paper we provide a thorough, rigorous theoretical framework to assess optimality guarantees of sampling-based algorithms for drift control systems: systems that, loosely speaking, cannot stop instantaneously due to momentum. We exploit this framework to design and analyze a sampling-based algorithm (the Differential Fast Marching Tree algorithm) that is asymptotically optimal, that is, it is guaranteed to converge, as the number of samples increases, to an optimal solution. In addition, our approach allows us to provide concrete bounds on the rate of this convergence. The focus of this paper is on mixed time/control energy cost functions and on linear affine dynamical systems, which encompass a range of models of interest to applications (e.g., double-integrators) and represent a necessary step to design, via successive linearization, sampling-based and provably-correct algorithms for non-linear drift control systems. Our analysis relies on an original perturbation analysis for two-point boundary value problems, which could be of independent interest.
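
    To make the terms in this abstract concrete, a linear affine system and a mixed time/control-energy cost can be written as below. The symbols $A$, $B$, $c$, $R$, and $T$, and the exact weighting of the cost, are assumptions chosen for illustration; the abstract names the problem class but does not give formulas.

```latex
% Assumed notation: state x, control u, constant matrices A, B,
% constant vector c, positive-definite weight R, final time T.
\[
  \dot{x}(t) = A\,x(t) + B\,u(t) + c ,
  \qquad
  J = \int_0^{T} \bigl( 1 + u(t)^{\top} R\, u(t) \bigr)\, dt .
\]
% Double integrator: x = (p, v) with position p and velocity v.
\[
  A = \begin{pmatrix} 0 & I \\ 0 & 0 \end{pmatrix},
  \qquad
  B = \begin{pmatrix} 0 \\ I \end{pmatrix},
  \qquad
  c = 0
  \quad\Longrightarrow\quad
  \dot{p} = v , \;\; \dot{v} = u .
\]
% With u = 0 the position still changes whenever v is nonzero:
% this is the drift that prevents stopping instantaneously.
```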

    Relaxing the Assumptions of Knockoffs by Conditioning

    The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\Omega(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.