
    Subsampling Mathematical Relaxations and Average-case Complexity

    We initiate a study of when the value of mathematical relaxations such as linear and semidefinite programs for constraint satisfaction problems (CSPs) is approximately preserved when restricting the instance to a sub-instance induced by a small random subsample of the variables. Let $C$ be a family of CSPs such as 3SAT, Max-Cut, etc., and let $\Pi$ be a relaxation for $C$, in the sense that for every instance $P \in C$, $\Pi(P)$ is an upper bound on the maximum fraction of satisfiable constraints of $P$. Loosely speaking, we say that subsampling holds for $C$ and $\Pi$ if for every sufficiently dense instance $P \in C$ and every $\epsilon > 0$, letting $P'$ be the instance obtained by restricting $P$ to a sufficiently large constant number of variables, we have $\Pi(P') \in (1 \pm \epsilon)\Pi(P)$. We say that weak subsampling holds if the above guarantee is replaced with $\Pi(P') = 1 - \Theta(\gamma)$ whenever $\Pi(P) = 1 - \gamma$. We show:
    1. Subsampling holds for the BasicLP and BasicSDP programs. BasicSDP is a variant of the relaxation considered by Raghavendra (2008), who showed it gives an optimal approximation factor for every CSP under the unique games conjecture. BasicLP is the linear programming analog of BasicSDP.
    2. For tighter versions of BasicSDP obtained by adding additional constraints from the Lasserre hierarchy, weak subsampling holds for CSPs of unique games type.
    3. There are non-unique CSPs for which even weak subsampling fails for the above tighter semidefinite programs. There are also unique CSPs for which subsampling fails for the Sherali-Adams linear programming hierarchy.
    As a corollary of our weak subsampling result for strong semidefinite programs, we obtain a polynomial-time algorithm to certify that random geometric graphs (of the type considered by Feige and Schechtman, 2002) of max-cut value $1 - \gamma$ have a cut value at most $1 - \gamma/10$.
    Comment: Includes several more general results that subsume the previous version of the paper.
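    To make the subsampling operation concrete, the sketch below (not from the paper) builds a dense random Max-Cut instance, restricts it to a uniformly random subset of vertices, and compares the two optimal cut fractions. For simplicity it brute-forces the exact max-cut value of both instances rather than computing a relaxation value $\Pi$ such as BasicLP or BasicSDP; the graph size, edge density, and subsample size are arbitrary illustrative choices.

```python
# Minimal sketch: compare the max-cut fraction of a dense random graph with
# that of the sub-instance induced by a random subset of its vertices.
# (Illustrative only; the paper's results concern LP/SDP relaxation values,
# not the exact optimum computed here.)
import itertools
import random


def max_cut_fraction(vertices, edges):
    """Brute-force the maximum fraction of edges cut (small graphs only)."""
    if not edges:
        return 1.0
    vs = list(vertices)
    best = 0
    # Enumerate 2-colorings; fixing the first vertex's side halves the work.
    for bits in itertools.product([0, 1], repeat=len(vs) - 1):
        side = dict(zip(vs[1:], bits))
        side[vs[0]] = 0
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best / len(edges)


def subsample_instance(vertices, edges, k, rng):
    """Induced sub-instance on a uniformly random k-subset of the variables."""
    sub = set(rng.sample(list(vertices), k))
    return sub, [(u, v) for u, v in edges if u in sub and v in sub]


rng = random.Random(0)
n = 16  # small enough to brute-force the full instance
vertices = list(range(n))
# Dense Erdos-Renyi graph: each pair is an edge with probability 1/2.
edges = [e for e in itertools.combinations(vertices, 2) if rng.random() < 0.5]

full_val = max_cut_fraction(vertices, edges)
sub, sub_edges = subsample_instance(vertices, edges, k=10, rng=rng)
sub_val = max_cut_fraction(sub, sub_edges)
print(f"max-cut fraction: full = {full_val:.3f}, subsample = {sub_val:.3f}")
```

    On dense instances the two fractions typically agree up to a small additive error, mirroring the $(1 \pm \epsilon)$ guarantee, though again the theorems above concern relaxation values rather than exact optima.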

    Stability

    Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in the stability of statistical results relative to "reasonable" perturbations to the data and to the model used. Jackknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models. In this article, a case is made for the importance of stability in statistics. Firstly, we motivate the necessity of stability for interpretable and reliable encoding models from brain fMRI signals. Secondly, we find strong evidence in the literature to demonstrate the central role of stability in statistical inference, such as sensitivity analysis and effect detection. Thirdly, a smoothing parameter selector based on estimation stability (ES), ES-CV, is proposed for the Lasso, in order to bring stability to bear on cross-validation (CV). ES-CV is then utilized in the encoding models to reduce the number of predictors by 60% with almost no loss (1.3%) of prediction performance across over 2,000 voxels. Last, a novel "stability" argument is seen to drive new results that shed light on the intriguing interactions between sample-to-sample variability and heavier-tailed error distributions (e.g., double-exponential) in high-dimensional regression models with $p$ predictors and $n$ independent samples. In particular, when $p/n \rightarrow \kappa \in (0.3, 1)$ and the error distribution is double-exponential, Ordinary Least Squares (OLS) is a better estimator than the Least Absolute Deviation (LAD) estimator.
    Comment: Published at http://dx.doi.org/10.3150/13-BEJSP14 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
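    To make the ES-CV idea concrete, here is a simplified Python sketch using scikit-learn. The fold construction, the exact form of the ES statistic (variability of fold-wise fitted mean vectors normalized by their average), and the rule for combining ES with CV are illustrative assumptions, not necessarily the paper's precise recipe.

```python
# Simplified sketch of an estimation-stability (ES) criterion for selecting
# the Lasso penalty, in the spirit of the ES-CV proposal described above.
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                      # sparse ground truth
y = X @ beta + rng.laplace(size=n)  # double-exponential (Laplace) noise

alphas = np.logspace(-2, 0.5, 30)
folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

def es_statistic(alpha):
    """Instability of the fitted mean vector across data perturbations."""
    preds = []
    for train_idx, _ in folds:  # each fold's training set is a pseudo-sample
        model = Lasso(alpha=alpha, max_iter=10_000).fit(X[train_idx], y[train_idx])
        preds.append(X @ model.coef_ + model.intercept_)
    preds = np.array(preds)
    m_bar = preds.mean(axis=0)
    denom = np.sum(m_bar ** 2)
    if denom == 0:
        return np.inf
    return np.mean(np.sum((preds - m_bar) ** 2, axis=1)) / denom

es = np.array([es_statistic(a) for a in alphas])
alpha_cv = LassoCV(alphas=alphas, cv=5).fit(X, y).alpha_
# ES-CV flavor (an assumption here): among penalties at least as strong as
# CV's pick, choose the one with the most stable fitted values.
mask = alphas >= alpha_cv
alpha_escv = alphas[mask][np.argmin(es[mask])]
print(f"CV pick: alpha = {alpha_cv:.3f}; ES-CV pick: alpha = {alpha_escv:.3f}")
```

    Because the ES pick is constrained to regularize at least as hard as CV's, it yields a sparser model, which matches the trade-off the abstract reports: far fewer predictors at a small cost in prediction performance.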
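    The closing OLS-versus-LAD claim can also be probed with a quick simulation, sketched below: with $p/n = 0.5$ and double-exponential errors, OLS tends to achieve smaller estimation error than LAD. The LAD fit via a linear program and all simulation settings are choices made here for illustration.

```python
# Quick simulation: OLS vs. LAD estimation error with p/n = 0.5 and
# Laplace (double-exponential) errors, echoing the abstract's last result.
import numpy as np
from scipy.optimize import linprog


def lad_fit(X, y):
    """LAD regression via LP: minimize sum_i u_i s.t. -u <= y - X b <= u."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])         # minimize sum(u)
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])  # |y - Xb| <= u
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n         # b free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]


rng = np.random.default_rng(1)
n, p = 200, 100  # p/n = 0.5, inside the kappa-in-(0.3, 1) regime
errs_ols, errs_lad = [], []
for _ in range(20):
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p)
    y = X @ beta + rng.laplace(size=n)
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    errs_ols.append(np.sum((b_ols - beta) ** 2))
    errs_lad.append(np.sum((lad_fit(X, y) - beta) ** 2))
print(f"mean squared l2 error: OLS = {np.mean(errs_ols):.2f}, "
      f"LAD = {np.mean(errs_lad):.2f}")
```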