Subsampling Mathematical Relaxations and Average-case Complexity
We initiate a study of when the value of mathematical relaxations such as
linear and semidefinite programs for constraint satisfaction problems (CSPs) is
approximately preserved when restricting the instance to a sub-instance induced
by a small random subsample of the variables. Let $C$ be a family of CSPs such
as 3SAT, Max-Cut, etc., and let $\Pi$ be a relaxation for $C$, in the sense
that for every instance $P \in C$, $\Pi(P)$ is an upper bound on the maximum
fraction of satisfiable constraints of $P$. Loosely speaking, we say that
subsampling holds for $C$ and $\Pi$ if for every sufficiently dense instance
$P \in C$ and every $\epsilon > 0$, if we let $P'$ be the instance obtained by
restricting $P$ to a sufficiently large constant number of variables, then
$\Pi(P') \in (1 \pm \epsilon)\Pi(P)$. We say that weak subsampling holds if the
above guarantee is replaced with $\Pi(P') = 1 - \Theta(\gamma)$ whenever
$\Pi(P) = 1 - \gamma$. We show: 1. Subsampling holds for the BasicLP and BasicSDP
programs. BasicSDP is a variant of the relaxation considered by Raghavendra
(2008), who showed it gives an optimal approximation factor for every CSP under
the unique games conjecture. BasicLP is the linear programming analog of
BasicSDP. 2. For tighter versions of BasicSDP obtained by adding additional
constraints from the Lasserre hierarchy, weak subsampling holds for CSPs of
unique games type. 3. There are non-unique CSPs for which even weak subsampling
fails for the above tighter semidefinite programs. Also there are unique CSPs
for which subsampling fails for the Sherali-Adams linear programming hierarchy.
As a corollary of our weak subsampling for strong semidefinite programs, we
obtain a polynomial-time algorithm to certify that random geometric graphs (of
the type considered by Feige and Schechtman, 2002) of max-cut value $1 - \gamma$
have a cut value at most $1 - \gamma^{10}$.
Comment: Includes several more general results that subsume the previous
version of the paper
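The subsampling question in the abstract above can be illustrated with a minimal sketch, under loud assumptions: it uses the exact Max-Cut fraction computed by brute force as a stand-in for a relaxation value (it is not BasicLP, BasicSDP, or any hierarchy), and the function names `max_cut_fraction`, `induced_subinstance`, and `subsample_value` are hypothetical, not taken from the paper. It compares the value on a dense instance with the value on the sub-instance induced by a random subset of variables.

```python
import itertools
import random


def max_cut_fraction(vertices, edges):
    """Exact maximum fraction of edges cut, by brute force over bipartitions.

    Stand-in for a relaxation value; only feasible for small instances.
    """
    vertices = list(vertices)
    if not edges:
        return 0.0
    best = 0
    # Fix the first vertex on side 0: a cut and its complement are equivalent.
    first, rest = vertices[0], vertices[1:]
    for bits in itertools.product((0, 1), repeat=len(rest)):
        side = {first: 0}
        side.update(zip(rest, bits))
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best / len(edges)


def induced_subinstance(edges, sample):
    """Keep only the constraints (edges) whose variables all lie in the sample."""
    sample = set(sample)
    return [(u, v) for u, v in edges if u in sample and v in sample]


def subsample_value(vertices, edges, k, rng):
    """Value of the sub-instance induced by k uniformly random variables."""
    sample = rng.sample(list(vertices), k)
    return max_cut_fraction(sample, induced_subinstance(edges, sample))


if __name__ == "__main__":
    n = 12
    vertices = range(n)
    # A dense instance: Max-Cut on the complete graph (assumed for illustration).
    edges = list(itertools.combinations(vertices, 2))
    rng = random.Random(0)
    print("full instance:", max_cut_fraction(vertices, edges))
    print("subsample of 8:", subsample_value(vertices, edges, 8, rng))
```

Replacing `max_cut_fraction` with an LP or SDP solver call would turn this into an experiment on an actual relaxation; the brute-force version is chosen here only so the sketch stays self-contained.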
Stability
Reproducibility is imperative for any scientific discovery. More often than
not, modern scientific findings rely on statistical analysis of
high-dimensional data. At a minimum, reproducibility manifests itself in
stability of statistical results relative to "reasonable" perturbations to data
and to the model used. Jackknife, bootstrap, and cross-validation are based on
perturbations to data, while robust statistics methods deal with perturbations
to models. In this article, a case is made for the importance of stability in
statistics. Firstly, we motivate the necessity of stability for interpretable
and reliable encoding models from brain fMRI signals. Secondly, we find strong
evidence in the literature to demonstrate the central role of stability in
statistical inference, such as sensitivity analysis and effect detection.
Thirdly, a smoothing parameter selector based on estimation stability (ES),
ES-CV, is proposed for Lasso, in order to bring stability to bear on
cross-validation (CV). ES-CV is then utilized in the encoding models to reduce
the number of predictors by 60% with almost no loss (1.3%) of prediction
performance across over 2,000 voxels. Last, a novel "stability" argument is
seen to drive new results that shed light on the intriguing interactions
between sample-to-sample variability and heavier-tailed error distributions (e.g.,
double-exponential) in high-dimensional regression models with $p$ predictors
and $n$ independent samples. In particular, when
$p/n \rightarrow \kappa \in (0.3, 1)$ and the error distribution is
double-exponential, the Ordinary Least Squares (OLS) is a better estimator than
the Least Absolute Deviation (LAD) estimator.
Comment: Published at http://dx.doi.org/10.3150/13-BEJSP14 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
- …
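The abstract above describes stability as robustness of a statistical result to "reasonable" perturbations of the data, and names the jackknife and bootstrap as data-perturbation schemes. A minimal sketch of that idea follows, assuming a generic scalar estimator; it is not the ES-CV procedure from the article, and the helper names `bootstrap_std` and `jackknife_estimates` are hypothetical.

```python
import random
import statistics


def bootstrap_std(data, estimator, n_resamples=500, seed=0):
    """Standard deviation of an estimator across bootstrap resamples.

    A small value means the estimate is stable under resampling the data.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n points with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


def jackknife_estimates(data, estimator):
    """Leave-one-out estimates: the estimator applied with each point removed."""
    return [estimator(data[:i] + data[i + 1:]) for i in range(len(data))]


if __name__ == "__main__":
    # Illustrative data with one gross outlier (made up for this sketch).
    data = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 25.0]
    print("bootstrap std of mean:  ", bootstrap_std(data, statistics.mean))
    print("bootstrap std of median:", bootstrap_std(data, statistics.median))
    print("jackknife means:", jackknife_estimates(data, statistics.mean))
```

Comparing the spread for the mean and the median on the same data is one simple way to see which estimator is more stable under data perturbation.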