Optimal Testing for Planted Satisfiability Problems
We study the problem of detecting planted solutions in a random
satisfiability formula. Adopting the formalism of hypothesis testing in
statistical analysis, we describe the minimax optimal rates of detection. Our
analysis relies on the study of the number of satisfying assignments, for which
we prove new results. We also address algorithmic issues, and give a
computationally efficient test with optimal statistical performance. This
result is compared to an average-case hypothesis on the hardness of refuting
satisfiability of random formulas.
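As a rough illustration of the two hypotheses being compared, the sketch below samples a uniform random k-SAT formula and a planted one. The planted model used here (rejection sampling: keep only clauses that a hidden assignment satisfies) and all function names are assumptions made for illustration; they are not taken from the paper, and the sketch does not implement the paper's test.

```python
import random

def random_clause(n, k, rng):
    """Pick k distinct variables in 1..n and give each a random sign (+v or -v)."""
    variables = rng.sample(range(1, n + 1), k)
    return tuple(v if rng.random() < 0.5 else -v for v in variables)

def satisfies(assignment, clause):
    """assignment maps variable -> bool; a clause holds if any literal is true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def uniform_formula(n, m, k, rng):
    """Null model: m clauses drawn independently and uniformly."""
    return [random_clause(n, k, rng) for _ in range(m)]

def planted_formula(n, m, k, rng):
    """Assumed planted model: hide a random assignment and keep only
    the uniformly drawn clauses that it satisfies."""
    assignment = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    clauses = []
    while len(clauses) < m:
        clause = random_clause(n, k, rng)
        if satisfies(assignment, clause):
            clauses.append(clause)
    return assignment, clauses
```

A detection test would receive only the clause list and have to decide which of the two samplers produced it.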
Reweighted belief propagation and quiet planting for random K-SAT
We study the random K-satisfiability problem using a partition function where
each solution is reweighted according to the number of variables that satisfy
every clause. We apply belief propagation and the related cavity method to the
reweighted partition function. This allows us to obtain several new results on
the properties of the random K-satisfiability problem. In particular, the
reweighting allows us to introduce a planted ensemble that generates instances
that are, in some region of parameters, equivalent to random instances. We are
hence able to generate at the same time a typical random SAT instance and one
of its solutions. We study the relation between clustering and belief
propagation fixed points, and we give direct evidence for the existence of
purely entropic (rather than energetic) barriers between clusters in some
region of parameters in the random K-satisfiability problem. We exhibit, in
some large planted instances, solutions with a non-trivial whitening core; such
solutions were known to exist but were so far never found on very large
instances. Finally, we discuss algorithmic hardness of such planted instances
and we determine a region of parameters in which planting leads to satisfiable
benchmarks that, to our knowledge, are the hardest known.
Comment: 23 pages, 4 figures, revised for readability, stability expression
corrected
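To make the reweighted partition function concrete, here is a brute-force sketch for tiny formulas: each satisfying assignment contributes a product of per-clause factors that depend on how many literals of the clause are true. The `weight` table and the function name are hypothetical, the abstract does not give the actual weights, and the paper works with belief propagation rather than enumeration.

```python
from itertools import product

def reweighted_partition_function(n, clauses, weight):
    """Sum over all 2^n assignments of the product of clause factors.

    clauses: tuples of nonzero ints, +v / -v for variable v in 1..n.
    weight[r]: assumed factor for a clause with exactly r true literals (r >= 1).
    A clause with no true literal contributes 0, so only solutions count.
    """
    total = 0.0
    for values in product((False, True), repeat=n):
        term = 1.0
        for clause in clauses:
            r = sum(values[abs(lit) - 1] == (lit > 0) for lit in clause)
            if r == 0:
                term = 0.0
                break
            term *= weight[r]
        total += term
    return total
```

With every weight set to 1 the function simply counts solutions; changing the weights shifts mass toward solutions that satisfy clauses with more or fewer literals.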
Hide and Seek: Scaling Machine Learning for Combinatorial Optimization via the Probabilistic Method
Applying deep learning to solve real-life instances of hard combinatorial
problems has tremendous potential. Research in this direction has focused on
the Boolean satisfiability (SAT) problem, both because of its theoretical
centrality and practical importance. A major roadblock faced, though, is that
training sets are restricted to random formulas of size several orders of
magnitude smaller than formulas of practical interest, raising serious concerns
about generalization. This is because labeling random formulas of increasing
size rapidly becomes intractable. By exploiting the probabilistic method in a
fundamental way, we remove this roadblock entirely: we show how to generate
correctly labeled random formulas of any desired size, without having to solve
the underlying decision problem. Moreover, the difficulty of the classification
task for the formulas produced by our generator is tunable by varying a simple
scalar parameter. This opens up an entirely new level of sophistication for the
machine learning methods that can be brought to bear on Satisfiability. Using
our generator, we train existing state-of-the-art models for the task of
predicting satisfiability on formulas with 10,000 variables. We find that they
do no better than random guessing. As a first indication of what can be
achieved with the new generator, we present a novel classifier that performs
significantly better than random guessing 99% on the same datasets, for most
difficulty levels. Crucially, unlike past approaches that learn based on
syntactic features of a formula, our classifier performs its learning on a
short prefix of a solver's computation, an approach that we expect to be of
independent interest.
Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's
We present an algorithm for recovering planted solutions in two well-known
models, the stochastic block model and planted constraint satisfaction
problems, via a common generalization in terms of random bipartite graphs. Our
algorithm matches up to a constant factor the best-known bounds for the number
of edges (or constraints) needed for perfect recovery and its running time is
linear in the number of edges used. The time complexity is significantly better
than both spectral and SDP-based approaches.
The main contribution of the algorithm is in the case of unequal sizes in the
bipartition (corresponding to odd uniformity in the CSP). Here our algorithm
succeeds at a significantly lower density than the spectral approaches,
surpassing a barrier based on the spectral norm of a random matrix.
Other significant features of the algorithm and analysis include: (i) the
critical use of power iteration with subsampling, which might be of independent
interest and whose analysis requires keeping track of multiple norms of an
evolving solution; (ii) it can be implemented statistically, i.e., with very
limited access to the input distribution; (iii) the algorithm is extremely
simple to implement and runs in linear time, and thus is practical even for
very large instances.
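The idea of power iteration with subsampling can be sketched on a signed bipartite edge list: every matrix-vector product uses a freshly drawn subset of the edges, and the result is rounded to signs. This is a simplified illustration under assumed names and an assumed input format (edges carrying a +1/-1 weight); it is not the paper's algorithm, whose batching scheme and analysis are not described in the abstract.

```python
import random

def sign(value):
    """Round to +1 / -1, sending 0 to +1."""
    return 1 if value >= 0 else -1

def subsampled_power_iteration(edges, n_left, n_right, steps, batch_size, rng):
    """edges: list of (u, v, w), u in [0, n_left), v in [0, n_right), w in {+1, -1}.

    Each half-step multiplies by the matrix built from a fresh random
    subset of batch_size edges, then rounds every coordinate to a sign.
    Returns the sign vectors for the left and right vertex sets.
    """
    x = [rng.choice((-1, 1)) for _ in range(n_right)]
    y = [1] * n_left
    for _ in range(steps):
        acc = [0] * n_left
        for u, v, w in rng.sample(edges, batch_size):
            acc[u] += w * x[v]
        y = [sign(a) for a in acc]

        acc = [0] * n_right
        for u, v, w in rng.sample(edges, batch_size):
            acc[v] += w * y[u]
        x = [sign(a) for a in acc]
    return y, x
```

Because each half-step only touches `batch_size` edges, the total work is linear in the number of edges used, which is the property the abstract highlights.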
NegDL: Privacy-Preserving Deep Learning Based on Negative Database
In the era of big data, deep learning has become an increasingly popular
topic. It has outstanding achievements in the fields of image recognition,
object detection, and natural language processing, among others. The first priority of
deep learning is exploiting valuable information from a large amount of data,
which will inevitably induce privacy issues that are worthy of attention.
Presently, several privacy-preserving deep learning methods have been proposed,
but most of them suffer from a non-negligible degradation of either efficiency
or accuracy. Negative database (NDB) is a new type of data
representation which can protect data privacy by storing and utilizing the
complementary form of original data. In this paper, we propose a
privacy-preserving deep learning method named NegDL based on NDB.
Specifically, private data are first converted to NDB as the input of
deep learning models by a generation algorithm called the QK-hidden
algorithm, and then the sketches of NDB are extracted for training and
inference. We demonstrate that the computational complexity of NegDL is the
same as the original deep learning model without privacy protection.
Experimental results on Breast Cancer, MNIST, and CIFAR-10 benchmark datasets
demonstrate that the accuracy of NegDL could be comparable to the original deep
learning model in most cases, and it performs better than the method based on
differential privacy.
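To illustrate what "storing the complementary form" can mean, the sketch below builds a small negative database for a hidden bit string: each record fixes a few positions and leaves the rest as wildcards, and no record matches the hidden string. This is a simplified generator with assumed names and a uniform choice of flipped bits; it is not the QK-hidden algorithm, whose selection probabilities are not given in the abstract.

```python
import random

def matches(record, bits):
    """record maps position -> bit; positions not listed are wildcards."""
    return all(bits[pos] == bit for pos, bit in record.items())

def generate_ndb(hidden, num_records, k, rng):
    """Generate records with k specified positions each, at least one of
    which disagrees with the hidden string, so no record matches it."""
    n = len(hidden)
    records = []
    for _ in range(num_records):
        positions = rng.sample(range(n), k)
        # Assumption: the number of flipped bits is uniform in 1..k.
        flipped = set(rng.sample(positions, rng.randint(1, k)))
        record = {
            p: (1 - hidden[p]) if p in flipped else hidden[p]
            for p in positions
        }
        records.append(record)
    return records
```

Each record rules out every string it matches, so the hidden string is by construction never among the excluded ones.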
On the Complexity of Random Satisfiability Problems with Planted Solutions
The problem of identifying a planted assignment given a random k-SAT
formula consistent with the assignment exhibits a large algorithmic gap: while
the planted solution becomes unique and can be identified given a formula with
O(n log n) clauses, there are distributions over clauses for which the best
known efficient algorithms require n^{k/2} clauses. We propose and study a
unified model for planted k-SAT, which captures well-known special cases. An
instance is described by a planted assignment σ and a distribution on
clauses with k literals. We define its distribution complexity as the largest
r for which the distribution is not r-wise independent (1 ≤ r ≤ k for
any distribution with a planted assignment).
Our main result is an unconditional lower bound, tight up to logarithmic
factors, for statistical (query) algorithms [Kearns 1998, Feldman et al. 2012],
matching known upper bounds, which, as we show, can be implemented using a
statistical algorithm. Since known approaches for problems over distributions
have statistical analogues (spectral, MCMC, gradient-based, convex optimization
etc.), this lower bound provides a rigorous explanation of the observed
algorithmic gap. The proof introduces a new general technique for the analysis
of statistical query algorithms. It also points to a geometric paring
phenomenon in the space of all planted assignments.
We describe consequences of our lower bounds for Feige's refutation hypothesis
[Feige 2002] and for lower bounds on general convex programs that solve planted
k-SAT. Our bounds also extend to other planted k-CSP models, and, in
particular, provide concrete evidence for the security of Goldreich's one-way
function and the associated pseudorandom generator when used with a
sufficiently hard predicate [Goldreich 2000].
Comment: Extended abstract appeared in STOC 201