Pairs of SAT Assignment in Random Boolean Formulae
We investigate geometrical properties of the random K-satisfiability problem
using the notion of x-satisfiability: a formula is x-satisfiable if there exist
two SAT assignments differing in Nx variables. We show the existence of a sharp
threshold for this property as a function of the clause density. For large
enough K, we prove that there exists a region of clause density, below the
satisfiability threshold, where the landscape of Hamming distances between SAT
assignments exhibits a gap: pairs of SAT assignments exist at small x and
around x=1/2, but they do not exist at intermediate values of x. This result is
consistent with the clustering scenario at the heart of recent heuristic
analyses of satisfiability based on statistical physics (the cavity method)
and of their algorithmic counterpart (the survey propagation algorithm). The
method uses elementary probabilistic arguments (first and second moment
methods) and might be useful in other problems of computational and physical
interest where similar phenomena appear.
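To make the definition of x-satisfiability concrete, here is a minimal brute-force sketch in Python. It is not the paper's method (the paper argues via first and second moments on random formulae) and it is only feasible for a handful of variables. The clause encoding (signed 1-indexed integers) and the function names are assumptions chosen for illustration; the distance d plays the role of Nx.

```python
from itertools import product


def satisfies(assignment, clauses):
    """Check a CNF formula; literal +i means variable i is true, -i means false (1-indexed)."""
    return all(
        any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
        for clause in clauses
    )


def sat_pair_distances(n, clauses):
    """Return the set of Hamming distances realised by pairs of SAT assignments."""
    solutions = [a for a in product([False, True], repeat=n) if satisfies(a, clauses)]
    return {
        sum(u != v for u, v in zip(a, b))
        for a in solutions
        for b in solutions
    }


def is_x_satisfiable(n, clauses, d):
    """True if two SAT assignments differ in exactly d of the n variables."""
    return d in sat_pair_distances(n, clauses)
```

For example, the formula (x1 or x2) and (not x1 or not x2) has exactly two SAT assignments, which differ in both variables, so `sat_pair_distances(2, [(1, 2), (-1, -2)])` contains 0 and 2 but not 1: a tiny instance of a gap at an intermediate distance.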
Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
f. Place the most influential variable x_i of f
at the root, and recurse on the subfunctions f_{x_i=0} and f_{x_i=1} on the
left and right subtrees respectively; terminate once the tree is an
ε-approximation of f. We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
Upper bound: For every with decision tree size and every
, this heuristic builds a decision tree of size
at most .
Lower bound: For every and , there is an with decision tree size such that
this heuristic builds a decision tree of size .
We also obtain upper and lower bounds for monotone functions:
and
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms, which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
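The top-down heuristic described at the start of this abstract can be sketched as follows. This is a simplified illustration, not the authors' exact procedure. It assumes the function is a Python callable on 0/1 tuples, computes influences by exhaustive enumeration (so it only suits small n), and stops at a leaf once the restricted function is within eps of a constant there. That per-leaf rule is an assumption of this sketch; it guarantees overall error at most eps but is not the global stopping rule the abstract states. All names are hypothetical.

```python
from itertools import product


def restricted_inputs(n, restriction):
    """Yield all points of {0,1}^n consistent with a partial assignment {index: bit}."""
    free = [i for i in range(n) if i not in restriction]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in restriction.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def influence(f, n, restriction, i):
    """Fraction of consistent inputs on which flipping coordinate i changes f."""
    points = list(restricted_inputs(n, restriction))
    flips = 0
    for x in points:
        y = list(x)
        y[i] ^= 1
        flips += f(x) != f(tuple(y))
    return flips / len(points)


def build_tree(f, n, eps=0.0, restriction=None):
    """Return a leaf value, or a tuple (variable, subtree for 0, subtree for 1)."""
    restriction = dict(restriction or {})
    values = [f(x) for x in restricted_inputs(n, restriction)]
    majority = max(set(values), key=values.count)
    error = sum(v != majority for v in values) / len(values)
    free = [i for i in range(n) if i not in restriction]
    if error <= eps or not free:
        return majority  # leaf: the restricted function is eps-close to a constant
    # Split on the most influential remaining variable of the restricted function.
    best = max(free, key=lambda i: influence(f, n, restriction, i))
    return (
        best,
        build_tree(f, n, eps, {**restriction, best: 0}),
        build_tree(f, n, eps, {**restriction, best: 1}),
    )


def tree_size(tree):
    """Number of leaves."""
    if not isinstance(tree, tuple):
        return 1
    return tree_size(tree[1]) + tree_size(tree[2])


def evaluate(tree, x):
    """Follow the path selected by x down to a leaf."""
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] else left
    return tree
```

With `f = lambda x: x[0] & x[1]` on three variables, the exact tree (eps = 0) queries variable 0 and then variable 1 and has three leaves, while `eps=0.25` already collapses it to the single leaf 0, since the AND is 0 on three quarters of the inputs.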
Formulas vs. Circuits for Small Distance Connectivity
We give the first super-polynomial separation in the power of bounded-depth
boolean formulas vs. circuits. Specifically, we consider the problem Distance
Connectivity, which asks whether two specified nodes in a graph of size n
are connected by a path of length at most k. This problem is solvable
(by the recursive doubling technique) on circuits of depth
and size . In contrast, we show that solving this problem on
formulas of depth requires size for all . As corollaries:
(i) It follows that polynomial-size circuits for Distance Connectivity
require depth for all . This matches the
upper bound from recursive doubling and improves a previous lower bound of Beame, Pitassi and Impagliazzo [BIP98].
(ii) We get a tight lower bound of on the size required to
simulate size-s depth-d circuits by depth-d formulas for all and . No lower bound better than
was previously known for any .
Our proof technique is centered on a new notion of pathset complexity, which
roughly speaking measures the minimum cost of constructing a set of (partial)
paths in a universe of size n via the operations of union and relational
join, subject to certain density constraints. Half of our proof shows that
bounded-depth formulas solving Distance Connectivity imply upper bounds
on pathset complexity. The other half is a combinatorial lower bound on pathset
complexity.
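The recursive doubling idea mentioned in this abstract can be illustrated with a short sequential sketch. It is a plain Python program, not a circuit construction. It represents the graph as a Boolean adjacency matrix, adds self-loops so that "at most k steps" becomes "exactly k steps", and uses repeated squaring, so only about log2(k) Boolean matrix products are needed. The function names and the matrix representation are assumptions made for illustration.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is true if some m has a[i][m] and b[m][j]."""
    n = len(a)
    return [
        [any(a[i][m] and b[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


def within_distance(adj, s, t, k):
    """True if node t is reachable from node s by a path of at most k edges."""
    n = len(adj)
    # step[i][j]: j is reachable from i in at most one edge (self-loops included).
    step = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    # result starts as the identity: paths of length 0.
    result = [[i == j for j in range(n)] for i in range(n)]
    while k > 0:
        if k & 1:
            result = bool_matmul(result, step)
        step = bool_matmul(step, step)  # doubles the path length covered by step
        k >>= 1
    return result[s][t]
```

On the directed path 0 → 1 → 2 → 3, `within_distance(adj, 0, 3, 3)` is true and `within_distance(adj, 0, 3, 2)` is false.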