    Pairs of SAT Assignment in Random Boolean Formulae

    We investigate geometrical properties of the random K-satisfiability problem using the notion of x-satisfiability: a formula is x-satisfiable if there exist two SAT assignments differing in Nx variables. We show the existence of a sharp threshold for this property as a function of the clause density. For large enough K, we prove that there exists a region of clause density, below the satisfiability threshold, where the landscape of Hamming distances between SAT assignments experiences a gap: pairs of SAT assignments exist at small x, and around x=1/2, but they do not exist at intermediate values of x. This result is consistent with the clustering scenario which is at the heart of the recent heuristic analysis of satisfiability using statistical physics (the cavity method), and its algorithmic counterpart (the survey propagation algorithm). The method uses elementary probabilistic arguments (first and second moment methods), and might be useful in other problems of computational and physical interest where similar phenomena appear.
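
The definition of x-satisfiability above can be illustrated by brute force on tiny formulas. The following is only a sketch under stated assumptions, not the method of the paper (which relies on first and second moment arguments): the function names, the clause encoding as `(variable, wanted_value)` pairs, and the convention that a single SAT assignment realises distance 0 are all choices made here for illustration. Enumeration is exponential in the number of variables, so it is only usable for very small N.

```python
import itertools
import random


def random_ksat(n, m, k, rng):
    """Draw m clauses, each over k distinct variables with random signs.

    A literal is encoded as (variable_index, wanted_value); this encoding
    is an assumption of this sketch.
    """
    formula = []
    for _ in range(m):
        variables = rng.sample(range(n), k)
        formula.append([(v, rng.choice([True, False])) for v in variables])
    return formula


def satisfies(assignment, formula):
    """A clause holds if at least one of its literals matches the assignment."""
    return all(
        any(assignment[v] == want for v, want in clause) for clause in formula
    )


def sat_pair_distances(n, formula):
    """Return the set of Hamming distances realised by pairs of SAT assignments."""
    solutions = [
        a
        for a in itertools.product([False, True], repeat=n)
        if satisfies(a, formula)
    ]
    # Convention of this sketch: a solution paired with itself gives distance 0.
    distances = {0} if solutions else set()
    for a, b in itertools.combinations(solutions, 2):
        distances.add(sum(x != y for x, y in zip(a, b)))
    return distances


def is_x_satisfiable(n, formula, distance):
    """True if two SAT assignments differ in exactly `distance` (= N*x) variables."""
    return distance in sat_pair_distances(n, formula)
```

For instance, the two-variable formula (x0 or x1) and (not x0 or not x1) has exactly two SAT assignments, which differ in both variables, so distance 1 is missing from the set of realised distances. `random_ksat(n, m, k, random.Random(0))` can be used to generate small random instances to feed into `sat_pair_distances`.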

    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds. Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$. Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$. We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms (which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART) achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
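
The heuristic described above (split on the most influential variable, then recurse on both restrictions) can be sketched for small n as follows. This is a simplified illustration, not the paper's algorithm: it assumes f is given as a Python callable on bit tuples, computes influences by exhaustive enumeration, and replaces the epsilon-approximation stopping rule with the exact case (a branch stops only when the restricted function is constant). All function names are hypothetical.

```python
import itertools


def restricted_inputs(n, restriction):
    """Yield every x in {0,1}^n consistent with the partial assignment."""
    free = [i for i in range(n) if i not in restriction]
    for bits in itertools.product([0, 1], repeat=len(free)):
        x = [0] * n
        for i, b in restriction.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def influence(f, n, restriction, i):
    """Fraction of consistent inputs on which flipping x_i changes f."""
    flips = total = 0
    for x in restricted_inputs(n, restriction):
        y = list(x)
        y[i] ^= 1
        flips += f(x) != f(tuple(y))
        total += 1
    return flips / total


def build_tree(f, n, restriction=None):
    """Return a leaf value, or a tuple (variable, left_subtree, right_subtree).

    Assumption of this sketch: recursion stops only when the restricted
    function is constant, i.e. the tree computes f exactly.
    """
    restriction = dict(restriction or {})
    values = {f(x) for x in restricted_inputs(n, restriction)}
    if len(values) == 1:
        return values.pop()
    free = [i for i in range(n) if i not in restriction]
    # Ties are broken in favour of the lowest variable index.
    best = max(free, key=lambda i: influence(f, n, restriction, i))
    left = build_tree(f, n, {**restriction, best: 0})
    right = build_tree(f, n, {**restriction, best: 1})
    return (best, left, right)


def count_leaves(tree):
    """Tree size measured as the number of leaves."""
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


def evaluate(tree, x):
    """Follow the branch selected by x down to a leaf."""
    while isinstance(tree, tuple):
        variable, left, right = tree
        tree = right if x[variable] else left
    return tree
```

As a usage example, for the AND of the first two of three variables (output +1 when both are 1, otherwise -1), the third variable has influence zero and is never queried, so `build_tree` returns a tree with three leaves.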

    Formulas vs. Circuits for Small Distance Connectivity

    We give the first super-polynomial separation in the power of bounded-depth boolean formulas vs. circuits. Specifically, we consider the problem Distance $k(n)$ Connectivity, which asks whether two specified nodes in a graph of size $n$ are connected by a path of length at most $k(n)$. This problem is solvable (by the recursive doubling technique) on circuits of depth $O(\log k)$ and size $O(kn^3)$. In contrast, we show that solving this problem on formulas of depth $\log n/(\log\log n)^{O(1)}$ requires size $n^{\Omega(\log k)}$ for all $k(n) \leq \log\log n$. As corollaries: (i) It follows that polynomial-size circuits for Distance $k(n)$ Connectivity require depth $\Omega(\log k)$ for all $k(n) \leq \log\log n$. This matches the upper bound from recursive doubling and improves a previous $\Omega(\log\log k)$ lower bound of Beame, Pitassi and Impagliazzo [BIP98]. (ii) We get a tight lower bound of $s^{\Omega(d)}$ on the size required to simulate size-$s$ depth-$d$ circuits by depth-$d$ formulas for all $s(n) = n^{O(1)}$ and $d(n) \leq \log\log\log n$. No lower bound better than $s^{\Omega(1)}$ was previously known for any $d(n) \nleq O(1)$. Our proof technique is centered on a new notion of pathset complexity, which roughly speaking measures the minimum cost of constructing a set of (partial) paths in a universe of size $n$ via the operations of union and relational join, subject to certain density constraints. Half of our proof shows that bounded-depth formulas solving Distance $k(n)$ Connectivity imply upper bounds on pathset complexity. The other half is a combinatorial lower bound on pathset complexity.
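
The recursive doubling idea mentioned above can be sketched sequentially: a path of length at most d from u to v exists exactly when some midpoint w is within distance ceil(d/2) of u and within distance floor(d/2) of v. The code below is an illustrative sketch only, assuming the graph is given as an adjacency matrix of 0/1 entries; it models the recursion with Boolean matrix products and does not reproduce the circuit construction or its size bound. The function names are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some w links i to j."""
    n = len(a)
    return [
        [any(a[i][w] and b[w][j] for w in range(n)) for j in range(n)]
        for i in range(n)
    ]


def within_distance(adj, k):
    """Return a matrix whose (u, v) entry says whether dist(u, v) <= k.

    Each level of the recursion halves the distance bound, so only
    O(log k) distinct bounds are ever computed (they are cached).
    """
    n = len(adj)
    if k == 0:
        return [[i == j for j in range(n)] for i in range(n)]
    # Adding self-loops turns "length exactly d" into "length at most d".
    cache = {1: [[i == j or bool(adj[i][j]) for j in range(n)] for i in range(n)]}

    def reach(d):
        if d not in cache:
            cache[d] = bool_matmul(reach((d + 1) // 2), reach(d // 2))
        return cache[d]

    return reach(k)
```

On an undirected path with five nodes 0-1-2-3-4, for example, node 3 is within distance 3 of node 0 while node 4 is not.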