    Understanding Space in Proof Complexity: Separations and Trade-offs via Substitutions

    For current state-of-the-art DPLL SAT solvers, the two main bottlenecks are the amounts of time and memory used. In proof complexity, these resources correspond to the length and space of resolution proofs. There has been a long line of research investigating these proof complexity measures, but while strong results have been established for length, our understanding of space and how it relates to length has remained quite poor. In particular, the question of whether resolution proofs can be optimized for length and space simultaneously, or whether there are trade-offs between these two measures, has remained essentially open. In this paper, we remedy this situation by proving a host of length-space trade-off results for resolution. Our collection of trade-offs covers almost the whole range of values for the space complexity of formulas, and most of the trade-offs are superpolynomial or even exponential and essentially tight. Using similar techniques, we show that these trade-offs in fact extend to the exponentially stronger k-DNF resolution proof systems, which operate with formulas in disjunctive normal form with terms of bounded arity k. We also answer in the affirmative the open question of whether the k-DNF resolution systems form a strict hierarchy with respect to space. Our key technical contribution is the following, somewhat surprising, theorem: Any CNF formula F can be transformed by simple variable substitution into a new formula F' such that if F has the right properties, F' can be proven in essentially the same length as F, whereas the minimal number of lines one needs to keep in memory simultaneously in any proof of F' is lower-bounded by the minimal number of variables needed simultaneously in any proof of F. Applying this theorem to so-called pebbling formulas defined in terms of pebble games on directed acyclic graphs, we obtain our results. Comment: This paper is a merged and updated version of the two ECCC technical reports TR09-034 and TR09-047, and it hence subsumes these two reports.
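
    The substitution step in the key theorem can be illustrated with a minimal sketch. The abstract only says "simple variable substitution"; the choice of binary XOR as the substituted function, the DIMACS-style integer literals, and the helper names `xor_substitute` and `evaluate` are assumptions made for this illustration. Each variable v is replaced by the XOR of fresh variables, and the result is expanded back into CNF.

```python
from itertools import product

def xor_substitute(cnf, arity=2):
    """Replace every variable v by the XOR of `arity` fresh variables.

    cnf is a list of clauses; a clause is a list of nonzero ints
    (positive = variable, negative = negated variable).
    Variable v is mapped to fresh variables (v-1)*arity+1 .. v*arity.
    """
    new_cnf = []
    for clause in cnf:
        per_literal = []
        for lit in clause:
            v = abs(lit)
            fresh = [(v - 1) * arity + i + 1 for i in range(arity)]
            want = 1 if lit > 0 else 0  # parity that makes the literal true
            # CNF for "XOR(fresh) == want": one clause per forbidden assignment.
            blocks = []
            for bits in product([0, 1], repeat=arity):
                if sum(bits) % 2 != want:
                    # This clause is falsified exactly by the assignment `bits`.
                    blocks.append([-x if b else x for x, b in zip(fresh, bits)])
            per_literal.append(blocks)
        # A clause is an OR of small CNFs; distribute OR over AND.
        for choice in product(*per_literal):
            merged = sorted(set(l for c in choice for l in c))
            if any(-l in merged for l in merged):
                continue  # skip tautological clauses
            new_cnf.append(merged)
    return new_cnf

def evaluate(cnf, assignment):
    """assignment maps each variable to a bool."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in cnf)

original = [[1, -2], [2]]
substituted = xor_substitute(original)
```

    In this sketch, an assignment to the fresh variables satisfies the substituted formula exactly when the induced assignment (each original variable set to the XOR of its fresh variables) satisfies the original formula. A clause with w literals expands into 2^w clauses for binary XOR, so the blow-up stays constant for formulas of bounded width.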

    Understanding space in resolution: optimal lower bounds and exponential trade-offs

    We continue the study of trade-offs between space and length of resolution proofs and focus on two new results: (1) We show that length and space in resolution are uncorrelated. This is proved by exhibiting families of CNF formulas of size O(n) that have proofs of length O(n) but require space Ω(n / log n). Our separation is the strongest possible since any proof of length O(n) can always be transformed into a proof in space O(n / log n), and improves previous work reported in [Nordström 2006, Nordström and Håstad 2008]. (2) We prove a number of trade-off results for space in the range from constant to O(n / log n), most of them superpolynomial or even exponential. This is a dramatic improvement over previous results in [Ben-Sasson 2002, Hertel and Pitassi 2007, Nordström 2007]. The key to our results is the following, somewhat surprising, theorem: Any CNF formula F can be transformed by a simple substitution transformation into a new formula F' such that if F has the right properties, F' can be proven in resolution in essentially the same length as F, but the minimal space needed for F' is lower-bounded by the number of variables that have to be mentioned simultaneously in any proof for F. Applying this theorem to so-called pebbling formulas defined in terms of pebble games over directed acyclic graphs and analyzing black-white pebbling on these graphs yields our results.
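
    For readers unfamiliar with pebbling formulas, the following is a minimal sketch of the standard construction, which the abstract does not spell out: every source vertex is asserted true, truth propagates from predecessors to successors, and the sink is asserted false. The graph encoding (a dict from vertex to predecessor list) and the function names are assumptions made for this illustration.

```python
from itertools import product

def pebbling_formula(preds):
    """Build the pebbling CNF of a DAG.

    preds maps each vertex (a positive int) to the list of its
    predecessors; sources map to the empty list.
    """
    has_successor = {u for ps in preds.values() for u in ps}
    sinks = [v for v in preds if v not in has_successor]
    clauses = []
    for v, ps in sorted(preds.items()):
        # (not u1) or ... or (not uk) or v; for a source this is the unit clause [v].
        clauses.append([-u for u in ps] + [v])
    for z in sinks:
        clauses.append([-z])  # the sink is asserted false
    return clauses

def is_satisfiable(cnf, num_vars):
    """Brute-force satisfiability check, only meant for tiny examples."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in cnf):
            return True
    return False

# Two sources (1, 2) feeding a single sink (3).
small = pebbling_formula({1: [], 2: [], 3: [1, 2]})
```

    The resulting formula is always unsatisfiable, since truth propagates from the sources down to the sink. Resolution refutations of it mirror pebbling strategies on the underlying graph.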

    Subspace Polynomials and Cyclic Subspace Codes

    Subspace codes have received increasing interest recently due to their application in error correction for random network coding. In particular, cyclic subspace codes are possible candidates for large codes with efficient encoding and decoding algorithms. In this paper we consider such cyclic codes and provide constructions of optimal codes whose codewords do not have full orbits. We further introduce a new way to represent subspace codes by a class of polynomials called subspace polynomials. We present some constructions of such codes which are cyclic and analyze their parameters.
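
    As a small illustration of the representation mentioned above, the sketch below computes a subspace polynomial, taken here in its usual sense as the product of (x - v) over all elements v of a subspace V. The field GF(2^4) with modulus x^4 + x + 1, the integer encoding of field elements, and all function names are assumptions chosen for the example, not taken from the paper.

```python
MOD = 0b10011  # x^4 + x + 1, an irreducible polynomial over GF(2)

def gf_mul(a, b):
    """Multiply two elements of GF(2^4), encoded as 4-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD  # reduce modulo x^4 + x + 1
    return r

def span(basis):
    """All GF(2)-linear combinations of the basis elements."""
    vecs = {0}
    for b in basis:
        vecs |= {v ^ b for v in vecs}
    return sorted(vecs)

def subspace_polynomial(basis):
    """Coefficients (lowest degree first) of the product of (x + v), v in span(basis)."""
    coeffs = [1]
    for v in span(basis):
        # Multiply by (x + v); in characteristic 2, minus equals plus.
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] ^= c
            new[i] ^= gf_mul(c, v)
        coeffs = new
    return coeffs

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

poly = subspace_polynomial([1, 2])  # subspace spanned by 1 and alpha
```

    For the subspace {0, 1, α, α+1} this gives x^4 + (α^2+α+1)x^2 + (α^2+α)x, i.e. coefficients `[0, 6, 7, 0, 1]`. Only the exponents 1, 2 and 4 appear, which reflects the fact that subspace polynomials are linearized.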

    Sampling-based proofs of almost-periodicity results and algorithmic applications

    We give new combinatorial proofs of known almost-periodicity results for sumsets of sets with small doubling in the spirit of Croot and Sisask, whose almost-periodicity lemma has had far-reaching implications in additive combinatorics. We provide an alternative (and L^p-norm-free) point of view, which allows for proofs to easily be converted to probabilistic algorithms that decide membership in almost-periodic sumsets of dense subsets of F_2^n. As an application, we give a new algorithmic version of the quasipolynomial Bogolyubov-Ruzsa lemma recently proved by Sanders. Together with the results by the last two authors, this implies an algorithmic version of the quadratic Goldreich-Levin theorem in which the number of terms in the quadratic Fourier decomposition of a given function is quasipolynomial in the error parameter, compared with an exponential dependence previously proved by the authors. It also improves the running time of the algorithm to have quasipolynomial dependence instead of an exponential one. We also give an application to the problem of finding large subspaces in sumsets of dense sets. Green showed that the sumset of a dense subset of F_2^n contains a large subspace. Using Fourier analytic methods, Sanders proved that such a subspace must have dimension bounded below by a constant times the density times n. We provide an alternative (and L^p-norm-free) proof of a comparable bound, which is analogous to a recent result of Croot, Laba and Sisask in the integers. Comment: 28 pages

    Short PCPs with projection queries

    We construct a PCP for NTIME(2^n) with constant soundness, 2^n poly(n) proof length, and poly(n) queries where the verifier's computation is simple: the queries are a projection of the input randomness, and the computation on the prover's answers is a 3CNF. The previous upper bound for these two computations was polynomial-size circuits. Composing this verifier with a proof oracle increases the circuit-depth of the latter by 2. Our PCP is a simple variant of the PCP by Ben-Sasson, Goldreich, Harsha, Sudan, and Vadhan (CCC 2005). We also give a more modular exposition of the latter, separating the combinatorial from the algebraic arguments. If our PCP is taken as a black box, we obtain a more direct proof of the result by Williams, later with Santhanam (CCC 2013) that derandomizing circuits on n bits from a class C in time 2^n / n^{ω(1)} yields that NEXP is not in a related circuit class C′. Our proof yields a tighter connection: C is an And-Or of circuits from C′. Along the way we show that the same lower bound follows if the satisfiability of the And of any 3 circuits from C′ can be solved in time 2^n / n^{ω(1)}.

    The Complexity of User Retention

    This paper studies families of distributions T that are amenable to retentive learning, meaning that an expert can retain users that seek to predict their future, assuming user attributes are sampled from T and exposed gradually over time. Limited attention span is the main problem experts face in our model. We make two contributions. First, we formally define the notions of retentively learnable distributions and properties. Along the way, we define a retention complexity measure of distributions and a natural class of retentive scoring rules that model the way users evaluate experts they interact with. These rules are shown to be tightly connected to truth-eliciting "proper scoring rules" studied in Decision Theory since the 1950s [McCarthy, PNAS 1956]. Second, we take a first step towards relating retention complexity to other measures of significance in computational complexity. In particular, we show that linear properties (over the binary field) are retentively learnable, whereas random Low Density Parity Check (LDPC) codes have, with high probability, maximal retention complexity. Intriguingly, these results resemble known results from the field of property testing and suggest that deeper connections between retentive distributions and locally testable properties may exist.
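
    To make the notion of a truth-eliciting "proper scoring rule" concrete, here is a minimal sketch using the quadratic (Brier) rule for a binary event. This is a standard textbook example and not the retentive scoring rule defined in the paper; the function names and the grid search are assumptions made for the illustration.

```python
def brier_score(report, outcome):
    """Quadratic score for reporting probability `report` when `outcome` is 0 or 1."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report, true_prob):
    """Expected score when the event really occurs with probability `true_prob`."""
    return (true_prob * brier_score(report, 1)
            + (1 - true_prob) * brier_score(report, 0))

def best_report(true_prob, grid=100):
    """Search reports 0, 1/grid, ..., 1 for the one maximizing the expected score."""
    candidates = [i / grid for i in range(grid + 1)]
    return max(candidates, key=lambda q: expected_score(q, true_prob))

truthful = best_report(0.3)
```

    The expected score equals a constant minus (report - true_prob)^2, so it is maximized by reporting the true probability. That is what makes the rule proper.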

    Brief Announcement: Towards an Abstract Model of User Retention Dynamics

    A theoretical model is suggested for abstracting the interaction between an expert system and its users, with a focus on reputation and incentive compatibility. The model assumes users interact with the system while keeping in mind a single "retention parameter" that measures the strength of their belief in its predictive power, and the system's objective is to reinforce and maximize this parameter through "informative" and "correct" predictions. We define a natural class of retentive scoring rules to model the way users update their retention parameter and thus evaluate the experts they interact with. Assuming agents in the model have an incentive to report their true belief, these rules are shown to be tightly connected to truth-eliciting "proper scoring rules" studied in Decision Theory. The difference between users and experts is modeled by imposing different limits on their predictive abilities, characterized by a parameter called memory span. We prove the monotonicity theorem ("more knowledge is better"), which shows that experts with larger memory span retain better in expectation. Finally, we focus on the intrinsic properties of phenomena that are amenable to collaborative discovery with an expert system. Assuming user types (or "identities") are sampled from a distribution D, the retention complexity of D is the minimal initial retention value (or "strength of faith") that a user must have before approaching the expert, in order for the expert to retain that user throughout the collaborative discovery, during which the user "discovers" his true "identity". We then take a first step towards relating retention complexity to other established computational complexity measures by studying retention dynamics when D is a uniform distribution over a linear space.

    Testing formula satisfaction

    We study the query complexity of testing for properties defined by read-once formulae, as instances of massively parametrized properties, and prove several testability and non-testability results. First we prove the testability of any property accepted by a Boolean read-once formula involving any bounded-arity gates, with a number of queries exponential in $\epsilon$ and independent of all other parameters. When the gates are limited to being monotone, we prove that there is an estimation algorithm that outputs an approximation of the distance of the input from satisfying the property. For formulae only involving And/Or gates, we provide a more efficient test whose query complexity is only quasi-polynomial in $\epsilon$. On the other hand, we show that such testability results do not hold in general for formulae over non-Boolean alphabets; specifically, we construct a property defined by a read-once arity-2 (non-Boolean) formula over alphabets of size 4, such that any 1/4-test for it requires a number of queries depending on the formula size.

    Sum of squares lower bounds for refuting any CSP

    Let $P:\{0,1\}^k \to \{0,1\}$ be a nontrivial $k$-ary predicate. Consider a random instance of the constraint satisfaction problem $\mathrm{CSP}(P)$ on $n$ variables with $\Delta n$ constraints, each being $P$ applied to $k$ randomly chosen literals. Provided the constraint density satisfies $\Delta \gg 1$, such an instance is unsatisfiable with high probability. The \emph{refutation} problem is to efficiently find a proof of unsatisfiability. We show that whenever the predicate $P$ supports a $t$-\emph{wise uniform} probability distribution on its satisfying assignments, the sum of squares (SOS) algorithm of degree $d = \Theta(\frac{n}{\Delta^{2/(t-1)} \log \Delta})$ (which runs in time $n^{O(d)}$) \emph{cannot} refute a random instance of $\mathrm{CSP}(P)$. In particular, the polynomial-time SOS algorithm requires $\widetilde{\Omega}(n^{(t+1)/2})$ constraints to refute random instances of $\mathrm{CSP}(P)$ when $P$ supports a $t$-wise uniform distribution on its satisfying assignments. Together with recent work of Lee et al. [LRS15], our result also implies that \emph{any} polynomial-size semidefinite programming relaxation for refutation requires at least $\widetilde{\Omega}(n^{(t+1)/2})$ constraints. Our results (which also extend with no change to CSPs over larger alphabets) subsume all previously known lower bounds for semialgebraic refutation of random CSPs. For every constraint predicate $P$, they give a three-way hardness tradeoff between the density of constraints, the SOS degree (hence running time), and the strength of the refutation. By recent algorithmic results of Allen et al. [AOW15] and Raghavendra et al. [RRS16], this full three-way tradeoff is \emph{tight}, up to lower-order factors. Comment: 39 pages, 1 figure
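
    The step from the degree bound to the bound on the number of constraints can be sketched as follows. This is an informal reading of the abstract, not the paper's proof: it assumes $t \ge 2$, treats "polynomial time" as constant SOS degree $D$, and the constants $c$, $c'$ are hypothetical.

```latex
% Constant-degree SOS fails to refute as long as D is below the degree bound d:
\[
  D \;\le\; c \cdot \frac{n}{\Delta^{2/(t-1)} \log \Delta}
  \quad\Longleftrightarrow\quad
  \Delta^{2/(t-1)} \log \Delta \;\le\; \frac{c\,n}{D}.
\]
% Since \log\Delta = O(\log n), this holds whenever
\[
  \Delta \;\le\; \frac{n^{(t-1)/2}}{(c' \log n)^{(t-1)/2}}
  \quad\Longrightarrow\quad
  m \;=\; \Delta n \;\le\; \frac{n^{(t+1)/2}}{(c' \log n)^{(t-1)/2}}.
\]
% Hence refutation by polynomial-time SOS needs
% m = \widetilde{\Omega}\bigl(n^{(t+1)/2}\bigr) constraints.
```

    The polylogarithmic factor lost in the second line is what the tilde in $\widetilde{\Omega}$ absorbs.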