STACS 2001 : 18th annual Symposium on Theoretical Aspects of Computer Science, Dresden, Germany, February 15-17, 2001 : proceedings
xv, 576 p. : ill. ; 24 cm
Strongly Refuting Random CSPs Below the Spectral Threshold
Random constraint satisfaction problems (CSPs) are known to exhibit threshold
phenomena: given a uniformly random instance of a CSP with $n$ variables and
$m$ clauses, there is a value of $m = m(n)$ beyond which the CSP will be
unsatisfiable with high probability. Strong refutation is the problem of
certifying that no variable assignment satisfies more than a constant fraction
of clauses; this is the natural algorithmic problem in the unsatisfiable regime
(when $m/n = \omega(1)$).
Intuitively, strong refutation should become easier as the clause density
$m/n$ grows, because the contradictions introduced by the random clauses become
more locally apparent. For CSPs such as $k$-SAT and $k$-XOR, there is a
long-standing gap between the clause density at which efficient strong
refutation algorithms are known, $m/n \ge \widetilde{O}(n^{k/2-1})$, and the
clause density at which instances become unsatisfiable with high probability,
$m/n = \omega(1)$.
In this paper, we give spectral and sum-of-squares algorithms for strongly
refuting random $k$-XOR instances with clause density
$m/n \ge \widetilde{O}(n^{(k/2-1)(1-\delta)})$ in time
$\exp(\widetilde{O}(n^{\delta}))$ or in $\widetilde{O}(n^{\delta})$
rounds of the sum-of-squares hierarchy, for any $\delta \in [0,1)$
and any integer $k \ge 3$. Our algorithms provide a smooth
transition between the clause density at which polynomial-time algorithms are
known at $\delta = 0$, and brute-force refutation at the satisfiability
threshold when $\delta = 1$. We also leverage our $k$-XOR results to obtain
strong refutation algorithms for SAT (or any other Boolean CSP) at similar
clause densities. Our algorithms match the known sum-of-squares lower bounds
due to Grigoriev and Schoenebeck, up to logarithmic factors.
Additionally, we extend our techniques to give new results for certifying
upper bounds on the injective tensor norm of random tensors.
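The paper's algorithms are beyond a short sketch, but the classical spectral refutation for 2-XOR (the warm-up case below $k \ge 3$) illustrates the core idea: build a random matrix from the instance, and bound the maximum fraction of satisfiable clauses by its spectral norm. The sketch below is illustrative only (instance sizes and the $2$-XOR setting are my choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 20000  # m >> n puts 2-XOR well inside the refutable regime

# Random 2-XOR instance: m constraints x_i * x_j = b with x in {-1,+1}^n.
# Each constraint adds b to A[i,j] and A[j,i].
A = np.zeros((n, n))
for _ in range(m):
    i, j = rng.choice(n, size=2, replace=False)
    b = rng.choice([-1, 1])
    A[i, j] += b
    A[j, i] += b

# For any assignment x in {-1,+1}^n:
#   x^T A x = 2 * (#satisfied - #unsatisfied), so
#   #satisfied <= m/2 + ||A||_2 * n / 4  for every x simultaneously.
spec = np.linalg.norm(A, 2)
max_sat_fraction = 0.5 + spec * n / (4 * m)
print(f"certified: no assignment satisfies more than "
      f"{max_sat_fraction:.3f} of the clauses")
```

For a random instance at this density the spectral norm is far smaller than the trivial bound, so the certificate strongly refutes the instance; the paper's contribution is making this kind of certificate work at much lower clause densities via higher-order (tensor) analogues of the matrix above.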
ENGINEERING COMPRESSED STATIC FUNCTIONS AND MINIMAL PERFECT HASH FUNCTIONS
\emph{Static functions} are data structures meant to store arbitrary mappings from finite sets to integers; that is, given a universe of items $U$, a set of $n$ pairs $(k_i, v_i)$ where each key $k_i$ belongs to a subset $S \subseteq U$ and each $v_i$ is an integer, a static function will retrieve $v_i$ given $k_i$ (usually, in constant time). When every key is mapped into a different value the function is called a \emph{perfect hash function}, and when the data structure yields an injective numbering $S \to \{0, 1, \ldots, n-1\}$ the mapping is called a \emph{minimal perfect hash function} (MPHF). Big data brought back one of the most critical challenges that computer scientists have been tackling during the last fifty years: analyzing amounts of data that do not fit in main memory. While for small keysets these mappings can be easily implemented using hash tables, this solution does not scale well to bigger sets. Static functions and MPHFs can break the information-theoretical lower bound for storing the key set because they are allowed to return \emph{any} value if the queried key is not in the original keyset. The classical construction technique for static functions achieves just $cbn$ bits of space for a small constant $c$, where $b$ is the number of bits per value, and the one for MPHFs a small constant number of bits per key (always with constant access time). All these features make static functions and MPHFs powerful techniques when handling, for instance, large sets of strings, and they are essential building blocks of space-efficient data structures such as (compressed) full-text indexes, monotone MPHFs, Bloom filter-like data structures, and prefix-search data structures. The biggest challenge of these construction techniques is lowering the multiplicative constants hidden inside the asymptotic space bounds while keeping construction times feasible.
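The classical construction mentioned above stores, for each key, the XOR of a few cells of an array, chosen by hashing; building the structure amounts to solving a sparse linear system over GF(2). A minimal sketch of this MWHC-style scheme (hash functions, constants, and retry policy are my illustrative choices, not the thesis's implementation):

```python
import hashlib

def _positions(key, salts, m):
    """Three hash positions per key (blake2b keeps the demo deterministic)."""
    return [int.from_bytes(hashlib.blake2b(f"{s}:{key}".encode(),
                                           digest_size=8).digest(), "big") % m
            for s in salts]

def build_static_function(keys, values, c=1.23, max_tries=50):
    """Static function with lookup(k) = W[h0(k)] ^ W[h1(k)] ^ W[h2(k)].
    Each key yields one 3-sparse equation over GF(2); XOR acts on all bit
    planes at once, so values may be arbitrary integers."""
    m = int(c * len(keys)) + 3
    for attempt in range(max_tries):
        salts = [f"{attempt}:{i}" for i in range(3)]
        pivots = {}  # position -> (support of reduced equation, rhs)
        ok = True
        for k, v in zip(keys, values):
            sup, val = set(), v
            for p in _positions(k, salts, m):
                sup ^= {p}                # duplicate positions cancel in GF(2)
            while sup:
                p = max(sup)
                if p in pivots:
                    psup, pval = pivots[p]
                    sup, val = sup ^ psup, val ^ pval
                else:
                    pivots[p] = (sup, val)
                    break
            else:
                if val != 0:              # reduced to 0 = nonzero: rehash
                    ok = False
                    break
        if ok:
            W = [0] * m                   # back-substitute; free variables stay 0
            for p in sorted(pivots):
                sup, val = pivots[p]
                W[p] = val
                for q in sup:
                    if q != p:
                        W[p] ^= W[q]
            def lookup(key, W=W, salts=salts, m=m):
                a, b, c3 = _positions(key, salts, m)
                return W[a] ^ W[b] ^ W[c3]
            return lookup
    raise RuntimeError("no solvable system found; increase c or max_tries")

keys = [f"key{i}" for i in range(500)]
vals = [(i * 2654435761) & 0xFF for i in range(500)]
f = build_static_function(keys, vals)
assert all(f(k) == v for k, v in zip(keys, vals))
```

Note the two properties the abstract relies on: the array has $cn$ cells of $b$ bits each (hence $cbn$ bits total), and a query on a key outside the original set simply returns whatever the three cells XOR to, which is what lets the structure beat the lower bound for storing the key set itself.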
In this thesis, we take advantage of recent results in random linear systems theory, regarding the ratio between the number of variables and the number of equations, and in perfect hash data structures, to achieve practical static functions with the lowest space bounds so far, and construction time comparable with widely used techniques. The new results, however, require solving linear systems that need more than the simple triangulation process used in current state-of-the-art solutions. The main challenge in making such structures usable is mitigating the cubic running time of Gaussian elimination at construction time. To this purpose, we introduce novel techniques based on \emph{broadword programming} and a heuristic derived from \emph{structured Gaussian elimination}. We obtain data structures that are significantly smaller than commonly used hypergraph-based constructions, while maintaining or improving the lookup times and providing still feasible construction times.

We then apply these improvements to another kind of structure: \emph{compressed static functions}. The theoretical construction technique for this kind of data structure uses variable-length prefix-free codes to encode the set of output values. Adopting this solution, we can reduce the space usage of each element to (essentially) the entropy of the list of output values of the function. The price is an even bigger linear system of equations, so the time required to build the structure increases. In this thesis, we present the first engineered implementation of compressed static functions. For example, we were able to store a function with geometrically distributed output in space per key close to the entropy of the output values, independently of the key set, with a construction time double with respect to that of a state-of-the-art non-compressed function, which requires a fixed number of bits per key regardless of the value distribution, and similar lookup time.
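The thesis's broadword techniques work on the library's actual bit layout, but the core idea can be shown in a few lines (this is an illustration, not the thesis code): pack each GF(2) equation into a single integer, so that eliminating a variable from an equation is one wide XOR instead of a loop over coefficients.

```python
def solve_gf2(rows, rhs, ncols):
    """Solve a linear system over GF(2). Each row is an integer bitmask of
    coefficients; packing the right-hand side as the lowest bit lets every
    row operation be a single wide XOR (the broadword idea)."""
    aug = [(r << 1) | b for r, b in zip(rows, rhs)]
    pivots = {}  # pivot column -> augmented row whose highest column is it
    for row in aug:
        while row >> 1:
            col = (row >> 1).bit_length() - 1
            if col not in pivots:
                pivots[col] = row
                break
            row ^= pivots[col]       # eliminate: one XOR, all columns at once
        else:
            if row & 1:
                return None          # reduced to 0 = 1: inconsistent
    x = [0] * ncols                  # free variables set to 0
    for col in sorted(pivots):       # back-substitute, low columns first
        row, val = pivots[col] >> 1, pivots[col] & 1
        row &= ~(1 << col)           # the other variables in this equation
        while row:
            low = row & -row
            val ^= x[low.bit_length() - 1]
            row ^= low
        x[col] = val
    return x

# x0^x1 = 1, x1^x2 = 0, x0^x2^x3 = 1
rows, rhs = [0b0011, 0b0110, 0b1101], [1, 0, 1]
x = solve_gf2(rows, rhs, 4)
xmask = sum(b << i for i, b in enumerate(x))
assert all(bin(r & xmask).count("1") % 2 == b for r, b in zip(rows, rhs))
```

Python's arbitrary-precision integers make every `^=` operate on 64 coefficient bits per machine word; the same trick in C or Java, over arrays of 64-bit words, is what turns Gaussian elimination from a per-bit loop into a word-parallel one.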
We can also store a function whose output follows a Zipfian distribution in a fraction of the bits per key a non-compressed function would require, with a threefold increase in construction time and significantly faster lookups.
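The space claims above rest on a standard fact about prefix-free codes: an optimal (Huffman) code spends between $H$ and $H+1$ bits per value, where $H$ is the empirical entropy of the output list. A small self-contained check on geometrically distributed values (illustrative only, not the thesis's code):

```python
import heapq
import math
import random
from collections import Counter

def huffman_lengths(counts):
    """Code length of each symbol under an optimal prefix-free (Huffman) code."""
    if len(counts) == 1:
        return {s: 1 for s in counts}
    heap = [(c, i, (s,)) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    length = Counter()
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every symbol under the merged node
            length[s] += 1         # sinks one level deeper in the code tree
        heapq.heappush(heap, (c1 + c2, tiebreak, s1 + s2))
        tiebreak += 1
    return length

rnd = random.Random(42)
def geometric():                   # P(k) = 2^-k; entropy is exactly 2 bits
    k = 1
    while rnd.random() < 0.5:
        k += 1
    return k

values = [geometric() for _ in range(100_000)]
counts, n = Counter(values), len(values)
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
lengths = huffman_lengths(counts)
avg_bits = sum(counts[s] * lengths[s] for s in counts) / n
print(f"entropy {entropy:.3f} bits/key, Huffman {avg_bits:.3f} bits/key")
```

For a skewed distribution (geometric, Zipfian) the entropy is a small constant, which is why encoding the values with a prefix-free code and retrieving codewords through a (larger) linear system can undercut the fixed bits-per-key cost of a non-compressed static function.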