Excuse me, sir? Your language model is leaking (information)
We introduce a cryptographic method to hide an arbitrary secret payload in
the response of a Large Language Model (LLM). A secret key is required to
extract the payload from the model's response, and without the key it is
provably impossible to distinguish between the responses of the original LLM
and the LLM that hides a payload. In particular, the quality of generated text
is not affected by the payload. Our approach extends a recent result of Christ,
Gunn and Zamir (2023), who introduced an undetectable watermarking scheme for
LLMs.
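To make the idea of output that cannot be distinguished without a key more concrete, here is a minimal Python sketch of keyed, distribution-preserving sampling of a single binary token, in the spirit of the Christ, Gunn and Zamir watermark that this work extends. It is an illustration under stated assumptions, not the paper's payload-hiding scheme: HMAC-SHA256 stands in for a pseudorandom function, the byte string `context` (for example a position or the preceding tokens) and all function names are hypothetical, and payload encoding is not shown. Because the keyed value u is pseudorandom, emitting 1 iff u < p reproduces the model's probability p for that token, while a key holder can still detect the correlation between u and the emitted bits.

```python
import hashlib
import hmac
import math


def keyed_uniform(key: bytes, context: bytes) -> float:
    """Pseudorandom value in [0, 1) derived from (key, context) via HMAC-SHA256."""
    digest = hmac.new(key, context, hashlib.sha256).digest()
    # Keep 53 bits so the float division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_bit(key: bytes, context: bytes, p_one: float) -> int:
    """Emit 1 iff the keyed value falls below p_one, so P[bit = 1] = p_one."""
    return 1 if keyed_uniform(key, context) < p_one else 0


def detection_score(key: bytes, contexts, bits) -> float:
    """Average of -ln(v), where v = u for 1-bits and v = 1 - u for 0-bits.

    Bits chosen independently of the key give an average near 1; bits
    produced by sample_bit under the same key score noticeably higher.
    """
    total = 0.0
    for context, bit in zip(contexts, bits):
        u = keyed_uniform(key, context)
        v = u if bit == 1 else 1.0 - u
        total += -math.log(max(v, 2.0**-53))  # guard against log(0)
    return total / len(bits)


if __name__ == "__main__":
    contexts = [f"position-{i}".encode() for i in range(2000)]
    bits = [sample_bit(b"secret-key", c, 0.5) for c in contexts]
    print(detection_score(b"secret-key", contexts, bits))  # right key
    print(detection_score(b"other-key", contexts, bits))   # wrong key
```

With the right key, each token with probability 1/2 contributes on average 1 + ln 2 to the score, versus 1 for a wrong key, which is why the two printed values separate as the text grows.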
Algorithmic Applications of Hypergraph and Partition Containers
We present a general method to convert algorithms into faster algorithms for
almost-regular input instances. Informally, an almost-regular input is an input
in which the maximum degree is larger than the average degree by at most a
constant factor. This family of inputs vastly generalizes several families of
inputs for which we commonly have improved algorithms, including bounded-degree
inputs and random inputs. It also generalizes families of inputs for which we
don't usually have faster algorithms, including regular inputs of arbitrarily
high degree and very dense inputs. We apply our method to achieve breakthroughs
in exact algorithms for several central NP-Complete problems including -SAT,
Graph Coloring, and Maximum Independent Set.
Our main tool is the first algorithmic application of the relatively new
Hypergraph Container Method (Saxton and Thomason 2015, Balogh, Morris and
Samotij 2015). This recent breakthrough, which generalizes an earlier version
for graphs (Kleitman and Winston 1982, Sapozhenko 2001), has been used
extensively in recent years in extremal combinatorics. An important component
of our work is the generalization of (hyper-)graph containers to Partition
Containers.
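The informal notion of an almost-regular input can be checked directly on a graph. The following Python sketch, assuming an undirected graph given as an edge list, computes the ratio between the maximum and the average degree and compares it against a constant c; the function names and the parameter c are hypothetical, and nothing here implements the container method itself.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Build an undirected adjacency map from an edge list."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degree_ratio(adjacency: Dict[Hashable, Set[Hashable]]) -> float:
    """Return max degree / average degree (exactly 1.0 for a regular graph)."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    if not degrees or sum(degrees) == 0:
        return 1.0  # treat empty or edgeless graphs as trivially regular
    average = sum(degrees) / len(degrees)
    return max(degrees) / average


def is_almost_regular(adjacency: Dict[Hashable, Set[Hashable]], c: float) -> bool:
    """True if the maximum degree is at most c times the average degree."""
    return degree_ratio(adjacency) <= c
```

For example, a cycle (or any regular graph, however dense) has ratio 1, while a star with n leaves has ratio (n + 1) / 2, so no fixed constant c covers all stars.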
Motion Planning for Unlabeled Discs with Optimality Guarantees
We study the problem of path planning for unlabeled (indistinguishable)
unit-disc robots in a planar environment cluttered with polygonal obstacles. We
introduce an algorithm which minimizes the total path length, i.e., the sum of
lengths of the individual paths. Our algorithm is guaranteed to find a solution
if one exists, or report that none exists otherwise. It runs in time
, where is the number of robots and is the total
complexity of the workspace. Moreover, the total length of the returned
solution is at most , where OPT is the optimal solution cost. To
the best of our knowledge this is the first algorithm for the problem that has
such guarantees. The algorithm has been implemented in an exact manner and we
present experimental results that attest to its efficiency.
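Because the robots are unlabeled, any solution moves the robots onto the goals according to some bijection, and every path is at least as long as the straight segment between its endpoints. The minimum-cost assignment of starts to goals under Euclidean distance is therefore a lower bound on OPT, even with obstacles. The Python sketch below computes that bound by brute force over permutations; it is a hypothetical helper for sanity-checking solution costs, not the paper's algorithm, and for more than a handful of robots a polynomial assignment method such as the Hungarian algorithm would replace the factorial loop.

```python
import itertools
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def assignment_lower_bound(
    starts: Sequence[Point], goals: Sequence[Point]
) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Minimum over bijections of the summed start-to-goal Euclidean distances.

    Returns (cost, perm), where robot i is sent to goal perm[i].
    """
    if len(starts) != len(goals):
        raise ValueError("need as many goals as robots")
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(goals))):
        cost = sum(math.dist(starts[i], goals[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

Any returned solution of total length L can then be compared against this bound, since L >= OPT >= the assignment cost.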
Dynamic Ordered Sets with Approximate Queries, Approximate Heaps and Soft Heaps
We consider word RAM data structures for maintaining ordered sets of integers whose select and rank operations are allowed to return approximate results, i.e., the returned rank, or the rank of the returned item, may differ by less than Delta from the exact answer, where Delta=Delta(n) is an error parameter. Related to approximate select and rank is approximate (one-dimensional) nearest-neighbor. Approximate min queries are a special case of approximate select queries. Data structures that support approximate min operations are known as approximate heaps (priority queues). Related to approximate heaps are soft heaps, which are approximate heaps with a different notion of approximation.
We prove the optimality of all the data structures presented, either through matching cell-probe lower bounds, or through equivalences to well-studied static problems. For approximate select, rank, and nearest-neighbor operations we get matching cell-probe lower bounds. We prove an equivalence between approximate min operations, i.e., approximate heaps, and the static partitioning problem. Finally, we prove an equivalence between soft heaps and the classical sorting problem, on a smaller number of items.
Our results have many interesting and unexpected consequences. It turns out that approximation greatly speeds up some of these operations, while others are almost unaffected. In particular, while select and rank have identical operation times, both in comparison-based and word RAM implementations, an interesting separation emerges between the approximate versions of these operations in the word RAM model. Approximate select is much faster than approximate rank. It also turns out that approximate min is exponentially faster than the more general approximate select. Next, we show that implementing soft heaps is harder than implementing approximate heaps. The relation between them corresponds to the relation between sorting and partitioning.
Finally, as an interesting byproduct, we observe that a combination of known techniques yields a deterministic word RAM algorithm for (exactly) sorting n items in O(n log log_w n) time, where w is the word length. Even for the easier problem of finding duplicates, the best previous deterministic bound was O(min{n log log n,n log_w n}). Our new unifying bound is an improvement when w is sufficiently large compared with n.
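The meaning of "error less than Delta" can be illustrated with a simple static structure: keep only every Delta-th item of the sorted set. The Python sketch below does exactly that; it is a hypothetical, static, comparison-based illustration of the approximation guarantee, not one of the paper's dynamic word RAM structures. If i sampled items are smaller than x, the true rank r of x satisfies (i-1)*Delta < r <= i*Delta, so returning i*Delta (capped at n) errs by less than Delta; select(k) returns the sampled item at index floor(k/Delta)*Delta, whose rank is off by k mod Delta.

```python
import bisect
from typing import List, Sequence


class StaticApproxRankSelect:
    """Stores every delta-th item of the sorted input (about n/delta keys).

    approx_rank and approx_select are off by less than delta from the exact answer.
    """

    def __init__(self, items: Sequence[int], delta: int) -> None:
        if delta < 1:
            raise ValueError("delta must be at least 1")
        ordered = sorted(items)
        self.n = len(ordered)
        self.delta = delta
        self.sample: List[int] = ordered[::delta]  # items at ranks 0, delta, 2*delta, ...

    def approx_rank(self, x: int) -> int:
        """Estimate the number of stored items strictly smaller than x."""
        i = bisect.bisect_left(self.sample, x)  # sampled items smaller than x
        return min(i * self.delta, self.n)

    def approx_select(self, k: int) -> int:
        """Return an item whose 0-based rank differs from k by less than delta."""
        if not 0 <= k < self.n:
            raise IndexError("rank out of range")
        return self.sample[k // self.delta]
```

The space saving comes from discarding all but n/Delta keys; the paper's results concern how much faster such queries can be made under updates in the word RAM model.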
Selection from Heaps, Row-Sorted Matrices, and X+Y Using Soft Heaps
We use soft heaps to obtain simpler optimal algorithms for selecting the k-th smallest item, and the set of k smallest items, from a heap-ordered tree, from a collection of sorted lists, and from X+Y, where X and Y are two unsorted sets. Our results match, and in some ways extend and improve, classical results of Frederickson (1993) and Frederickson and Johnson (1982). In particular, for selecting the k-th smallest item, or the set of k smallest items, from a collection of m sorted lists we obtain a new optimal "output-sensitive" algorithm that performs only O(m + sum_{i=1}^m log(k_i+1)) comparisons, where k_i is the number of items of the i-th list that belong to the overall set of k smallest items.
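For comparison, the textbook way to select the k smallest items from m sorted lists is a k-way merge driven by an ordinary binary heap, which uses O(m + k log m) comparisons. The Python sketch below implements that baseline with `heapq` and also reports the counts k_i from the abstract; it is a swapped-in classical technique for illustration, not the soft-heap algorithm, and the function name is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def k_smallest_from_sorted_lists(
    lists: Sequence[Sequence[int]], k: int
) -> Tuple[List[int], List[int]]:
    """Return the k smallest items (in order) and, per list, how many were taken (k_i)."""
    # Seed the heap with the head of every non-empty list: O(m) to heapify.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    result: List[int] = []
    counts = [0] * len(lists)
    while heap and len(result) < k:
        value, i, j = heapq.heappop(heap)  # O(log m) comparisons per pop
        result.append(value)
        counts[i] += 1
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return result, counts
```

The soft-heap result replaces the k log m term by sum_i log(k_i + 1), which is much smaller when the k smallest items are concentrated in a few lists.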
Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering
Random dimensionality reduction is a versatile tool for speeding up
algorithms for high-dimensional problems. We study its application to two
clustering problems: the facility location problem, and the single-linkage
hierarchical clustering problem, which is equivalent to computing the minimum
spanning tree. We show that if we project the input point set onto a random
-dimensional subspace (where is the doubling dimension of
), then the optimum facility location cost in the projected space
approximates the original cost up to a constant factor. We show an analogous
statement for minimum spanning tree, but with the dimension having an extra
term and the approximation factor being arbitrarily close to 1.
Furthermore, we extend these results to approximating solutions instead of just
their costs. Lastly, we provide experimental results to validate the quality of
solutions and the speedup due to the dimensionality reduction. Unlike several
previous papers studying this approach in the context of k-means and
k-medians, our dimension bound does not depend on the number of clusters but
only on the intrinsic dimensionality of .
Comment: 25 pages. Published as a conference paper in ICML 202
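The projection-then-solve pipeline can be sketched in a few lines of pure Python. The example below assumes a dense Gaussian projection matrix scaled by 1/sqrt(target_dim) as the random map and computes the minimum spanning tree cost (the quantity underlying single-linkage clustering) with Prim's O(n^2) algorithm on the complete Euclidean graph. The function names and `target_dim` are hypothetical, and the paper's dimension bound in terms of the doubling dimension is not encoded here. If every pairwise distance is scaled by a factor between s_min and s_max, the projected MST cost lies between s_min and s_max times the original cost.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def random_projection(points: Sequence[Vector], target_dim: int, seed: int = 0) -> List[List[float]]:
    """Map points to target_dim dimensions with a scaled Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)  # preserves squared norms in expectation
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(target_dim)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]


def mst_cost(points: Sequence[Vector]) -> float:
    """Prim's algorithm on the complete Euclidean graph, O(n^2) distance evaluations."""
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest edge connecting each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
    return total
```

For instance, a 10 x 10 unit grid embedded in 60 dimensions has intrinsic dimension 2 and MST cost 99; after projecting to a few dozen dimensions, the ratio `mst_cost(projected) / 99` stays close to 1 with high probability.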
Bottleneck Paths and Trees and Deterministic Graphical Games
Gabow and Tarjan showed that the Bottleneck Path (BP) problem, i.e., finding a path between a given source and a given target in a weighted directed graph whose largest edge weight is minimized, as well as the Bottleneck Spanning Tree (BST) problem, i.e., finding a directed spanning tree rooted at a given vertex whose largest edge weight is minimized, can both be solved deterministically in O(m * log^*(n)) time, where m is the number of edges and n is the number of vertices in the graph. We present a slightly improved randomized algorithm for these problems with an expected running time of O(m * beta(m,n)), where beta(m,n) = min{k >= 1 | log^{(k)} n <= m/n}. In particular, if m >= n * log^{(k)} n for some constant k, the expected running time of the new algorithm is O(m). Our algorithm, like that of Gabow and Tarjan, works in the comparison model. We also observe that in the word-RAM model, both problems can be solved deterministically in O(m) time. Finally, we solve an open problem of Andersson et al., giving a deterministic O(m)-time comparison-based algorithm for solving deterministic 2-player turn-based zero-sum terminal payoff games, also known as Deterministic Graphical Games (DGG).
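As a point of reference for the bounds above, the bottleneck path problem already admits a simple O(m log n) solution: run Dijkstra's algorithm with max in place of + as the path-cost combiner, which is valid because the max of nonnegative-order labels is monotone along a path. The Python sketch below is that classical baseline, not the Gabow-Tarjan algorithm or the paper's randomized O(m * beta(m,n)) algorithm; the function name and the (u, v, w) edge-list input format are assumptions.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def bottleneck_path(
    n: int, edges: Sequence[Tuple[int, int, float]], source: int, target: int
) -> Tuple[float, List[int]]:
    """Directed path from source to target minimizing the largest edge weight.

    Returns (bottleneck, path); (inf, []) if target is unreachable.
    """
    adjacency: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
    best = [math.inf] * n  # smallest known bottleneck to each vertex
    parent = [-1] * n
    best[source] = -math.inf  # the empty path has no edges
    heap = [(-math.inf, source)]
    while heap:
        b, u = heapq.heappop(heap)
        if b > best[u]:
            continue  # stale heap entry
        if u == target:
            break
        for v, w in adjacency[u]:
            candidate = max(b, w)  # bottleneck of the path extended by (u, v)
            if candidate < best[v]:
                best[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    if best[target] == math.inf:
        return math.inf, []
    path, v = [], target
    while v != -1:
        path.append(v)
        v = parent[v]
    return best[target], path[::-1]
```

For example, with edges (0,1,5), (1,3,5), (0,2,2), (2,3,7) and (0,3,9), the best path from 0 to 3 is 0 -> 1 -> 3 with bottleneck 5, even though 0 -> 2 -> 3 starts with a lighter edge.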