153 research outputs found
Faster algorithms for 1-mappability of a sequence
In the k-mappability problem, we are given a string x of length n and
integers m and k, and we are asked to count, for each length-m factor y of x,
the number of other factors of length m of x that are at Hamming distance at
most k from y. We focus here on the version of the problem where k = 1. The
fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and
space O(n). We present two algorithms that require worst-case time O(mn) and
O(n log^2 n), respectively, and space O(n), thus greatly improving the state of
the art. Moreover, we present an algorithm that requires average-case time and
space O(n) for integer alphabets if m = {\Omega}(log n/ log {\sigma}), where
{\sigma} is the alphabet size
Succinct Partial Sums and Fenwick Trees
We consider the well-studied partial sums problem in succint space where one
is to maintain an array of n k-bit integers subject to updates such that
partial sums queries can be efficiently answered. We present two succint
versions of the Fenwick Tree - which is known for its simplicity and
practicality. Our results hold in the encoding model where one is allowed to
reuse the space from the input data. Our main result is the first that only
requires nk + o(n) bits of space while still supporting sum/update in O(log_b
n) / O(b log_b n) time where 2 <= b <= log^O(1) n. The second result shows how
optimal time for sum/update can be achieved while only slightly increasing the
space usage to nk + o(nk) bits. Beyond Fenwick Trees, the results are primarily
based on bit-packing and sampling - making them very practical - and they also
allow for simple optimal parallelization
Efficient Dynamic Approximate Distance Oracles for Vertex-Labeled Planar Graphs
Let be a graph where each vertex is associated with a label. A
Vertex-Labeled Approximate Distance Oracle is a data structure that, given a
vertex and a label , returns a -approximation of
the distance from to the closest vertex with label in . Such
an oracle is dynamic if it also supports label changes. In this paper we
present three different dynamic approximate vertex-labeled distance oracles for
planar graphs, all with polylogarithmic query and update times, and nearly
linear space requirements
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near
neighbor queries by using a distribution over locality-sensitive
hash functions that partition space. For a collection of points, after
preprocessing, the query time is dominated by evaluations
of hash functions from and hash table lookups and
distance computations where is determined by the
locality-sensitivity properties of . It follows from a recent
result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive
hash functions can be reduced to , leaving the query time to be
dominated by distance computations and
additional word-RAM operations. We state this result as a general framework and
provide a simpler analysis showing that the number of lookups and distance
computations closely match the Indyk-Motwani framework, making it a viable
replacement in practice. Using ideas from another locality-sensitive hashing
framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of
additional word-RAM operations to .Comment: 15 pages, 3 figure
Stable Noncrossing Matchings
Given a set of men represented by points lying on a line, and
women represented by points lying on another parallel line, with each
person having a list that ranks some people of opposite gender as his/her
acceptable partners in strict order of preference. In this problem, we want to
match people of opposite genders to satisfy people's preferences as well as
making the edges not crossing one another geometrically. A noncrossing blocking
pair w.r.t. a matching is a pair of a man and a woman such that
they are not matched with each other but prefer each other to their own
partners in , and the segment does not cross any edge in . A
weakly stable noncrossing matching (WSNM) is a noncrossing matching that does
not admit any noncrossing blocking pair. In this paper, we prove the existence
of a WSNM in any instance by developing an algorithm to find one in a
given instance.Comment: This paper has appeared at IWOCA 201
Dynamic Set Intersection
Consider the problem of maintaining a family of dynamic sets subject to
insertions, deletions, and set-intersection reporting queries: given , report every member of in any order. We show that in the word
RAM model, where is the word size, given a cap on the maximum size of
any set, we can support set intersection queries in
expected time, and updates in expected time. Using this algorithm
we can list all triangles of a graph in
expected time, where and
is the arboricity of . This improves a 30-year old triangle enumeration
algorithm of Chiba and Nishizeki running in time.
We provide an incremental data structure on that supports intersection
{\em witness} queries, where we only need to find {\em one} .
Both queries and insertions take O\paren{\sqrt \frac{N}{w/\log^2 w}} expected
time, where . Finally, we provide time/space tradeoffs for
the fully dynamic set intersection reporting problem. Using words of space,
each update costs expected time, each reporting query
costs expected time where
is the size of the output, and each witness query costs expected time.Comment: Accepted to WADS 201
Beyond Hypergraph Dualization
International audienceThis problem concerns hypergraph dualization and generalization to poset dualization. A hypergraph H = (V, E) consists of a finite collection E of sets over a finite set V , i.e. E ⊆ P(V) (the powerset of V). The elements of E are called hyperedges, or simply edges. A hypergraph is said simple if none of its edges is contained within another. A transversal (or hitting set) of H is a set T ⊆ V that intersects every edge of E. A transversal is minimal if it does not contain any other transversal as a subset. The set of all minimal transversal of H is denoted by T r(H). The hypergraph (V, T r(H)) is called the transversal hypergraph of H. Given a simple hypergraph H, the hypergraph dualization problem (Trans-Enum for short) concerns the enumeration without repetitions of T r(H). The Trans-Enum problem can also be formulated as a dualization problem in posets. Let (P, ≤) be a poset (i.e. ≤ is a reflexive, antisymmetric, and transitive relation on the set P). For A ⊆ P , ↓ A (resp. ↑ A) is the downward (resp. upward) closure of A under the relation ≤ (i.e. ↓ A is an ideal and ↑ A a filter of (P, ≤)). Two antichains (B + , B −) of P are said to be dual if ↓ B + ∪ ↑ B − = P and ↓ B + ∩ ↑ B − = ∅. Given an implicit description of a poset P and an antichain B + (resp. B −) of P , the poset dualization problem (Dual-Enum for short) enumerates the set B − (resp. B +), denoted by Dual(B +) = B − (resp. Dual(B −) = B +). Notice that the function dual is self-dual or idempotent, i.e. Dual(Dual(B)) = B
Algorithms for the minimum non-separating path and the balanced connected bipartition problems on grid graphs (With erratum)
For given a pair of nodes in a graph, the minimum non-separating path problem
looks for a minimum weight path between the two nodes such that the remaining
graph after removing the path is still connected. The balanced connected
bipartition (BCP) problem looks for a way to bipartition a graph into two
connected subgraphs with their weights as equal as possible. In this paper we
present an algorithm in time for finding a minimum weight
non-separating path between two given nodes in a grid graph of nodes with
positive weight. This result leads to a 5/4-approximation algorithm for the
BCP problem on grid graphs, which is the currently best ratio achieved in
polynomial time. We also developed an exact algorithm for the BCP problem
on grid graphs. Based on the exact algorithm and a rounding technique, we show
an approximation scheme, which is a fully polynomial time approximation scheme
for fixed number of rows.Comment: With erratu
Optimal skeleton huffman trees revisited
A skeleton Huffman tree is a Huffman tree in which all disjoint maximal perfect subtrees are shrunk into leaves. Skeleton Huffman trees, besides saving storage space, are also used for faster decoding and for speeding up Huffman-shaped wavelet trees. In 2017 Klein et al. introduced an optimal skeleton tree: for given symbol frequencies, it has the least number of nodes among all optimal prefix-free code trees (not necessarily Huffman’s) with shrunk perfect subtrees. Klein et al. described a simple algorithm that, for fixed codeword lengths, finds a skeleton tree with the least number of nodes; with this algorithm one can process each set of optimal codeword lengths to find an optimal skeleton tree. However, there are exponentially many such sets in the worst case. We describe an (formula presented)-time algorithm that, given n symbol frequencies, constructs an optimal skeleton tree and its corresponding optimal code. © Springer Nature Switzerland AG 2020.Supported by the Russian Science Foundation (RSF), project 18-71-00002
- …