    Size-Constrained Weighted Ancestors with Applications

    The weighted ancestor problem on a rooted node-weighted tree T is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require Ω(log log n) time for queries provided O(n polylog n) space is available, where n is the input size. The weighted ancestor problem has attracted a lot of attention from the combinatorial pattern matching community due to its direct application to suffix trees. In this formulation of the problem, the nodes are weighted by string depth. This research has culminated in a data structure for weighted ancestors in suffix trees with O(1) query time and an O(n)-time construction algorithm [Belazzougui et al., CPM 2021]. In this paper, we consider a different version of the weighted ancestor problem, where the nodes are weighted by any function weight that maps each node of T to a positive integer, such that weight(u) ≤ size(u) for any node u and weight(u₁) ≤ weight(u₂) if node u₁ is a descendant of node u₂, where size(u) is the number of nodes in the subtree rooted at u. In the size-constrained weighted ancestor (SWA) problem, for any node u of T and any integer k, we are asked to return the lowest ancestor w of u with weight at least k. We show that for any rooted tree with n nodes, we can locate node w in O(1) time after O(n)-time preprocessing. In particular, this implies a data structure for the SWA problem in suffix trees with O(1) query time and O(n)-time preprocessing, when the nodes are weighted by weight. We also show several string-processing applications of this result.
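
    To make the SWA query concrete, the following is a minimal sketch of a naive baseline that simply walks up the tree. It is not the O(1)-time data structure of the paper. The function names, the parent-array representation, and the choice to count u as its own ancestor are assumptions made for illustration.

```python
def subtree_sizes(parent):
    """size[v] = number of nodes in the subtree rooted at v (naive computation)."""
    size = [0] * len(parent)
    for v in range(len(parent)):
        w = v
        while w is not None:  # every node counts once for each of its ancestors
            size[w] += 1
            w = parent[w]
    return size


def swa_query(parent, weight, u, k):
    """Lowest ancestor w of u (u included, by assumption) with weight[w] >= k.

    Naive walk towards the root; returns None if no such ancestor exists.
    """
    w = u
    while w is not None:
        if weight[w] >= k:
            return w
        w = parent[w]
    return None


# Hypothetical example tree: node 0 is the root, parent[v] is the parent of v.
parent = [None, 0, 0, 1, 1, 3]
# Using weight = size satisfies both constraints from the abstract:
# weight(u) <= size(u), and weights never decrease towards the root.
weight = subtree_sizes(parent)

print(weight)                            # [6, 4, 1, 2, 1, 1]
print(swa_query(parent, weight, 5, 2))   # 3
print(swa_query(parent, weight, 5, 3))   # 1
print(swa_query(parent, weight, 5, 7))   # None
```

    Each query here costs time proportional to the depth of u, which is the cost the constant-time structure avoids.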

    Dynamic L-Budget Clustering of Curves

    A key goal of clustering is data reduction. In center-based clustering of complex objects, therefore, not only the number of clusters but also the complexity of the centers plays a crucial role. We propose L-Budget Clustering as a unifying perspective on this task, optimizing the clustering under the constraint that the summed complexity of all centers is at most L. We present algorithms for clustering planar curves under the Fréchet distance, but note that our algorithms more generally apply to objects in metric spaces if a notion of simplification of objects is applicable. A scenario in which data reduction is of particular importance is when the space is limited. Our main result is an efficient (8 + ε)-approximation algorithm with a (1 + ε)-resource augmentation that maintains an L-budget clustering under insertion of curves using only O(Lε^{-1}) space and O^*(L³log(L) + L²log(r^*/r₀)) time, where O^* hides factors of ε^{-1}.

    Search-Space Reduction via Essential Vertices Revisited: Vertex Multicut and Cograph Deletion

    For an optimization problem Π on graphs whose solutions are vertex sets, a vertex v is called c-essential for Π if all solutions of size at most c ⋅ opt contain v. Recent work showed that polynomial-time algorithms to detect c-essential vertices can be used to reduce the search space of fixed-parameter tractable algorithms solving such problems parameterized by the size k of the solution. We provide several new upper and lower bounds for detecting essential vertices. For example, we give a polynomial-time algorithm for 3-Essential detection for Vertex Multicut, which translates into an algorithm that finds a minimum multicut of an undirected n-vertex graph G in time 2^{O(ℓ³)}⋅n^{O(1)}, where ℓ is the number of vertices in an optimal solution that are not 3-essential. Our positive results are obtained by analyzing the integrality gaps of certain linear programs. Our lower bounds show that for sufficiently small values of c, the detection task becomes NP-hard assuming the Unique Games Conjecture. For example, we show that (2-ε)-Essential detection for Directed Feedback Vertex Set is NP-hard under this conjecture, thereby proving that the existing algorithm that detects 2-essential vertices is best-possible.
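
    The definition of a c-essential vertex can be illustrated by brute force on a tiny instance. The sketch below uses Vertex Cover as a stand-in problem (the abstract itself treats Vertex Multicut and Directed Feedback Vertex Set); it enumerates all solutions in exponential time and is unrelated to the polynomial-time detection algorithms. All names are hypothetical.

```python
from itertools import combinations


def is_vertex_cover(edges, chosen):
    """True if every edge has at least one endpoint in `chosen`."""
    return all(u in chosen or v in chosen for u, v in edges)


def essential_vertices(n, edges, c):
    """Vertices contained in every vertex cover of size at most c * opt.

    Exhaustive enumeration over all 2^n vertex subsets; only for tiny graphs.
    """
    covers = [
        set(subset)
        for r in range(n + 1)
        for subset in combinations(range(n), r)
        if is_vertex_cover(edges, set(subset))
    ]
    opt = min(len(s) for s in covers)
    small = [s for s in covers if len(s) <= c * opt]
    return set.intersection(*small)


# Star with center 0 and leaves 1, 2, 3: the optimum cover is {0}, so opt = 1.
star = [(0, 1), (0, 2), (0, 3)]
print(essential_vertices(4, star, 2))  # {0}: every cover of size <= 2 uses the center
print(essential_vertices(4, star, 3))  # set(): {1, 2, 3} is a cover of size 3 without 0
```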

    No-Dimensional Tverberg Partitions Revisited

    Given a set P ⊂ ℝ^d of n points, with diameter Δ, and a parameter δ ∈ (0,1), it is known that there is a partition of P into sets P_1, …, P_t, each of size O(1/δ²), such that their convex hulls all intersect a common ball of radius δΔ. We prove that a random partition, with a simple alteration step, yields the desired partition, resulting in a (randomized) linear time algorithm (i.e., O(dn)). We also provide a deterministic algorithm with running time O(dn log n). Previous proofs were either existential (i.e., at least exponential time), or required much bigger sets. In addition, the algorithm and its proof of correctness are significantly simpler than previous work, and the constants are slightly better. We also include a number of applications and extensions using the same central ideas. For example, we provide a linear time algorithm for computing a "fuzzy" centerpoint, and prove a no-dimensional weak ε-net theorem with an improved constant

    Privacy Can Arise Endogenously in an Economic System with Learning Agents

    We study price-discrimination games between buyers and a seller where privacy arises endogenously - that is, utility maximization yields equilibrium strategies where privacy occurs naturally. In this game, buyers with a high valuation for a good have an incentive to keep their valuation private, lest the seller charge them a higher price. This yields an equilibrium where some buyers will send a signal that misrepresents their type with some probability; we refer to this as buyer-induced privacy. When the seller is able to publicly commit to providing a certain privacy level, we find that their equilibrium response is to commit to ignore buyers' signals with some positive probability; we refer to this as seller-induced privacy. We then turn our attention to a repeated interaction setting where the game parameters are unknown and the seller cannot credibly commit to a level of seller-induced privacy. In this setting, players must learn strategies based on information revealed in past rounds. We find that, even without commitment ability, seller-induced privacy arises as a result of reputation building. We characterize the resulting seller-induced privacy and seller’s utility under no-regret and no-policy-regret learning algorithms and verify these results through simulations

    Grounding Stream Reasoning Research

    In the last decade, there has been a growing interest in applying AI technologies to implement complex data analytics over data streams. To this end, researchers in various fields have been organising a yearly event called the "Stream Reasoning Workshop" to share perspectives, challenges, and experiences around this topic. In this paper, the previous organisers of the workshops and other community members provide a summary of the main research results that have been discussed during the first six editions of the event. These results can be categorised into four main research areas: The first is concerned with the technological challenges related to handling large data streams. The second area aims at adapting and extending existing semantic technologies to data streams. The third and fourth areas focus on how to implement reasoning techniques, either considering deductive or inductive techniques, to extract new and valuable knowledge from the data in the stream. This summary is written not only to provide a crystallisation of the field, but also to point out distinctive traits of the stream reasoning community. Moreover, it also provides a foundation for future research by enumerating a list of use cases and open challenges, to stimulate others to join this exciting research area

    On the Size Overhead of Pairwise Spanners

    Given an undirected possibly weighted n-vertex graph G = (V,E) and a set P ⊆ V² of pairs, a subgraph S = (V,E') is called a P-pairwise α-spanner of G, if for every pair (u,v) ∈ P we have d_S(u,v) ≤ α⋅ d_G(u,v). The parameter α is called the stretch of the spanner, and its size overhead is defined as |E'|/|P|. A surprising connection was recently discussed between the additive stretch of (1+ε,β)-spanners and the hopbound of (1+ε,β)-hopsets. A long sequence of works showed that if the spanner/hopset has size ≈ n^{1+1/k} for some parameter k ≥ 1, then β≈(1/ε)^{log k}. In this paper we establish a new connection to the size overhead of pairwise spanners. In particular, we show that if |P|≈ n^{1+1/k}, then a P-pairwise (1+ε)-spanner must have size at least β⋅ |P| with β≈(1/ε)^{log k} (a near matching upper bound was recently shown in [Michael Elkin and Idan Shabat, 2023]). That is, the size overhead of pairwise spanners has similar bounds to the hopbound of hopsets, and to the additive stretch of spanners. We also extend the connection between pairwise spanners and hopsets to the large stretch regime, by showing nearly matching upper and lower bounds for P-pairwise α-spanners. In particular, we show that if |P|≈ n^{1+1/k}, then the size overhead is β≈k/α. A source-wise spanner is a special type of pairwise spanner, for which P = A×V for some A ⊆ V. A prioritized spanner is given also a ranking of the vertices V = (v₁,… ,v_n), and is required to provide improved stretch for pairs containing higher ranked vertices. By using a sequence of reductions: from pairwise spanners to source-wise spanners to prioritized spanners, we improve on the state-of-the-art results for source-wise and prioritized spanners. Since our spanners can be equipped with a path-reporting mechanism, we also substantially improve the known bounds for path-reporting prioritized distance oracles. Specifically, we provide a path-reporting distance oracle, with size O(n⋅(log log n)²), that has a constant stretch for any query that contains a vertex ranked among the first n^{1-δ} vertices (for any constant δ > 0). Such a result was known before only for non-path-reporting distance oracles.
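
    The definitions of a P-pairwise α-spanner and of its size overhead can be checked directly on small inputs. The following is a minimal sketch of such a checker, assuming graphs are given as lists of weighted undirected edges (u, v, w); it verifies the definition only and does not construct spanners. All function names are hypothetical.

```python
import heapq


def build_adj(n, edges):
    """Adjacency lists of an undirected weighted graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def dijkstra(adj, src):
    """Shortest-path distances from src (non-negative edge weights)."""
    dist = [float("inf")] * len(adj)
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def is_pairwise_spanner(n, g_edges, s_edges, pairs, alpha):
    """Check d_S(u, v) <= alpha * d_G(u, v) for every pair (u, v) in P."""
    g, s = build_adj(n, g_edges), build_adj(n, s_edges)
    return all(
        dijkstra(s, u)[v] <= alpha * dijkstra(g, u)[v] for u, v in pairs
    )


def size_overhead(s_edges, pairs):
    """|E'| / |P| as defined in the abstract."""
    return len(s_edges) / len(pairs)


# Unit-weight 4-cycle G; S drops the edge {3, 0} and keeps the path 0-1-2-3.
g_edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
s_edges = g_edges[:3]
print(is_pairwise_spanner(4, g_edges, s_edges, [(0, 3)], 3))  # True:  3 <= 3 * 1
print(is_pairwise_spanner(4, g_edges, s_edges, [(0, 3)], 2))  # False: 3 >  2 * 1
print(size_overhead(s_edges, [(0, 3)]))                       # 3.0
```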

    Tight Bounds for Compressing Substring Samples

    We consider the problem of compressing a set of substrings sampled from a string and analyzing the size of the compression. Given a string S of length n, and integers d and m where n ≥ m ≥ 2d > 0, let SCS(S, m, d) be the string obtained by sequentially concatenating substrings of length m sampled regularly at intervals of d starting at position 1 in S. We consider the size of the LZ77 parsing of SCS(S, m, d), in relation to the size of the LZ77 parsing of S. This is motivated by genome sequencing, where the mentioned sampling process is an idealization of short-read DNA sequencing. We show the following upper bound: |LZ77(SCS(S, m, d))| ≤ |LZ77(S)| + 2(n-m)/d. We also give a lower bound showing that this is tight. This improves previous results by Badkobeh et al. [ICTCS 2022], and closes the open problem of whether their bound can be improved. Another natural question is whether, assuming that all letters in S are part of a sample, it is always the case that |LZ77(S)| ≤ |LZ77(SCS(S, m, d))|. Surprisingly, we show that there is a family of strings such that |LZ77(SCS(S, m, d))| = |LZ77(S)| - 1.
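
    The construction of SCS(S, m, d) and the two parse sizes being compared can be sketched as follows. The LZ77 variant used here (greedy, self-overlapping copies allowed, a new letter forms a phrase of length 1) is an assumption, since the abstract does not fix one, and the quadratic-time parser is for illustration only. Function names are hypothetical, and positions are 0-based in the code.

```python
def scs(s, m, d):
    """Concatenate the length-m substrings of s starting at 0, d, 2d, ..."""
    return "".join(s[i:i + m] for i in range(0, len(s) - m + 1, d))


def lz77_phrases(s):
    """Number of phrases in a greedy LZ77 parse (assumed variant, naive search).

    Each phrase is the longest prefix of the remaining text that also starts
    at an earlier position (overlap allowed), or a single new character.
    """
    n, i, count = len(s), 0, 0
    while i < n:
        longest = 0
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            longest = max(longest, k)
        i += max(longest, 1)
        count += 1
    return count


S = "abcdefgh"           # n = 8
sample = scs(S, 4, 2)    # m = 4, d = 2, so n >= m >= 2d > 0 holds
print(sample)                # abcdcdefefgh
print(lz77_phrases(S))       # 8
print(lz77_phrases(sample))  # 10
```

    On this instance the sampled string has 10 phrases against 8 + 2(8-4)/2 = 12 allowed by the stated upper bound, under the assumed parsing variant.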

    A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem

    The Burrows-Wheeler transform (BWT) is a famous text transformation that rearranges the symbols of the input strings so that occurrences of the same symbol tend to occur in runs. The number of runs is an important parameter of the BWT output string, historically associated with its high compressibility and more recently used as a measure for the space complexity of efficient data structures. It is a known fact that reordering the strings in the input collection affects the number of runs in the output string bwt() produced by applying the BWT to the string collection. In this paper, we define a class of transformed strings where symbols in particular blocks of the bwt() can be reordered according to a different adaptive alphabet order. Then, we introduce new heuristics to reduce the number of runs in the BWT output of a string collection that improve on the two existing heuristics introduced in Cox et al. [Anthony J. Cox et al., 2012]. These new heuristics are computed when applying the BWT to a string collection, assuming no a priori order on the input strings and without requiring any pre- and/or post-processing of the collection or of the BWT string. In this paper, we also address the problem of reconstructing the input collection from the string bwt() together with the string permutation realized when applying an alphabetical reordering of symbols during the construction of bwt().
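
    The run count that these heuristics aim to reduce can be computed on a small example. The sketch below handles only a single string terminated by "$" and sorts all rotations naively; it does not implement the BWT of a string collection or any of the heuristics from the paper. Function names are hypothetical.

```python
def bwt(text):
    """BWT of a single string via sorted rotations (naive, for short inputs).

    Assumes text ends with a unique sentinel that sorts before all other
    symbols; "$" does so against lowercase letters in Python's ordering.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s):
    """Number of maximal blocks of equal consecutive symbols."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


transformed = bwt("banana$")
print(transformed)              # annb$aa
print(count_runs(transformed))  # 5
```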

    Partial Temporal Vertex Cover with Bounded Activity Intervals

    Different variants of Vertex Cover have recently garnered attention in the context of temporal graphs. One of these variants is motivated by the need to summarize timeline activities in social networks. Here, the activities of individual vertices, representing users, are characterized by time intervals. In this paper, we explore a scenario where the temporal span of each vertex’s activity interval is bounded by a given integer, and the objective is to maximize the number of (temporal) edges that are covered. We establish the APX-hardness of this problem and the NP-hardness of the corresponding decision problem, even under the restricted condition where the temporal domain comprises only two timestamps and each edge appears at most once. Subsequently, we delve into the parameterized complexity of the problem, offering two fixed-parameter algorithms parameterized by: (i) the number k of temporal edges covered by the solution, and (ii) the number h of temporal edges not covered by the solution. Finally, we present a polynomial-time approximation algorithm achieving a factor of 3/4.


