104 research outputs found

    String Indexing with Compressed Patterns

    Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.

    Differentially Private Approximate Pattern Matching

    In this paper, we consider the $k$-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string $S$ which have a Hamming distance at most $k$ to a pattern $P$, or decide whether such a substring exists. In our definition of privacy, individual positions of the string $S$ are protected. To be able to answer queries under differential privacy, we allow some slack on $k$, i.e. we allow reporting or counting substrings of $S$ with a distance at most $(1+\gamma)k+\alpha$ to $P$, for a multiplicative error $\gamma$ and an additive error $\alpha$. We analyze which values of $\alpha$ and $\gamma$ are necessary or sufficient to solve the $k$-approximate pattern matching problem while satisfying $\epsilon$-differential privacy. Let $n$ denote the length of $S$. We give 1) an $\epsilon$-differentially private algorithm with an additive error of $O(\epsilon^{-1}\log n)$ and no multiplicative error for the existence variant; 2) an $\epsilon$-differentially private algorithm with an additive error $O(\epsilon^{-1}\max(k,\log n)\cdot\log n)$ for the counting variant; 3) an $\epsilon$-differentially private algorithm with an additive error of $O(\epsilon^{-1}\log n)$ and multiplicative error $O(1)$ for the reporting variant for a special class of patterns. The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of $S$ with distance at most $k$ to $P$, then the algorithm returns a substring of $S$ with distance at most $(1+\gamma)k+\alpha$ to $P$. Further, we complement these results by a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of $\Omega(\epsilon^{-1}\log n)$ with constant probability. Comment: This is a full version of a paper accepted to ITCS 202
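To make the underlying (non-private) problem concrete, here is a minimal brute-force sketch of k-approximate pattern matching under Hamming distance. It is not the differentially private algorithm from the abstract; the function names are hypothetical and chosen only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_occurrences(s: str, p: str, k: int) -> list[int]:
    """Start positions i such that s[i:i+len(p)] is within Hamming distance k of p.

    Brute force, O(len(s) * len(p)) time; no privacy guarantee of any kind.
    """
    m = len(p)
    return [i for i in range(len(s) - m + 1) if hamming(s[i:i + m], p) <= k]


if __name__ == "__main__":
    print(approx_occurrences("abcabd", "abd", 1))
```

The three variants in the abstract correspond to asking whether this list is non-empty (existence), its length (counting), or the list itself (reporting). A relaxed answer in the sense of the abstract would be allowed to use the threshold (1+γ)k+α in place of k.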

    Gapped Indexing for Consecutive Occurrences

    The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P₁ and P₂ and a gap range [α, β] we can quickly find the consecutive occurrences of P₁ and P₂ with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P₁|+|P₂|+n^{2/3}) for existence and counting and Õ(|P₁|+|P₂|+n^{2/3}occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using Õ(n) space must use Ω̃(|P₁| + |P₂| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
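The query itself can be illustrated with a naive, index-free sketch. It assumes (this reading is not spelled out in the abstract) that a consecutive occurrence is a pair (i, j), i < j, where the first pattern occurs at i, the second at j, and neither pattern occurs strictly between them. The function names and the `lo`/`hi` parameters for the gap range are hypothetical.

```python
from bisect import bisect_right


def occurrences(s: str, p: str) -> list[int]:
    """All start positions of p in s, in increasing order (naive scan)."""
    return [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]


def gapped_consecutive(s: str, p1: str, p2: str, lo: int, hi: int) -> list[tuple[int, int]]:
    """Consecutive occurrences (i, j) of p1 and p2 with lo <= j - i <= hi.

    Assumed definition: no occurrence of p1 or p2 lies strictly between i and j.
    """
    occ1 = occurrences(s, p1)
    occ2 = set(occurrences(s, p2))
    merged = sorted(set(occ1) | occ2)  # every position where either pattern starts
    pairs = []
    for i in occ1:
        idx = bisect_right(merged, i)  # index of the first occurrence after i
        if idx < len(merged):
            j = merged[idx]
            if j in occ2 and lo <= j - i <= hi:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    print(gapped_consecutive("axbxxaxxxb", "a", "b", 3, 5))
```

This scans the whole string on every query, which is what the indexes described in the abstract are designed to avoid.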

    String Indexing for Top-k Close Consecutive Occurrences

    The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (Sitcco). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0,1]. Our first result achieves O(n log n) space and optimal query time of O(m+k), and our second result achieves linear space and query time O(m+k^{1+ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
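The definition of a consecutive occurrence and of the top-k query can be checked against a naive baseline. The sketch below follows the definition in the abstract directly but uses no index, so its running time depends on the length of S rather than on m and k. The function names are hypothetical.

```python
import heapq


def occurrences(s: str, p: str) -> list[int]:
    """All start positions of p in s, in increasing order (naive scan)."""
    return [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]


def top_k_close(s: str, p: str, k: int) -> list[tuple[int, int]]:
    """The k consecutive occurrences (i, j) of p in s with the smallest distance j - i."""
    occ = occurrences(s, p)
    # Neighbouring positions in the sorted occurrence list are exactly the
    # pairs with no occurrence of p in between.
    pairs = list(zip(occ, occ[1:]))
    return heapq.nsmallest(k, pairs, key=lambda ij: ij[1] - ij[0])


if __name__ == "__main__":
    print(top_k_close("axxaxaxxxxa", "a", 2))
```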

    Private Counting of Distinct Elements in the Turnstile Model and Extensions

    Privately counting distinct elements in a stream is a fundamental data analysis problem with many applications in machine learning. In the turnstile model, Jain et al. [NeurIPS2023] initiated the study of this problem parameterized by the maximum flippancy of any element, i.e., the number of times that the count of an element changes from 0 to above 0 or vice versa. They give an item-level (ε,δ)-differentially private algorithm whose additive error is tight with respect to that parameterization. In this work, we show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for item-level (ε,δ)-differential privacy and item-level ε-differential privacy with regard to a different parameterization, namely the sum of all flippancies. Our second result is a bound which shows that for a large class of algorithms, including all existing differentially private algorithms for this problem, the lower bound from item-level differential privacy extends to event-level differential privacy. This partially answers an open question by Jain et al. [NeurIPS2023].
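The two parameterizations mentioned above (maximum flippancy and sum of all flippancies) can be computed from a stream as in the following sketch. It assumes a stream of (element, delta) updates with delta equal to +1 or -1 and counts that never drop below zero; that input format and the function name are assumptions for illustration, and nothing here is private.

```python
from collections import defaultdict


def flippancies(stream: list[tuple[str, int]]) -> dict[str, int]:
    """Per-element flippancy: how often an element's count switches between 0 and above 0.

    Assumes each update is (element, +1 or -1) and counts never become negative.
    """
    count = defaultdict(int)
    flips = defaultdict(int)
    for element, delta in stream:
        was_present = count[element] > 0
        count[element] += delta
        is_present = count[element] > 0
        if was_present != is_present:
            flips[element] += 1
    return dict(flips)


if __name__ == "__main__":
    updates = [("a", 1), ("a", 1), ("a", -1), ("a", -1), ("b", 1), ("a", 1)]
    f = flippancies(updates)
    print(f, max(f.values()), sum(f.values()))
```

In this example the maximum flippancy is 3 (element "a") while the sum of all flippancies is 4.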

    Compressed Indexing for Consecutive Occurrences

    The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern P. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns P₁ and P₂ and a range [a,b], and one must find all consecutive occurrences (q₁,q₂) of P₁ and P₂ such that q₂-q₁ ∈ [a,b]. By their results, we cannot hope for a very efficient indexing structure for such queries, even if a = 0 is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.

    Continual Counting with Gradual Privacy Expiration

    Differential privacy with gradual expiration models the setting where data items arrive in a stream and at a given time $t$ the privacy loss guaranteed for a data item seen at time $(t-d)$ is $\epsilon g(d)$, where $g$ is a monotonically non-decreasing function. We study the fundamental continual (binary) counting problem where each data item consists of a bit, and the algorithm needs to output at each time step the sum of all the bits streamed so far. For a stream of length $T$ and privacy without expiration, continual counting is possible with maximum (over all time steps) additive error $O(\log^2(T)/\varepsilon)$ and the best known lower bound is $\Omega(\log(T)/\varepsilon)$; closing this gap is a challenging open problem. We show that the situation is very different for privacy with gradual expiration by giving upper and lower bounds for a large set of expiration functions $g$. Specifically, our algorithm achieves an additive error of $O(\log(T)/\epsilon)$ for a large set of privacy expiration functions. We also give a lower bound that shows that if $C$ is the additive error of any $\epsilon$-DP algorithm for this problem, then the product of $C$ and the privacy expiration function after $2C$ steps must be $\Omega(\log(T)/\epsilon)$. Our algorithm matches this lower bound as its additive error is $O(\log(T)/\epsilon)$, even when $g(2C) = O(1)$. Our empirical evaluation shows that we achieve a slowly growing privacy loss with significantly smaller empirical privacy loss for large values of $d$ than a natural baseline algorithm.

    String Indexing for Top-$k$ Close Consecutive Occurrences

    The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair $(i,j)$, $i < j$, such that $P$ occurs at positions $i$ and $j$ in $S$ and there is no occurrence of $P$ between $i$ and $j$, and their distance is defined as $j-i$. Given a pattern $P$ and a parameter $k$, the goal is to report the top-$k$ consecutive occurrences of $P$ in $S$ of minimal distance. The challenge is to compactly represent $S$ while supporting queries in time close to the length of $P$ and $k$. We give two time-space trade-offs for the problem. Let $n$ be the length of $S$, $m$ the length of $P$, and $\epsilon\in(0,1]$. Our first result achieves $O(n\log n)$ space and optimal query time of $O(m+k)$, and our second result achieves linear space and query time $O(m+k^{1+\epsilon})$. Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees. Comment: Fixed typos, minor change

    Effect of natalizumab on disease progression in secondary progressive multiple sclerosis (ASCEND): a phase 3, randomised, double-blind, placebo-controlled trial with an open-label extension

    Background: Although several disease-modifying treatments are available for relapsing multiple sclerosis, treatment effects have been more modest in progressive multiple sclerosis and have been observed particularly in actively relapsing subgroups or those with lesion activity on imaging. We sought to assess whether natalizumab slows disease progression in secondary progressive multiple sclerosis, independent of relapses. Methods: ASCEND was a phase 3, randomised, double-blind, placebo-controlled trial (part 1) with an optional 2 year open-label extension (part 2). Enrolled patients aged 18–58 years were natalizumab-naive and had secondary progressive multiple sclerosis for 2 years or more, disability progression unrelated to relapses in the previous year, and Expanded Disability Status Scale (EDSS) scores of 3·0–6·5. In part 1, patients from 163 sites in 17 countries were randomly assigned (1:1) to receive 300 mg intravenous natalizumab or placebo every 4 weeks for 2 years. Patients were stratified by site and by EDSS score (3·0–5·5 vs 6·0–6·5). Patients completing part 1 could enrol in part 2, in which all patients received natalizumab every 4 weeks until the end of the study. Throughout both parts, patients and staff were masked to the treatment received in part 1. The primary outcome in part 1 was the proportion of patients with sustained disability progression, assessed by one or more of three measures: the EDSS, Timed 25-Foot Walk (T25FW), and 9-Hole Peg Test (9HPT). The primary outcome in part 2 was the incidence of adverse events and serious adverse events. Efficacy and safety analyses were done in the intention-to-treat population. This trial is registered with ClinicalTrials.gov, number NCT01416181. Findings: Between Sept 13, 2011, and July 16, 2015, 889 patients were randomly assigned (n=440 to the natalizumab group, n=449 to the placebo group). 
    In part 1, 195 (44%) of 439 natalizumab-treated patients and 214 (48%) of 448 placebo-treated patients had confirmed disability progression (odds ratio [OR] 0·86; 95% CI 0·66–1·13; p=0·287). No treatment effect was observed on the EDSS (OR 1·06, 95% CI 0·74–1·53; nominal p=0·753) or the T25FW (0·98, 0·74–1·30; nominal p=0·914) components of the primary outcome. However, natalizumab treatment reduced 9HPT progression (OR 0·56, 95% CI 0·40–0·80; nominal p=0·001). In part 1, 100 (22%) placebo-treated and 90 (20%) natalizumab-treated patients had serious adverse events. In part 2, 291 natalizumab-continuing patients and 274 natalizumab-naive patients received natalizumab (median follow-up 160 weeks [range 108–221]). Serious adverse events occurred in 39 (13%) patients continuing natalizumab and in 24 (9%) patients initiating natalizumab. Two deaths occurred in part 1, neither of which was considered related to study treatment. No progressive multifocal leukoencephalopathy occurred. Interpretation: Natalizumab treatment for secondary progressive multiple sclerosis did not reduce progression on the primary multicomponent disability endpoint in part 1, but it did reduce progression on its upper-limb component. Longer-term trials are needed to assess whether treatment of secondary progressive multiple sclerosis might produce benefits on additional disability components. Funding: Biogen.

    Definition, aims, and implementation of GA2LEN/HAEi Angioedema Centers of Reference and Excellence
