String Indexing with Compressed Patterns
Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
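To make "a pattern given in LZ77-compressed form" concrete, here is a minimal sketch of one common greedy, self-referential LZ77 factorization and its decoder. It is an illustration under our own assumptions, not the paper's data structure: LZ77 has several variants, and the phrase encoding (a literal character, or a pair `(src, length)` copying earlier text, possibly overlapping) and the function names `lz77_factorize` and `lz77_decode` are hypothetical choices made here.

```python
from typing import Union

# Hypothetical phrase encoding: a one-character literal, or (src, length)
# meaning "copy `length` characters starting at earlier position src".
Phrase = Union[str, tuple[int, int]]


def lz77_factorize(s: str) -> list[Phrase]:
    """Greedy LZ77 factorization with a naive longest-previous-match search."""
    phrases: list[Phrase] = []
    i = 0
    while i < len(s):
        best_src, best_len = -1, 0
        for src in range(i):  # every earlier start position is a candidate source
            length = 0
            # The match may run past i (overlapping, self-referential copy).
            while i + length < len(s) and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        if best_len == 0:
            phrases.append(s[i])  # character not seen before: emit a literal
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: list[Phrase]) -> str:
    """Rebuild the string; copies go char by char so overlapping sources work."""
    out: list[str] = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)
```

For example, `lz77_factorize("abababab")` yields two literals followed by a single overlapping copy `(0, 6)`, so the compressed size is three phrases rather than eight characters.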
Differentially Private Approximate Pattern Matching
In this paper, we consider the k-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string S which have a Hamming distance at most k to a pattern P, or decide whether such a substring exists. In our definition of privacy, individual positions of the string S are protected. To be able to answer queries under differential privacy, we allow some slack on k, i.e., we allow reporting or counting substrings of S with a distance at most (1+γ)k+α to P, for a multiplicative error γ and an additive error α. We analyze which values of α and γ are necessary or sufficient to solve the k-approximate pattern matching problem while satisfying ε-differential privacy. Let n denote the length of S. We give 1) an ε-differentially private algorithm with an additive error of O(ε⁻¹ log n) and no multiplicative error for the existence variant; 2) an ε-differentially private algorithm with an additive error of O(ε⁻¹ max(k, log n) · log n) for the counting variant; 3) an ε-differentially private algorithm with an additive error of O(ε⁻¹ log n) and multiplicative error O(1) for the reporting variant for a special class of patterns. The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of S with distance at most k to P, then the algorithm returns a substring of S with distance at most (1+γ)k+α to P. Further, we complement these results with a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of Ω(ε⁻¹ log n) with constant probability.
Comment: This is a full version of a paper accepted to ITCS 202
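As a reference point for the problem being privatized, here is a minimal, non-private sketch of k-approximate pattern matching under Hamming distance, for a string `S`, a pattern `P` and a threshold `k` (names chosen here). It adds no noise and gives no privacy guarantee. The helper `is_acceptable_witness` (hypothetical) only checks the relaxed condition that the abstract allows a private algorithm's witness to meet, namely a distance of at most (1+gamma)*k + alpha.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def k_approximate_occurrences(S: str, P: str, k: int) -> list[int]:
    """Naive O(|S| * |P|) reporting: every start i with hamming(S[i:i+m], P) <= k.

    Counting is len(result); existence is bool(result).
    """
    m = len(P)
    return [i for i in range(len(S) - m + 1) if hamming(S[i:i + m], P) <= k]


def is_acceptable_witness(S: str, P: str, i: int, k: int,
                          gamma: float, alpha: float) -> bool:
    """Does the substring starting at i satisfy the relaxed bound (1+gamma)*k + alpha?"""
    m = len(P)
    if not 0 <= i <= len(S) - m:
        return False
    return hamming(S[i:i + m], P) <= (1 + gamma) * k + alpha
```

For instance, with `S = "abcabd"` and `P = "abc"`, threshold `k = 1` reports positions 0 and 3, while `k = 0` reports only position 0.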
Gapped Indexing for Consecutive Occurrences
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P₁ and P₂ and a gap range [α, β] we can quickly find the consecutive occurrences of P₁ and P₂ with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P₁|+|P₂|+n^{2/3}) for existence and counting and Õ(|P₁|+|P₂|+n^{2/3}occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using Õ(n) space must use Ω̃(|P₁| + |P₂| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
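A naive, index-free baseline helps pin down what a gapped query returns. The sketch below assumes that (i, j) is a consecutive occurrence of P1 and P2 when i is an occurrence of P1, j > i is an occurrence of P2, and no occurrence of either pattern starts strictly between i and j; this reading of "pairs of subsequent occurrences" and the function names are our assumptions, not taken from the paper. It scans the whole string per query, which is exactly the cost the data structures above avoid.

```python
from bisect import bisect_right


def occurrences(S: str, P: str) -> list[int]:
    """All start positions of P in S (overlaps allowed), in increasing order."""
    return [i for i in range(len(S) - len(P) + 1) if S.startswith(P, i)]


def gapped_consecutive(S: str, P1: str, P2: str,
                       alpha: int, beta: int) -> list[tuple[int, int]]:
    """Report consecutive occurrences (i, j) of P1 and P2 with alpha <= j - i <= beta.

    Assumed definition: the first occurrence of either pattern after i must be
    an occurrence of P2 (it may also be an occurrence of P1).
    """
    occ1 = occurrences(S, P1)
    occ2 = set(occurrences(S, P2))
    union = sorted(set(occ1) | occ2)  # occurrences of either pattern
    result = []
    for i in occ1:
        nxt = bisect_right(union, i)  # first occurrence strictly after i
        if nxt < len(union):
            j = union[nxt]
            if j in occ2 and alpha <= j - i <= beta:
                result.append((i, j))
    return result
```

For `S = "ab_ab_cd"`, `P1 = "ab"` and `P2 = "cd"`, only `(3, 6)` is reported: the occurrence of `"ab"` at 0 is followed by another `"ab"` before any `"cd"`.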
String Indexing for Top-k Close Consecutive Occurrences
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0,1]. Our first result achieves O(n log n) space and optimal query time of O(m+k), and our second result achieves linear space and query time O(m+k^{1+ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
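The problem definition above translates directly into a naive query: list the occurrences of P, pair each with the next one, and keep the k pairs of smallest distance. The sketch below does exactly that in time proportional to |S|·|P| per query. It is a baseline for the definition only, not either of the paper's trade-offs, and the name `top_k_close` is ours.

```python
import heapq


def top_k_close(S: str, P: str, k: int) -> list[tuple[int, int]]:
    """Top-k consecutive occurrences (i, j) of P in S by distance j - i.

    Consecutive occurrences are neighbouring entries of the sorted
    occurrence list, so no occurrence of P lies strictly between i and j.
    Ties keep left-to-right order (heapq.nsmallest is stable).
    """
    occ = [i for i in range(len(S) - len(P) + 1) if S.startswith(P, i)]
    pairs = list(zip(occ, occ[1:]))
    return heapq.nsmallest(k, pairs, key=lambda p: p[1] - p[0])
```

With `S = "aXaXXaXXXa"` and `P = "a"`, the consecutive occurrences are (0,2), (2,5) and (5,9), so `k = 2` returns the first two.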
Private Counting of Distinct Elements in the Turnstile Model and Extensions
Privately counting distinct elements in a stream is a fundamental data analysis problem with many applications in machine learning. In the turnstile model, Jain et al. [NeurIPS2023] initiated the study of this problem parameterized by the maximum flippancy of any element, i.e., the number of times that the count of an element changes from 0 to above 0 or vice versa. They give an item-level (ε,δ)-differentially private algorithm whose additive error is tight with respect to that parameterization. In this work, we show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for item-level (ε,δ)-differential privacy and item-level ε-differential privacy with regard to a different parameterization, namely the sum of all flippancies. Our second result is a bound which shows that for a large class of algorithms, including all existing differentially private algorithms for this problem, the lower bound from item-level differential privacy extends to event-level differential privacy. This partially answers an open question by Jain et al. [NeurIPS2023].
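The two parameters in this abstract, the maximum flippancy and the sum of all flippancies, are easy to compute exactly on a turnstile stream. The sketch below tracks the exact distinct count after every update together with each element's flippancy. It is non-private (no noise, no sparse vector technique), and the stream format, a list of `(element, delta)` updates with `delta` being +1 or -1, is an assumption made here.

```python
from collections import defaultdict


def distinct_counts_and_flippancy(stream):
    """Exact distinct-element counts and flippancies for a turnstile stream.

    `stream` is a list of (element, delta) updates. An element's flippancy is
    the number of times its count crosses between 0 and a positive value.
    """
    counts = defaultdict(int)
    flips = defaultdict(int)
    distinct = 0
    outputs = []
    for x, delta in stream:
        was_present = counts[x] > 0
        counts[x] += delta
        is_present = counts[x] > 0
        if was_present != is_present:  # count crossed 0 in either direction
            flips[x] += 1
            distinct += 1 if is_present else -1
        outputs.append(distinct)  # number of elements with positive count
    return outputs, dict(flips)
```

On the stream `[("a", 1), ("b", 1), ("a", -1), ("a", 1), ("b", 1)]`, element `a` has flippancy 3 and `b` has flippancy 1, so the maximum flippancy is 3 while the sum of flippancies is 4.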
Compressed Indexing for Consecutive Occurrences
The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern P. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns P₁ and P₂ and a range [a,b], and one must find all consecutive occurrences (q₁,q₂) of P₁ and P₂ such that q₂-q₁ ∈ [a,b]. By their results, we cannot hope for a very efficient indexing structure for such queries, even if a = 0 is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.
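A straight-line program is a grammar in which every nonterminal derives exactly one string, with each rule either `X -> c` for a single character or `X -> Y Z` for two earlier nonterminals. The sketch below, using a hypothetical dictionary encoding of the rules, shows the kind of computation that works directly on the grammar: expansion lengths are computed by memoized recursion, and a character is accessed by walking down from the start symbol without decompressing the text. It illustrates SLPs only; it is not the index from this paper.

```python
from __future__ import annotations

# Hypothetical encoding: rules[X] is a 1-char string (terminal rule)
# or a pair (Y, Z) meaning X -> Y Z.
Rules = dict[str, "str | tuple[str, str]"]


def slp_lengths(rules: Rules) -> dict[str, int]:
    """Length of the expansion of every nonterminal, via memoized recursion."""
    lengths: dict[str, int] = {}

    def length(x: str) -> int:
        if x not in lengths:
            rhs = rules[x]
            lengths[x] = 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])
        return lengths[x]

    for x in rules:
        length(x)
    return lengths


def slp_char_at(rules: Rules, lengths: dict[str, int], x: str, i: int) -> str:
    """Character at position i of exp(x), in time proportional to the grammar height."""
    while not isinstance(rules[x], str):
        left, right = rules[x]
        if i < lengths[left]:
            x = left  # position falls inside the left child's expansion
        else:
            i -= lengths[left]  # skip the left child's expansion
            x = right
    return rules[x]


def slp_expand(rules: Rules, x: str) -> str:
    """Full decompression of x (for checking only; exponential output is possible)."""
    rhs = rules[x]
    return rhs if isinstance(rhs, str) else slp_expand(rules, rhs[0]) + slp_expand(rules, rhs[1])
```

For example, the rules `A -> a`, `B -> b`, `C -> AB`, `D -> CA`, `E -> DC`, `F -> ED` make `F` derive `abaababa`, and `slp_char_at` recovers each of its eight characters from the six rules alone.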
Continual Counting with Gradual Privacy Expiration
Differential privacy with gradual expiration models the setting where data items arrive in a stream and at a given time t the privacy loss guaranteed for a data item seen at time (t − d) is εg(d), where g is a monotonically non-decreasing function. We study the fundamental continual (binary) counting problem where each data item consists of a bit, and the algorithm needs to output at each time step the sum of all the bits streamed so far. For a stream of length T and privacy without expiration, continual counting is possible with maximum (over all time steps) additive error O(log²(T)/ε) and the best known lower bound is Ω(log(T)/ε); closing this gap is a challenging open problem. We show that the situation is very different for privacy with gradual expiration by giving upper and lower bounds for a large set of expiration functions g. Specifically, our algorithm achieves an additive error of O(log(T)/ε) for a large set of privacy expiration functions. We also give a lower bound that shows that if C is the additive error of any ε-DP algorithm for this problem, then the product of C and the privacy expiration function after 2C steps must be Ω(log(T)/ε). Our algorithm matches this lower bound as its additive error is O(log(T)/ε), even when g(2C) = O(1). Our empirical evaluation shows that we achieve a slowly growing privacy loss with significantly smaller empirical privacy loss for large values of d than a natural baseline algorithm.
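For context on continual counting without expiration, here is a minimal sketch of the classic binary (tree) mechanism. This is a standard textbook technique shown for reference, not the gradual-expiration algorithm of this paper, and the names `laplace` and `binary_mechanism` are our own. Every prefix [0, t) is split into aligned dyadic blocks following the binary representation of t. Each bit lies in at most one block per level, so with L levels, adding Laplace(L/ε) noise once to every block sum gives ε-differential privacy when a single bit changes. Laplace noise is sampled as the difference of two i.i.d. exponentials.

```python
from __future__ import annotations

import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def binary_mechanism(bits: list[int], epsilon: float, seed: int | None = None) -> list[float]:
    """Noisy running counts after each bit via the binary (tree) mechanism."""
    rng = random.Random(seed)
    T = len(bits)
    levels = max(1, T.bit_length())  # block sizes 1, 2, 4, ..., 2**(levels-1)
    scale = levels / epsilon  # each bit affects at most one used block per level
    noisy: dict[tuple[int, int], float] = {}  # (level, index) -> noisy block sum

    def block_sum(level: int, index: int) -> float:
        # Noise is drawn once per block and reused by every later prefix.
        key = (level, index)
        if key not in noisy:
            lo = index << level
            noisy[key] = sum(bits[lo:lo + (1 << level)]) + laplace(scale, rng)
        return noisy[key]

    outputs = []
    for t in range(1, T + 1):
        total, start = 0.0, 0
        for level in reversed(range(levels)):
            if (t >> level) & 1:  # bit `level` of t selects a block of size 2**level
                total += block_sum(level, start >> level)
                start += 1 << level
        outputs.append(total)
    return outputs
```

Each output sums at most L noisy blocks, which is the source of the polylogarithmic error without expiration. With a very large `epsilon` the noise becomes negligible and the outputs approach the exact prefix sums, which is a convenient sanity check.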
Effect of natalizumab on disease progression in secondary progressive multiple sclerosis (ASCEND): a phase 3, randomised, double-blind, placebo-controlled trial with an open-label extension
Background: Although several disease-modifying treatments are available for relapsing multiple sclerosis, treatment effects have been more modest in progressive multiple sclerosis and have been observed particularly in actively relapsing subgroups or those with lesion activity on imaging. We sought to assess whether natalizumab slows disease progression in secondary progressive multiple sclerosis, independent of relapses. Methods: ASCEND was a phase 3, randomised, double-blind, placebo-controlled trial (part 1) with an optional 2 year open-label extension (part 2). Enrolled patients aged 18–58 years were natalizumab-naive and had secondary progressive multiple sclerosis for 2 years or more, disability progression unrelated to relapses in the previous year, and Expanded Disability Status Scale (EDSS) scores of 3·0–6·5. In part 1, patients from 163 sites in 17 countries were randomly assigned (1:1) to receive 300 mg intravenous natalizumab or placebo every 4 weeks for 2 years. Patients were stratified by site and by EDSS score (3·0–5·5 vs 6·0–6·5). Patients completing part 1 could enrol in part 2, in which all patients received natalizumab every 4 weeks until the end of the study. Throughout both parts, patients and staff were masked to the treatment received in part 1. The primary outcome in part 1 was the proportion of patients with sustained disability progression, assessed by one or more of three measures: the EDSS, Timed 25-Foot Walk (T25FW), and 9-Hole Peg Test (9HPT). The primary outcome in part 2 was the incidence of adverse events and serious adverse events. Efficacy and safety analyses were done in the intention-to-treat population. This trial is registered with ClinicalTrials.gov, number NCT01416181. Findings: Between Sept 13, 2011, and July 16, 2015, 889 patients were randomly assigned (n=440 to the natalizumab group, n=449 to the placebo group). 
In part 1, 195 (44%) of 439 natalizumab-treated patients and 214 (48%) of 448 placebo-treated patients had confirmed disability progression (odds ratio [OR] 0·86; 95% CI 0·66–1·13; p=0·287). No treatment effect was observed on the EDSS (OR 1·06, 95% CI 0·74–1·53; nominal p=0·753) or the T25FW (0·98, 0·74–1·30; nominal p=0·914) components of the primary outcome. However, natalizumab treatment reduced 9HPT progression (OR 0·56, 95% CI 0·40–0·80; nominal p=0·001). In part 1, 100 (22%) placebo-treated and 90 (20%) natalizumab-treated patients had serious adverse events. In part 2, 291 natalizumab-continuing patients and 274 natalizumab-naive patients received natalizumab (median follow-up 160 weeks [range 108–221]). Serious adverse events occurred in 39 (13%) patients continuing natalizumab and in 24 (9%) patients initiating natalizumab. Two deaths occurred in part 1, neither of which was considered related to study treatment. No progressive multifocal leukoencephalopathy occurred. Interpretation: Natalizumab treatment for secondary progressive multiple sclerosis did not reduce progression on the primary multicomponent disability endpoint in part 1, but it did reduce progression on its upper-limb component. Longer-term trials are needed to assess whether treatment of secondary progressive multiple sclerosis might produce benefits on additional disability components. Funding: Biogen
Definition, aims, and implementation of GA2LEN/HAEi Angioedema Centers of Reference and Excellence
- …