Distributed Spanner Approximation
We address the fundamental network design problem of constructing approximate
minimum spanners. Our contributions are for the distributed setting, providing
both algorithmic and hardness results.
Our main hardness result shows that an α-approximation for the minimum
directed k-spanner problem for k ≥ 5 requires Ω(n/(√α log n)) rounds using
deterministic algorithms or Ω(√n/(√α log n)) rounds using randomized ones, in the
CONGEST model of distributed computing. Combined with the constant-round
O(n^ε)-approximation algorithm in the LOCAL model of [Barenboim,
Elkin and Gavoille, 2016], as well as a polylog-round
(1+ε)-approximation algorithm in the LOCAL model that we show here,
our lower bounds for the CONGEST model imply a strict separation between the
LOCAL and CONGEST models. Notably, to the best of our knowledge, this is the
first separation between these models for a local approximation problem.
Similarly, a separation between the directed and undirected cases is implied.
We also prove a nearly-linear lower bound for the minimum weighted k-spanner
problem for k ≥ 4, and we show lower bounds for the weighted 2-spanner
problem.
On the algorithmic side, apart from the aforementioned
(1+ε)-approximation algorithm for minimum k-spanners, our main
contribution is a new distributed construction of minimum 2-spanners that uses
only polynomial local computations. Our algorithm has a guaranteed
approximation ratio of O(log(m/n)) for a graph with n vertices and m
edges, which matches the best known ratio for polynomial-time sequential
algorithms [Kortsarz and Peleg, 1994], and is tight if we restrict ourselves to
polynomial local computations. Our approach allows us to extend our algorithm
to work also for the directed, weighted, and client-server variants of the
problem.
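The abstract takes the spanner property itself as given. As a minimal sketch under our own assumptions (unweighted graphs given as edge lists, and H already a subgraph of G), the check below tests whether H is a k-spanner of G, i.e. whether every pair of vertices is at most k times farther apart in H than in G. For unweighted graphs it suffices to test the edges of G, because any shortest path in G can be rerouted edge by edge through H. The names `is_k_spanner` and `within_hops` are hypothetical; this is only a verifier, not the authors' distributed algorithm.

```python
from collections import defaultdict, deque


def within_hops(adj, source, target, limit):
    """Return True if target is reachable from source in at most `limit` hops (BFS)."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:
            continue  # expanding further would exceed the hop budget
        for nxt in adj[node]:
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def is_k_spanner(graph_edges, spanner_edges, k, directed=False):
    """Check that every edge (u, v) of G has a path of at most k hops in H.

    Assumes unweighted graphs and that spanner_edges is a subset of graph_edges.
    """
    adj = defaultdict(list)
    for u, v in spanner_edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return all(within_hops(adj, u, v, k) for u, v in graph_edges)
```

For example, dropping one edge of a triangle leaves a 2-spanner but not a 1-spanner. Minimizing the number of edges of H subject to this check is the optimization problem whose distributed approximability the abstract studies.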
Strategies for basing the CS theory course on non-decision problems
Computational and complexity theory are core components of the computer
science curriculum, and in the vast majority of cases are taught using decision
problems as the main paradigm. For experienced practitioners, decision problems
are the best tool. But for undergraduates encountering the material for the
first time, we present evidence that non-decision problems (such as
optimization problems and search problems) are preferable. In addition, we
describe technical definitions and pedagogical strategies that have been used
successfully for teaching the theory course using non-decision problems as the
central concept.
Restricted Common Superstring and Restricted Common Supersequence
The shortest common superstring and the shortest common
supersequence are two well-studied problems with a wide range of
applications. In this paper we consider both problems with resource
constraints, denoted as the Restricted Common Superstring (RCSstr) problem
and the Restricted Common Supersequence (RCSseq) problem. In the RCSstr
(RCSseq) problem we are given a set of n strings, S = {s_1, s_2, …, s_n}, and
a multiset t = {t_1, t_2, …, t_m}, and the goal is to find a permutation
π: {1, …, m} → {1, …, m} that maximizes the number of strings in S that are
substrings (subsequences) of π(t) = t_π(1) t_π(2) … t_π(m) (we call this
ordering of the multiset, π(t), a permutation of t). We first show that in its
most general setting the RCSstr problem is NP-complete and hard to approximate
within a factor of n^{1-ε}, for any ε > 0, unless P = NP. Afterwards, we present two separate
reductions to show that the RCSstr problem remains NP-hard even in the
case where the elements of t are drawn from a binary alphabet or in the case
where all input strings are of length two. We then present some approximation
results for several variants of the RCSstr problem. In the second part
of this paper, we turn to the RCSseq problem, for which we present some
hardness results, tight lower bounds and approximation algorithms.
Comment: Submitted to WAOA 201
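To make the objective concrete, here is a brute-force sketch that tries every distinct ordering of the multiset and counts how many input strings become substrings (RCSstr) or subsequences (RCSseq) of the concatenation. It assumes, as the binary-alphabet case suggests, that the multiset t consists of single characters; the function names are hypothetical. Its running time is exponential in m, which is consistent with the hardness results above, so it is only usable on tiny inputs.

```python
from itertools import permutations


def is_subsequence(s, word):
    """True if s can be obtained from word by deleting characters."""
    it = iter(word)
    return all(ch in it for ch in s)  # each `in` consumes the iterator up to the match


def best_permutation(strings, multiset, subsequence=False):
    """Exhaustively search orderings of `multiset` (exponential; tiny inputs only).

    Returns (count, word): the maximum number of strings contained in the
    ordering, and the lexicographically first ordering achieving it.
    """
    contains = is_subsequence if subsequence else (lambda s, w: s in w)
    best_count, best_word = -1, None
    for perm in sorted(set(permutations(multiset))):  # distinct orderings only
        word = "".join(perm)
        count = sum(1 for s in strings if contains(s, word))
        if count > best_count:
            best_count, best_word = count, word
    return best_count, best_word
```

With S = {ac, ab, cb} and t = {a, b, c}, the ordering "acb" is best in both variants: it contains two of the strings as substrings and all three as subsequences.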
A Dynamic Algorithm for the Longest Common Subsequence Problem using Ant Colony Optimization Technique
We present a dynamic algorithm for solving the Longest Common Subsequence
Problem using the Ant Colony Optimization technique. Ant Colony Optimization
has been applied to many problems in optimization theory, machine learning,
telecommunication networks and other fields; in particular, its application
to NP-hard problems is of notable significance. Given two strings of lengths
m and n, the traditional technique for finding the Longest Common Subsequence
is based on dynamic programming, which sets up a recurrence relation and fills
a table of size (m + 1) × (n + 1). The proposed algorithm draws an analogy with
the behavior of ant colonies; this computational paradigm is known
as the Ant System. It is a viable new approach to Stochastic Combinatorial
Optimization. The main characteristics of this model are positive feedback,
distributed computation, and the use of constructive greedy heuristic. Positive
feedback accounts for rapid discovery of good solutions, distributed
computation avoids premature convergence and greedy heuristic helps find
acceptable solutions in a minimum number of stages. We apply the proposed
methodology to the Longest Common Subsequence Problem and give the simulation
results. The effectiveness of this approach is demonstrated by its efficient
computational complexity. To the best of our knowledge, this is the first Ant
Colony Optimization algorithm for the Longest Common Subsequence Problem.
Comment: Proceedings of 2nd International Conference on Mathematics: Trends
and Developments, Al Azhar University, Cairo, Egypt, 200
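For reference, the "traditional technique" the abstract contrasts with is the classical dynamic program. The sketch below is that textbook recurrence with a traceback, not the ant-colony algorithm; the function name `lcs` is our own choice.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b via the classical DP."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]; table size (m + 1) x (n + 1)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Filling the table takes O(mn) time and space, which is the cost that heuristic approaches such as the one above aim to avoid on long inputs.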
Mining Statistically Significant Substrings using the Chi-Square Statistic
Identifying statistically significant patterns in a sequence of data is a
problem that arises in many domains, such as intrusion detection
systems, financial models, web-click records, automated monitoring systems,
computational biology, cryptology, and text analysis. An observed pattern of
events is deemed to be statistically significant if it is unlikely to have
occurred due to randomness or chance alone. We use the chi-square statistic as
a quantitative measure of statistical significance. Given a string of
characters generated from a memoryless Bernoulli model, the problem is to
identify the substring for which the empirical distribution of single letters
deviates the most from the distribution expected from the generative Bernoulli
model. This deviation is captured using the chi-square measure. The most
significant substring (MSS) of a string is thus defined as the substring having
the highest chi-square value. To date, to the best of our knowledge, there
does not exist any algorithm to find the MSS in better than O(n^2) time, where
n denotes the length of the string. In this paper, we propose an algorithm to
find the most significant substring, whose running time is O(n^{3/2}) with high
probability. We also study some variants of this problem such as finding the
top-t set, finding all substrings having chi-square greater than a fixed
threshold and finding the MSS among substrings longer than a given length. We
experimentally demonstrate the asymptotic behavior of the MSS as the
string size and alphabet size vary. We also describe some applications of our
algorithm to cryptology and to real-world data from finance and sports. Finally,
we compare our technique with the existing heuristics for finding the MSS.
Comment: VLDB201
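The definition above can be made concrete with a naive search. This sketch uses Pearson's chi-square with expected count ℓ·p_c for a substring of length ℓ, and scans all O(n²) substrings using per-letter prefix counts. It is the quadratic baseline the abstract mentions, not the authors' O(n^{3/2}) algorithm, and it assumes every letter of the string appears in `probs` with a positive probability. The function names are hypothetical.

```python
def chi_square(counts, length, probs):
    """Pearson chi-square of observed letter counts against expected length * p."""
    return sum((counts[c] - length * p) ** 2 / (length * p) for c, p in probs.items())


def most_significant_substring(s, probs):
    """Naive O(n^2 * sigma) search for the substring with the highest chi-square.

    Returns (start, end, value) with s[start:end] the maximizing substring;
    ties keep the first substring found.
    """
    n = len(s)
    # prefix[c][i] = number of occurrences of letter c in s[:i]
    prefix = {c: [0] * (n + 1) for c in probs}
    for i, ch in enumerate(s):
        for c in probs:
            prefix[c][i + 1] = prefix[c][i] + (ch == c)
    best = (0, 0, -1.0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            counts = {c: prefix[c][j] - prefix[c][i] for c in probs}
            value = chi_square(counts, j - i, probs)
            if value > best[2]:
                best = (i, j, value)
    return best
```

For a uniform binary model, a run of ℓ identical letters has chi-square exactly ℓ, so in "aaaa" the whole string is the MSS with value 4.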
Streaming k-mismatch with error correcting and applications
We present a new streaming algorithm for the k-Mismatch problem, one of the
most basic problems in pattern matching. Given a pattern and a text, the task
is to find all substrings of the text that are at Hamming distance at most k
from the pattern. Our algorithm is enhanced with an important new feature
called Error Correcting, and its complexities for k = 1 and for a general k
are comparable to those of the solutions for the k-Mismatch problem by Porat
and Porat (FOCS 2009) and Clifford et al. (SODA 2016). In parallel to our
research, a yet more efficient algorithm for the k-Mismatch problem with the
Error Correcting feature was developed by Clifford et al. (SODA 2019). Using
the new feature and recent work on streaming Multiple Pattern Matching we
develop a series of streaming algorithms for pattern matching on weighted
strings, which are a commonly used representation of uncertain sequences in
molecular biology. We also show that these algorithms are space-optimal up to
polylog factors.
A preliminary version of this work was published at the DCC 2017 conference.
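To pin down the problem statement, here is a naive offline sketch that reports every alignment within Hamming distance k, together with the pattern positions where the mismatches occur. Returning the mismatch positions loosely mirrors the idea of error correcting, but this is our simplification: it reads the whole text, takes O(nm) time, and is neither streaming nor space-efficient like the algorithm described above. The name `k_mismatch` is hypothetical.

```python
def k_mismatch(text, pattern, k):
    """Return (start, mismatch_positions) for every alignment with at most k mismatches."""
    m = len(pattern)
    results = []
    for i in range(len(text) - m + 1):
        mismatches = []
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches.append(j)
                if len(mismatches) > k:
                    break  # this alignment already exceeds the budget
        if len(mismatches) <= k:
            results.append((i, mismatches))
    return results
```

For text "abcabd", pattern "abd" and k = 1, the alignments at positions 0 (one mismatch, at pattern offset 2) and 3 (exact) are reported.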
Fast Packed String Matching for Short Patterns
Searching for all occurrences of a pattern in a text is a fundamental problem
in computer science with applications in many other fields, like natural
language processing, information retrieval and computational biology. Over the
last two decades, a general trend has emerged of exploiting the power of
the word RAM model to speed up classical string matching
algorithms. In this model an algorithm operates on words of length w, grouping
blocks of characters, and arithmetic and logic operations on the words take one
unit of time. In this paper we use specialized word-size packed string matching
instructions, based on the Intel streaming SIMD extensions (SSE) technology, to
design very fast string matching algorithms for short patterns. Our
experimental results show that, despite their quadratic worst-case
time complexity, the newly presented algorithms are the clear winners on
average for short patterns when compared against the most effective algorithms
known in the literature.
Comment: 15 page
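The word RAM idea (treating a machine word as a small parallel array) can be illustrated without SSE. The sketch below is a swapped-in technique, the classic Shift-And algorithm of Baeza-Yates and Gonnet, which keeps the matching state of a pattern of length m ≤ w in one word and updates it with a shift, an OR and an AND per text character. It is not the SSE-based algorithm of the paper; `word_bits` only simulates the word-size limit, since Python integers are unbounded.

```python
def shift_and(text, pattern, word_bits=64):
    """Report start positions of `pattern` in `text` using the Shift-And method."""
    m = len(pattern)
    if not 0 < m <= word_bits:
        raise ValueError("pattern must be non-empty and fit in one machine word")
    # masks[c] has bit i set iff pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    # Bit i of state is set iff pattern[:i + 1] ends at the current text position.
    state = 0
    matches = []
    for j, c in enumerate(text):
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            matches.append(j - m + 1)
    return matches
```

Each text character costs a constant number of word operations, which is why bit-parallel methods of this kind pay off precisely when patterns are short enough to fit in a word.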
Data Structure Lower Bounds on Random Access to Grammar-Compressed Strings
In this paper we investigate the problem of building a static data structure
that represents a string s using space close to its compressed size, and allows
fast access to individual characters of s. Structures of this type were
investigated in a recent paper by Bille et al. Let n be the size of a
context-free grammar that derives a unique string s of length L. (Note that L
might be exponential in n.) Bille et al. showed a data structure that uses
space O(n) and answers queries for the i-th character of s in time
O(log L). Their data structure works on a word RAM with a word size of log L
bits. Here we prove that for such data structures, if the space is poly(n),
then the query time must be at least (log L)^{1-ε}/log S, where S is the
space used, for any constant ε > 0. As a function of n, our lower bound is
Ω(n^{1/2-ε}). Our proof holds in the cell-probe model with a word
size of log L bits, so in particular it holds in the word RAM model. We show
that no lower bound significantly better than n^{1/2-ε} can be achieved
in the cell-probe model, since there is a data structure in the cell-probe
model that uses O(n) space and achieves O(√(n log n)) query time. The "bad"
setting of parameters occurs roughly when L = 2^{√n}. We also prove a lower
bound for the case of less compressible strings, where, say,
L = n^{1+ε}. For this case, we prove that if the space is n polylog(n),
then the query time must be at least Ω(log n / log log n).
The proof works by reduction to communication complexity, namely to the LSD
problem, recently employed by Patrascu and others. We prove lower bounds also
for the case of LZ-compression and Burrows-Wheeler (BWT) compression. All of
our lower bounds hold even when the strings are over an alphabet of size 2, and
even for randomized data structures with 2-sided error.
Comment: submitted to ICALP 2012, with strengthened results include
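To make "random access to a grammar-compressed string" concrete, the sketch below assumes a straight-line program (a grammar in which each rule derives either one character or the concatenation of two earlier rules). Storing the expansion length of every rule uses O(n) space, and a query descends from the start symbol, choosing the left or right child by comparing i with the left child's length. This naive descent costs O(h) time for a grammar of height h, which can be as large as n; it is not the O(log L) structure of Bille et al. The class and rule names are hypothetical.

```python
class StraightLineProgram:
    """Grammar where each rule maps a name to one character or to a pair of earlier names."""

    def __init__(self, rules):
        # Assumes rules are ordered so each rule refers only to names defined before it.
        self.rules = rules
        self.length = {}
        for name, rhs in rules.items():
            if isinstance(rhs, str):
                self.length[name] = 1
            else:
                left, right = rhs
                self.length[name] = self.length[left] + self.length[right]

    def access(self, symbol, i):
        """Return the i-th character (0-based) of the string derived from `symbol`."""
        if not 0 <= i < self.length[symbol]:
            raise IndexError(i)
        while True:
            rhs = self.rules[symbol]
            if isinstance(rhs, str):
                return rhs
            left, right = rhs
            if i < self.length[left]:
                symbol = left
            else:
                i -= self.length[left]
                symbol = right


rules = {"A": "a", "B": "b", "X": ("A", "B"), "Y": ("X", "X"), "Z": ("Y", "X")}
slp = StraightLineProgram(rules)
print("".join(slp.access("Z", i) for i in range(slp.length["Z"])))  # prints ababab
```

Doubling rules such as Y → X X let a grammar of size n derive a string of length exponential in n, which is the regime (L up to 2^{√n}) where the lower bounds above bite.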
On Verification of D-Detectability for Discrete Event Systems
Detectability has been introduced as a generalization of state-estimation
properties of discrete event systems studied in the literature. It asks whether
the current and subsequent states of a system can be determined based on
observations. Since exactly determining the current and subsequent states may
be too strict in some applications, a relaxed notion of D-detectability has
been introduced, distinguishing only certain pairs of states rather than all
states. Four variants of D-detectability have been defined: strong (periodic)
D-detectability and weak (periodic) D-detectability. Deciding weak (periodic)
D-detectability is PSpace-complete, while deciding strong (periodic)
detectability or strong D-detectability is polynomial (and we show that it is
actually NL-complete). However, to the best of our knowledge, it is an open
problem whether there exists a polynomial-time algorithm deciding strong
periodic D-detectability. We solve this problem by showing that deciding strong
periodic D-detectability is a PSpace-complete problem, and hence there is no
polynomial-time algorithm unless PSpace = P. We further show that there is no
polynomial-time algorithm deciding strong periodic D-detectability even for
systems with a single observable event, unless P = NP. Finally, we propose a
class of systems for which the problem is tractable.
Comment: Extended version of a paper accepted for WODES 202
LCSk++: Practical similarity metric for long strings
In this paper we present LCSk++: a new metric for measuring the similarity
of long strings, and provide an algorithm for its efficient computation. With
the ever-increasing size of strings occurring in practice, e.g. large genomes of
plants and animals, classic algorithms such as Longest Common Subsequence (LCS)
fail due to their demanding computational complexity. Recently, Benson et al. defined
a similarity metric named LCSk. By relaxing the requirement that the
k-length substrings should not overlap, we extend their definition into a new
metric. An efficient algorithm is presented which computes LCSk++ with
complexity O((|X| + |Y|) log(|X| + |Y|)) for strings X and Y under a
realistic random model. The algorithm has been designed with implementation
simplicity in mind. Additionally, we describe how it can be adjusted to compute
LCSk as well, which gives an improvement over the algorithm
presented in the original paper.
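As background for the base metric, here is a straightforward dynamic program for LCSk as we understand Benson et al.'s definition: the maximum number of pairs of equal, non-overlapping k-length substrings taken in the same order from both strings. It runs in O(k·|X|·|Y|) time because of the slice comparison. It computes only LCSk, not the overlap-relaxed LCSk++, and it is not the near-linear algorithm described above; the function name is our own.

```python
def lcsk(x, y, k):
    """Max number of non-overlapping, order-preserving matching k-length substring pairs."""
    m, n = len(x), len(y)
    # dp[i][j] = LCSk of the prefixes x[:i] and y[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Use a k-block ending at x[i-1] and y[j-1] if the two blocks are equal.
            if i >= k and j >= k and x[i - k:i] == y[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[m][n]
```

For instance, "ABCAB" and "ABXAB" share two disjoint 2-blocks ("AB" twice), so their LCS2 is 2, while identical strings "AAAA" have LCS3 equal to 1 because the blocks may not overlap; lifting that restriction is exactly what LCSk++ does.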