Distributed Spanner Approximation

Authors: Censor-Hillel Keren; Dory Michal
Publication date: 09/02/2018

We address the fundamental network design problem of constructing approximate minimum spanners. Our contributions are for the distributed setting, providing both algorithmic and hardness results. Our main hardness result shows that an $\alpha$-approximation for the minimum directed $k$-spanner problem for $k \geq 5$ requires $\Omega(n /\sqrt{\alpha}\log{n})$ rounds using deterministic algorithms or $\Omega(\sqrt{n }/\sqrt{\alpha}\log{n})$ rounds using randomized ones, in the CONGEST model of distributed computing. Combined with the constant-round $O(n^{\epsilon})$-approximation algorithm in the LOCAL model of [Barenboim, Elkin and Gavoille, 2016], as well as a polylog-round $(1+\epsilon)$-approximation algorithm in the LOCAL model that we show here, our lower bounds for the CONGEST model imply a strict separation between the LOCAL and CONGEST models. Notably, to the best of our knowledge, this is the first separation between these models for a local approximation problem. Similarly, a separation between the directed and undirected cases is implied. We also prove a nearly-linear lower bound for the minimum weighted $k$-spanner problem for $k \geq 4$, and we show lower bounds for the weighted 2-spanner problem. On the algorithmic side, apart from the aforementioned $(1+\epsilon)$-approximation algorithm for minimum $k$-spanners, our main contribution is a new distributed construction of minimum 2-spanners that uses only polynomial local computations. Our algorithm has a guaranteed approximation ratio of $O(\log(m/n))$ for a graph with $n$ vertices and $m$ edges, which matches the best known ratio for polynomial time sequential algorithms [Kortsarz and Peleg, 1994], and is tight if we restrict ourselves to polynomial local computations. Our approach allows us to extend our algorithm to work also for the directed, weighted, and client-server variants of the problem.

arXiv.org e-Print Archive
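
To make the object in the abstract above concrete: in an unweighted graph, a subgraph is a $k$-spanner if every edge of the original graph is connected in the subgraph by a path of length at most $k$. The following is a minimal sequential sketch of that check only. It is not the distributed algorithm from the paper, and the function name `is_k_spanner` and the edge-list representation are illustrative assumptions.

```python
from collections import deque


def is_k_spanner(n, edges, spanner_edges, k, directed=False):
    """Check the k-spanner property for an unweighted graph on vertices 0..n-1.

    Assumption of this sketch: spanner_edges is a subset of edges.
    """
    adj = [[] for _ in range(n)]
    for u, v in spanner_edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)

    def reachable_within_k(src, dst):
        # Breadth-first search in the spanner, cut off at depth k.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            if x == dst:
                return True
            if dist[x] == k:
                continue  # do not expand beyond depth k
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return False

    # It suffices to check every edge of the original graph.
    return all(reachable_within_k(u, v) for u, v in edges)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_k_spanner(4, cycle, cycle[:3], 3))  # the path 3-2-1-0 replaces edge (3, 0)
```

Dropping one edge of a 4-cycle leaves a 3-spanner but not a 2-spanner in the undirected case. In the directed case the same subgraph is not a spanner for any $k$, which gives a small illustration of why the directed and undirected variants behave differently.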

Strategies for basing the CS theory course on non-decision problems

Author: MacCormick John
Publication date: 24/11/2017

Computational and complexity theory are core components of the computer science curriculum, and in the vast majority of cases are taught using decision problems as the main paradigm. For experienced practitioners, decision problems are the best tool. But for undergraduates encountering the material for the first time, we present evidence that non-decision problems (such as optimization problems and search problems) are preferable. In addition, we describe technical definitions and pedagogical strategies that have been used successfully for teaching the theory course using non-decision problems as the central concept.

arXiv.org e-Print Archive

Restricted Common Superstring and Restricted Common Supersequence

Authors: Clifford Raphaël; Gotthilf Zvi; Lewenstein Moshe; Popa Alexandru
Publication date: 27/06/2010

The shortest common superstring and the shortest common supersequence are two well-studied problems with a wide range of applications. In this paper we consider both problems with resource constraints, denoted as the Restricted Common Superstring (RCSstr for short) problem and the Restricted Common Supersequence (RCSseq for short) problem. In the RCSstr (RCSseq) problem we are given a set $S$ of $n$ strings, $s_1, s_2, \ldots, s_n$, and a multiset $t = \{t_1, t_2, \dots, t_m\}$, and the goal is to find a permutation $\pi : \{1, \dots, m\} \to \{1, \dots, m\}$ to maximize the number of strings in $S$ that are substrings (subsequences) of $\pi(t) = t_{\pi(1)}t_{\pi(2)} \ldots t_{\pi(m)}$ (we call this ordering of the multiset, $\pi(t)$, a permutation of $t$). We first show that in its most general setting the RCSstr problem is NP-complete and hard to approximate within a factor of $n^{1-\epsilon}$, for any $\epsilon > 0$, unless P = NP. Afterwards, we present two separate reductions to show that the RCSstr problem remains NP-hard even in the case where the elements of $t$ are drawn from a binary alphabet or for the case where all input strings are of length two. We then present some approximation results for several variants of the RCSstr problem. In the second part of this paper, we turn to the RCSseq problem, where we present some hardness results, tight lower bounds and approximation algorithms. Comment: Submitted to WAOA 201

arXiv.org e-Print Archive
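
The RCSstr objective defined in the abstract above can be illustrated with an exhaustive search over all orderings of the multiset. This is only a sketch for tiny inputs, since it takes time exponential in $m$, which is consistent with the hardness results the abstract reports. The function name `rcsstr_brute_force` is a hypothetical helper, and the elements of $t$ are assumed here to be single characters.

```python
from itertools import permutations


def rcsstr_brute_force(strings, multiset):
    """Try every distinct ordering of `multiset` and return (best_count, ordering).

    best_count is the largest number of input strings that occur as
    substrings of a single ordering.
    """
    best_count, best_order = -1, None
    # sorted(set(...)) removes duplicate orderings and makes ties deterministic.
    for order in sorted(set(permutations(multiset))):
        candidate = "".join(order)
        count = sum(1 for s in strings if s in candidate)
        if count > best_count:
            best_count, best_order = count, candidate
    return best_count, best_order


count, order = rcsstr_brute_force(["ab", "bc", "ca"], "abc")
print(count, order)
```

For the RCSseq variant, the substring test `s in candidate` would be replaced by a subsequence test; the rest of the search is unchanged.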

A Dynamic Algorithm for the Longest Common Subsequence Problem using Ant Colony Optimization Technique

Author: Chaudhuri Arindam
Publication date: 07/07/2013

We present a dynamic algorithm for solving the Longest Common Subsequence Problem using the Ant Colony Optimization Technique. The Ant Colony Optimization Technique has been applied to many problems in Optimization Theory, Machine Learning, and Telecommunication Networks. In particular, the application of this theory to NP-Hard Problems has remarkable significance. Given two strings, the traditional technique for finding the Longest Common Subsequence is based on Dynamic Programming, which consists of creating a recurrence relation and filling a table of size . The proposed algorithm draws an analogy with the behavior of ant colonies, and this new computational paradigm is known as Ant System. It is a viable new approach to Stochastic Combinatorial Optimization. The main characteristics of this model are positive feedback, distributed computation, and the use of a constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence, and the greedy heuristic helps find acceptable solutions in a minimum number of stages. We apply the proposed methodology to the Longest Common Subsequence Problem and give the simulation results. The effectiveness of this approach is demonstrated by efficient Computational Complexity. To the best of our knowledge, this is the first Ant Colony Optimization Algorithm for the Longest Common Subsequence Problem. Comment: Proceedings of 2nd International Conference on Mathematics: Trends and Developments, Al Azhar University, Cairo, Egypt, 200

arXiv.org e-Print Archive
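
The traditional dynamic-programming technique that the abstract above contrasts with its ant-colony approach can be sketched as follows. This is the textbook recurrence, not the algorithm proposed in the paper. The table dimensions used here, (len(a) + 1) by (len(b) + 1), are a choice of this sketch, and `lcs_length` is an illustrative name.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (classic DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one of the prefixes.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4, for example "GTAB"
```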

Mining Statistically Significant Substrings using the Chi-Square Statistic

Authors: Bhattacharya Arnab; Sachan Mayank
Publication date: 30/06/2012

The problem of identifying statistically significant patterns in a sequence of data arises in many domains such as intrusion detection systems, financial models, web-click records, automated monitoring systems, computational biology, cryptology, and text analysis. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to randomness or chance alone. We use the chi-square statistic as a quantitative measure of statistical significance. Given a string of characters generated from a memoryless Bernoulli model, the problem is to identify the substring for which the empirical distribution of single letters deviates the most from the distribution expected from the generative Bernoulli model. This deviation is captured using the chi-square measure. The most significant substring (MSS) of a string is thus defined as the substring having the highest chi-square value. To date, to the best of our knowledge, there does not exist any algorithm to find the MSS in better than O(n^2) time, where n denotes the length of the string. In this paper, we propose an algorithm to find the most significant substring, whose running time is O(n^{3/2}) with high probability. We also study some variants of this problem such as finding the top-t set, finding all substrings having chi-square greater than a fixed threshold and finding the MSS among substrings greater than a given length. We experimentally demonstrate the asymptotic behavior of the MSS on varying the string size and alphabet size. We also describe some applications of our algorithm on cryptology and real world data from finance and sports. Finally, we compare our technique with the existing heuristics for finding the MSS. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX
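
The quadratic baseline mentioned in the abstract above can be sketched by scoring every substring with the Pearson chi-square statistic, $X^2 = \sum_c (Y_c - \ell p_c)^2 / (\ell p_c)$, where $\ell$ is the substring length, $Y_c$ the observed count of symbol $c$, and $p_c$ its probability under the null model. This is the naive enumeration, not the O(n^{3/2}) algorithm of the paper. The name `most_significant_substring` and the dictionary format for the probabilities are assumptions of this sketch.

```python
def most_significant_substring(text, probs):
    """Return (chi_square, start, end) for the best substring text[start:end].

    probs maps every symbol of the alphabet to a strictly positive
    probability under the null model.
    """
    best = (-1.0, 0, 0)
    n = len(text)
    for start in range(n):
        counts = dict.fromkeys(probs, 0)
        for end in range(start, n):
            counts[text[end]] += 1  # extend the substring by one symbol
            length = end - start + 1
            chi2 = sum(
                (counts[c] - length * p) ** 2 / (length * p)
                for c, p in probs.items()
            )
            if chi2 > best[0]:
                best = (chi2, start, end + 1)
    return best


chi2, start, end = most_significant_substring("abaaaab", {"a": 0.5, "b": 0.5})
print(chi2, "abaaaab"[start:end])  # 4.0 aaaa
```

The two nested loops visit all O(n^2) substrings, and the counts are updated incrementally so that each substring costs time proportional to the alphabet size.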

Streaming k-mismatch with error correcting and applications

Authors: Radoszewski Jakub; Starikovskaya Tatiana
Publication date: 23/04/2019

We present a new streaming algorithm for the $k$-Mismatch problem, one of the most basic problems in pattern matching. Given a pattern and a text, the task is to find all substrings of the text that are at Hamming distance at most $k$ from the pattern. Our algorithm is enhanced with an important new feature called Error Correcting, and its complexities for $k=1$ and for a general $k$ are comparable to those of the solutions for the $k$-Mismatch problem by Porat and Porat (FOCS 2009) and Clifford et al. (SODA 2016). In parallel to our research, a yet more efficient algorithm for the $k$-Mismatch problem with the Error Correcting feature was developed by Clifford et al. (SODA 2019). Using the new feature and recent work on streaming Multiple Pattern Matching we develop a series of streaming algorithms for pattern matching on weighted strings, which are a commonly used representation of uncertain sequences in molecular biology. We also show that these algorithms are space-optimal up to polylog factors. A preliminary version of this work was published at the DCC 2017 conference.

arXiv.org e-Print Archive
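
The $k$-Mismatch problem stated in the abstract above has a simple offline baseline that compares the pattern against every alignment of the text. The sketch below is that baseline only: it stores the whole text and is not a streaming algorithm, and the name `k_mismatch_occurrences` is illustrative.

```python
def k_mismatch_occurrences(pattern, text, k):
    """Start positions where text matches pattern with Hamming distance <= k."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[i:i + m]):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(i)
    return hits


print(k_mismatch_occurrences("abc", "abcabdxbc", 1))  # [0, 3, 6]
```

The early exit keeps each alignment cheap once more than $k$ mismatches are seen, but the worst case remains proportional to the product of the pattern and text lengths.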

Fast Packed String Matching for Short Patterns

Authors: Faro Simone; Külekci M. Oguzhan
Publication date: 28/09/2012

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared of trying to exploit the power of the word RAM model to speed up the performance of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time. In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns. Our experimental results show that, despite their quadratic worst-case time complexity, the newly presented algorithms are the clear winners on average for short patterns when compared against the most effective algorithms known in the literature. Comment: 15 page

arXiv.org e-Print Archive
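
The idea of grouping a block of characters into one machine word, as described in the abstract above, can be illustrated by packing the pattern into a single integer and comparing it against a sliding packed window of the text with one equality test per position. This is a conceptual sketch only: it does not use SSE instructions and is not one of the paper's algorithms, and Python integers are not fixed-width words, so no speedup should be expected from it. The name `packed_search` is hypothetical.

```python
def packed_search(pattern, text):
    """Find all occurrences of `pattern` in `text` (both bytes objects).

    Assumption of this sketch: the pattern is short, e.g. at most 8 bytes,
    so that it would fit in one 64-bit word on real hardware.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    mask = (1 << (8 * m)) - 1                 # keeps the lowest m bytes
    pat_word = int.from_bytes(pattern, "big")
    window = int.from_bytes(text[:m], "big")
    hits = [0] if window == pat_word else []
    for i in range(m, len(text)):
        # Shift in the next byte and drop the oldest one.
        window = ((window << 8) | text[i]) & mask
        if window == pat_word:                # one word comparison per position
            hits.append(i - m + 1)
    return hits


print(packed_search(b"ab", b"abcabab"))  # [0, 3, 5]
```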

Data Structure Lower Bounds on Random Access to Grammar-Compressed Strings

Authors: Chen Shiteng; Verbin Elad; Yu Wei
Publication date: 03/05/2012

In this paper we investigate the problem of building a static data structure that represents a string s using space close to its compressed size, and allows fast access to individual characters of s. This type of structure was investigated by the recent paper of Bille et al. Let n be the size of a context-free grammar that derives a unique string s of length L. (Note that L might be exponential in n.) Bille et al. showed a data structure that uses space O(n) and allows querying the i-th character of s in time O(log L). Their data structure works on a word RAM with a word size of log L bits. Here we prove that for such data structures, if the space is poly(n), then the query time must be at least (log L)^{1-\epsilon}/log S where S is the space used, for any constant \epsilon>0. As a function of n, our lower bound is \Omega(n^{1/2-\epsilon}). Our proof holds in the cell-probe model with a word size of log L bits, so in particular it holds in the word RAM model. We show that no lower bound significantly better than n^{1/2-\epsilon} can be achieved in the cell-probe model, since there is a data structure in the cell-probe model that uses O(n) space and achieves O(\sqrt{n log n}) query time. The "bad" setting of parameters occurs roughly when L=2^{\sqrt{n}}. We also prove a lower bound for the case of not-as-compressible strings, where, say, L=n^{1+\epsilon}. For this case, we prove that if the space is n polylog(n), then the query time must be at least \Omega(log n/loglog n). The proof works by reduction to communication complexity, namely to the LSD problem, recently employed by Patrascu and others. We also prove lower bounds for the case of LZ-compression and Burrows-Wheeler (BWT) compression. All of our lower bounds hold even when the strings are over an alphabet of size 2 and hold even for randomized data structures with 2-sided error. Comment: submitted to ICALP 2012, with strengthened results include

arXiv.org e-Print Archive
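
The random-access problem in the abstract above can be made concrete with a simple baseline: store the expansion length of every grammar symbol and walk down from the start symbol, going left or right depending on where position i falls. This sketch uses O(n) space, but its query time is proportional to the depth of the grammar, so it is not the O(log L) structure of Bille et al. The grammar encoding (a list in which each rule is either a single character or a pair of earlier rule indices, with the last rule as the start symbol) and the function names are assumptions made for illustration.

```python
def build_lengths(rules):
    """Expansion length of every rule; children must precede their parent."""
    lengths = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)                 # terminal: a single character
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def char_at(rules, lengths, i):
    """Return the i-th character (0-based) of the string derived by the last rule."""
    node = len(rules) - 1
    if not 0 <= i < lengths[node]:
        raise IndexError(i)
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left                       # position lies in the left part
        else:
            i -= lengths[left]                # skip the left part
            node = right
    return rules[node]


# "ab" doubled 50 times: 53 rules derive a string of length 2**51.
rules = ["a", "b", (0, 1)] + [(j, j) for j in range(2, 52)]
lengths = build_lengths(rules)
print(lengths[-1] == 2 ** 51, char_at(rules, lengths, 2 ** 51 - 1))
```

The example shows the point made in the abstract that L can be exponential in n: the string is never written out, yet any single character is recovered by a short walk through the grammar.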

On Verification of D-Detectability for Discrete Event Systems

Authors: Balun Jiří; Masopust Tomáš
Publication date: 16/05/2020

Detectability has been introduced as a generalization of state-estimation properties of discrete event systems studied in the literature. It asks whether the current and subsequent states of a system can be determined based on observations. Since, in some applications, to exactly determine the current and subsequent states may be too strict, a relaxed notion of D-detectability has been introduced, distinguishing only certain pairs of states rather than all states. Four variants of D-detectability have been defined: strong (periodic) D-detectability and weak (periodic) D-detectability. Deciding weak (periodic) D-detectability is PSpace-complete, while deciding strong (periodic) detectability or strong D-detectability is polynomial (and we show that it is actually NL-complete). However, to the best of our knowledge, it is an open problem whether there exists a polynomial-time algorithm deciding strong periodic D-detectability. We solve this problem by showing that deciding strong periodic D-detectability is a PSpace-complete problem, and hence there is no polynomial-time algorithm unless PSpace = P. We further show that there is no polynomial-time algorithm deciding strong periodic D-detectability even for systems with a single observable event, unless P = NP. Finally, we propose a class of systems for which the problem is tractable. Comment: Extended version of a paper accepted for WODES 202

arXiv.org e-Print Archive

$LCSk$++: Practical similarity metric for long strings

Authors: Pavetić Filip; Šikić Mile; Žužić Goran
Publication date: 09/07/2014

In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With the ever-increasing size of strings occurring in practice, e.g. large genomes of plants and animals, classic algorithms such as Longest Common Subsequence (LCS) fail due to demanding computational complexity. Recently, Benson et al. defined a similarity metric named $LCSk$. By relaxing the requirement that the $k$-length substrings should not overlap, we extend their definition into a new metric. An efficient algorithm is presented which computes $LCSk$++ with complexity of $O((|X|+|Y|)\log(|X|+|Y|))$ for strings $X$ and $Y$ under a realistic random model. The algorithm has been designed with implementation simplicity in mind. Additionally, we describe how it can be adjusted to compute $LCSk$ as well, which gives an improvement of the $O(|X| \cdot |Y|)$ algorithm presented in the original $LCSk$ paper.

arXiv.org e-Print Archive
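
For context on the quadratic baseline that the abstract above improves upon, the following is a sketch of a straightforward $O(|X| \cdot |Y|)$ dynamic program for $LCSk$, under the assumption that $LCSk$ is the maximum number of non-overlapping, order-preserving pairs of matching substrings of length exactly $k$. This definition is my reading of the metric and is not spelled out in the abstract. The code is neither the $LCSk$++ algorithm of the paper nor necessarily the exact formulation in the original $LCSk$ paper, and the name `lcsk` is illustrative.

```python
def lcsk(x, y, k):
    """Max number of non-overlapping common k-length substrings, in order."""
    m, n = len(x), len(y)
    # suffix[i][j]: length of the longest common suffix of x[:i] and y[:j].
    suffix = [[0] * (n + 1) for _ in range(m + 1)]
    # dp[i][j]: LCSk value for the prefixes x[:i] and y[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                suffix[i][j] = suffix[i - 1][j - 1] + 1
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
            if suffix[i][j] >= k:
                # x[i-k:i] == y[j-k:j], so one more k-block can end here.
                dp[i][j] = max(dp[i][j], dp[i - k][j - k] + 1)
    return dp[m][n]


print(lcsk("ABCDEF", "ABXXEF", 2))  # 2: the blocks "AB" and "EF"
```

With $k = 1$ the recurrence reduces to the ordinary LCS length, which is a quick sanity check on the sketch.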