Shortest Distances as Enumeration Problem
We investigate the single source shortest distance (SSSD) and all pairs
shortest distance (APSD) problems as enumeration problems (on unweighted and
integer weighted graphs), meaning that the elements (u, v, d(u, v)) -- where u
and v are vertices with shortest distance d(u, v) -- are produced and
listed one by one without repetition. The performance is measured in the RAM
model of computation with respect to preprocessing time and delay, i.e., the
maximum time that elapses between two consecutive outputs. This point of view
reveals that specific types of output (e.g., excluding the non-reachable pairs
(u, v, ∞), or excluding the self-distances (u, u, 0)) and the order of
enumeration (e.g., sorted by distance, sorted row-wise with respect to the
distance matrix) have a huge impact on the complexity of APSD while they appear
to have no effect on SSSD.
In particular, we show for APSD that enumeration without output restrictions
is possible with delay in the order of the average degree. Excluding
non-reachable pairs, or requesting the output to be sorted by distance,
increases this delay to the order of the maximum degree. Further, for weighted
graphs, a delay in the order of the average degree is also not possible without
preprocessing or considering self-distances as output. In contrast, for SSSD we
find that a delay in the order of the maximum degree without preprocessing is
attainable and unavoidable for any of these requirements.
Comment: Updated version adds the study of space complexity
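To make the enumeration view concrete, here is a minimal sketch for the unweighted SSSD case. It is a plain breadth-first search written as a Python generator, so triples are produced one at a time, sorted by distance, with non-reachable pairs listed last. The function name `enumerate_sssd` and the adjacency-dictionary input are assumptions for illustration; the sketch only shows the output format and does not claim to achieve the delay bounds stated in the abstract.

```python
from collections import deque
from math import inf

def enumerate_sssd(adj, source):
    """Yield (source, v, d) for every vertex v, one triple at a time."""
    dist = {source: 0}
    queue = deque([source])
    yield (source, source, 0)              # the self-distance
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
                yield (source, v, dist[v])
    for v in adj:                          # non-reachable pairs come last
        if v not in dist:
            yield (source, v, inf)

# Hypothetical example graph (directed, unweighted).
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
triples = list(enumerate_sssd(adj, "a"))
```

Dropping the first `yield` or the final loop gives the restricted output types mentioned above (no self-distances, no non-reachable pairs).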
Inside the class of REGEX Languages
We study different possibilities of combining the concept of homomorphic replacement with regular expressions in order to investigate the class of languages given by extended regular expressions with backreferences (REGEX). It is shown in which regard existing and natural ways to do this fail to reach the expressive power of REGEX. Furthermore, the complexity of the membership problem for REGEX with a bounded number of backreferences is considered.
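As a small illustration of what a backreference adds, the following sketch uses Python's standard `re` module, whose `\1` syntax is one practical realisation of the backreferences studied here. The expression matches words of the form a^n b a^n (n >= 1), a language that no ordinary regular expression describes. The helper name `in_language` is an assumption for illustration.

```python
import re

# (a+) captures a block of a's; \1 must repeat exactly the captured text.
pattern = re.compile(r"(a+)b\1")

def in_language(word):
    """Return True if the whole word has the form a^n b a^n with n >= 1."""
    return pattern.fullmatch(word) is not None

checks = [in_language("aabaa"), in_language("aabaaa"), in_language("aba")]
```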
On the membership problem for pattern languages and related topics
In this thesis, we investigate the complexity of the membership problem for pattern languages. A pattern is a string over the union of the alphabets A and X, where X := {x_1, x_2, x_3, ...} is a countable set of variables and A is a finite alphabet containing terminals (e.g., A := {a, b, c, d}). Every pattern, e.g., p := x_1 x_2 a b x_2 b x_1 c x_2, describes a pattern language, i.e., the set of all words that can be obtained by uniformly substituting the variables in the pattern by arbitrary strings over A. Hence, u := cacaaabaabcaccaa is a word of the pattern language of p, since substituting cac for x_1 and aa for x_2 yields u. On the other hand, there is no way to obtain the word u' := bbbababbacaaba by substituting the occurrences of x_1 and x_2 in p by words over A.
The problem of deciding, for a given pattern q and a given word w, whether w is in the pattern language of q is called the membership problem for pattern languages. Consequently, (p, u) is a positive instance and (p, u') is a negative instance of the membership problem for pattern languages. For the unrestricted case, i.e., for arbitrary patterns and words, the membership problem is NP-complete. In this thesis, we identify classes of patterns for which the membership problem can be solved efficiently.
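The membership problem can be sketched as a naive backtracking search over all possible substitutions. This is an exponential-time illustration of the problem statement, not one of the efficient algorithms developed in the thesis. The function name `matches`, the token-list encoding of patterns, and the `allow_empty` switch (whether a variable may be replaced by the empty word) are assumptions for illustration.

```python
def matches(pattern, word, allow_empty=False):
    """Decide whether word is in the pattern language of pattern.

    pattern is a list of tokens; tokens starting with 'x' are variables,
    all other tokens are terminal strings.
    """
    def go(i, pos, sub):
        if i == len(pattern):
            return pos == len(word)
        tok = pattern[i]
        if not tok.startswith("x"):                  # terminal symbol
            return word.startswith(tok, pos) and go(i + 1, pos + len(tok), sub)
        if tok in sub:                               # variable already bound
            v = sub[tok]
            return word.startswith(v, pos) and go(i + 1, pos + len(v), sub)
        start = 0 if allow_empty else 1
        for n in range(start, len(word) - pos + 1):  # try every substitution
            sub[tok] = word[pos:pos + n]
            if go(i + 1, pos + n, sub):
                return True
            del sub[tok]
        return False

    return go(0, 0, {})

p = "x1 x2 a b x2 b x1 c x2".split()
u_positive = matches(p, "cacaaabaabcaccaa")   # the word u from the text
u_negative = matches(p, "bbbababbacaaba")     # the word u' from the text
```

For u' the answer is negative under either reading of `allow_empty`: the single c in u' must be the terminal c, which forces x_2 to be aaba, and three copies of that word no longer fit.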
Our first main result in this regard is that the variable distance, i.e., the maximum number of different variables that separate two consecutive occurrences of the same variable, substantially contributes to the complexity of the membership problem for pattern languages. More precisely, for every class of patterns with a bounded variable distance the membership problem can be solved efficiently. The second main result is that the same holds for every class of patterns with a bounded scope coincidence degree, where the scope coincidence degree is the maximum number of intervals that cover a common position in the pattern, where each interval is given by the leftmost and rightmost occurrence of a variable in the pattern.
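Both parameters can be computed directly from a pattern. The sketch below follows the informal definitions given in this paragraph; the formal definitions in the thesis may differ in detail, and the function names and the token-list encoding (variables are tokens starting with `x`) are assumptions for illustration.

```python
def is_var(tok):
    return tok.startswith("x")

def variable_distance(pattern):
    """Max number of distinct variables between consecutive occurrences of one variable."""
    best, last = 0, {}
    for i, tok in enumerate(pattern):
        if not is_var(tok):
            continue
        if tok in last:
            between = {t for t in pattern[last[tok] + 1:i] if is_var(t)}
            best = max(best, len(between))
        last[tok] = i
    return best

def scope_coincidence_degree(pattern):
    """Max number of variable scopes (leftmost..rightmost occurrence) covering one position."""
    first, last = {}, {}
    for i, tok in enumerate(pattern):
        if is_var(tok):
            first.setdefault(tok, i)
            last[tok] = i
    return max(
        (sum(1 for v in first if first[v] <= i <= last[v]) for i in range(len(pattern))),
        default=0,
    )

p = "x1 x2 a b x2 b x1 c x2".split()      # the example pattern p from above
q = "x1 x2 x3 x1 x2 x3".split()           # a hypothetical second pattern
vd_p, scd_p = variable_distance(p), scope_coincidence_degree(p)
vd_q, scd_q = variable_distance(q), scope_coincidence_degree(q)
```

For p, only one other variable ever separates two consecutive occurrences of the same variable, so the variable distance is 1, while the scopes of x_1 and x_2 overlap, giving a scope coincidence degree of 2.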
The proof of our first main result is based on automata theory. More precisely, we introduce a new automata model that is used as an algorithmic framework in order to show that the membership problem for pattern languages can be solved in time that is exponential only in the variable distance of the corresponding pattern. We then take a closer look at this automata model and subject it to a sound theoretical analysis. The second main result is obtained in a completely different way. We encode patterns and words as relational structures and we then reduce the membership problem for pattern languages to the homomorphism problem of relational structures, which allows us to exploit the concept of the treewidth. This approach turns out to be successful, and we show that it has potential to identify further classes of patterns with a polynomial time membership problem.
Furthermore, we take a closer look at two aspects of pattern languages that are indirectly related to the membership problem. Firstly, we investigate the phenomenon that patterns can describe regular or context-free languages in an unexpected way, which implies that their membership problem can be solved efficiently. In this regard, we present several sufficient conditions and necessary conditions for the regularity and context-freeness of pattern languages. Secondly, we compare pattern languages with languages given by so-called extended regular expressions with backreferences (REGEX). The membership problem for REGEX languages is very important in practice and since REGEX are similar to pattern languages, it might be possible to improve algorithms for the membership problem for REGEX languages by investigating their relationship to patterns. In this regard, we investigate how patterns can be extended in order to describe large classes of REGEX languages.
A Purely Regular Approach to Non-Regular Core Spanners
The regular spanners (characterised by vset-automata) are closed under the
algebraic operations of union, join and projection, and have desirable
algorithmic properties. The core spanners (introduced by Fagin, Kimelfeld,
Reiss, and Vansummeren (PODS 2013, JACM 2015) as a formalisation of the core
functionality of the query language AQL used in IBM's SystemT) additionally
need string equality selections and it has been shown by Freydenberger and
Holldack (ICDT 2016, Theory of Computing Systems 2018) that this leads to high
complexity and even undecidability of the typical problems in static analysis
and query evaluation. We propose an alternative approach to core spanners: by
incorporating the string-equality selections directly into the regular language
that represents the underlying regular spanner (instead of treating it as an
algebraic operation on the table extracted by the regular spanner), we obtain a
fragment of core spanners that, while having slightly weaker expressive power
than the full class of core spanners, arguably still covers the intuitive
applications of string equality selections for information extraction and has
much better upper complexity bounds of the typical problems in static analysis
and query evaluation.
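The algebraic view of a string-equality selection, i.e., filtering the table extracted by a regular spanner, can be pictured with a toy example. The sketch below is only a stand-in: it extracts all pairs of word spans with Python's `re` module and then keeps the rows whose two spans cover equal substrings. The function names, the 0-based half-open span convention, and the choice of "all pairs of words" as the underlying spanner are assumptions for illustration, and the sketch does not implement the fragment proposed in the abstract.

```python
import re
from itertools import combinations

def word_span_pairs(doc):
    """Toy extraction step: all pairs (x, y) of word spans with x before y."""
    spans = [m.span() for m in re.finditer(r"[a-z]+", doc)]
    return list(combinations(spans, 2))

def select_equal(doc, table):
    """String-equality selection: keep rows whose spans cover equal substrings."""
    return [(x, y) for x, y in table if doc[x[0]:x[1]] == doc[y[0]:y[1]]]

doc = "ab ba ab"
table = word_span_pairs(doc)          # three rows of span pairs
equal_rows = select_equal(doc, table)
```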
Fine-Grained Complexity of Regular Path Queries
A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
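The PG-approach can be sketched in a few lines: search the product of the graph database and an NFA for q, starting from every node paired with the NFA's start state, and report a pair (u, v) whenever an accepting state is reached at v. The sketch explores the product graph on the fly instead of materialising it; the function name `evaluate_rpq`, the edge-list and transition-dictionary encodings, and the example data are assumptions for illustration, and no claim is made about matching the bounds studied in the paper.

```python
from collections import deque

def evaluate_rpq(edges, nfa_delta, nfa_start, nfa_finals):
    """Return all pairs (u, v) joined by a path whose label word the NFA accepts."""
    out, nodes = {}, set()
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))
        nodes.update((u, v))
    answers = set()
    for source in nodes:
        seen = {(source, nfa_start)}           # visited product-graph nodes
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in nfa_finals:
                answers.add((source, node))
            for label, nxt in out.get(node, []):
                for nstate in nfa_delta.get((state, label), ()):
                    if (nxt, nstate) not in seen:
                        seen.add((nxt, nstate))
                        queue.append((nxt, nstate))
    return answers

# Hypothetical database and an NFA for the query a b*.
edges = [(1, "a", 2), (2, "b", 3), (3, "b", 2)]
delta = {(0, "a"): {1}, (1, "b"): {1}}
pairs = evaluate_rpq(edges, delta, nfa_start=0, nfa_finals={1})
```

Each source node triggers one traversal of the product graph, which is why a straightforward use of this idea spends time linear in the database between outputs in the worst case.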
Consensus Strings with Small Maximum Distance and Small Distance Sum
The parameterised complexity of consensus string problems (Closest String, Closest Substring, Closest String with Outliers) is investigated in a more general setting, i.e., with a bound on the maximum Hamming distance and a bound on the sum of Hamming distances between solution and input strings. We completely settle the parameterised complexity of these generalised variants of Closest String and Closest Substring, and partly for Closest String with Outliers; in addition, we answer some open questions from the literature regarding the classical problem variants with only one distance bound. Finally, we investigate the question of polynomial kernels and respective lower bounds.
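The generalised Closest String variant with two bounds can be stated as a brute-force check: find a string whose maximum Hamming distance to the inputs is at most one bound and whose distance sum is at most the other. The sketch below enumerates all candidate strings and is exponential in the string length; it illustrates the problem definition only, not any algorithm from the paper. All function and parameter names are assumptions for illustration.

```python
from itertools import product

def hamming(s, t):
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

def distance_profile(candidate, strings):
    """Return (maximum distance, distance sum) of candidate to the input strings."""
    dists = [hamming(candidate, s) for s in strings]
    return max(dists), sum(dists)

def closest_string(strings, alphabet, max_bound, sum_bound):
    """Brute-force search for a string satisfying both distance bounds."""
    length = len(strings[0])
    for letters in product(alphabet, repeat=length):
        cand = "".join(letters)
        mx, total = distance_profile(cand, strings)
        if mx <= max_bound and total <= sum_bound:
            return cand
    return None

strings = ["aaa", "aab", "abb"]                    # hypothetical input
solution = closest_string(strings, "ab", max_bound=1, sum_bound=2)
no_solution = closest_string(strings, "ab", max_bound=1, sum_bound=1)
```

Setting `sum_bound` to a value that can never be exceeded recovers the classical Closest String problem with only the maximum-distance bound.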
Deterministic Regular Expressions with Back-References
Most modern libraries for regular expression matching allow back-references (i.e., repetition operators) that substantially increase expressive power, but also lead to intractability. In order to find a better balance between expressiveness and tractability, we combine these with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model, and a generalization of the Glushkov construction.