Search CORE

490 research outputs found

Shortest Distances as Enumeration Problem

Author: Casel Katrin
Friedrich Tobias
Neubert Stefan
Schmid Markus L.
Publication venue
Publication date: 17/02/2021
Field of study

We investigate the single source shortest distance (SSSD) and all pairs shortest distance (APSD) problems as enumeration problems (on unweighted and integer weighted graphs), meaning that the elements

(u, v, d(u, v))

-- where

u

and

v

are vertices with shortest distance

d(u, v)

-- are produced and listed one by one without repetition. The performance is measured in the RAM model of computation with respect to preprocessing time and delay, i.e., the maximum time that elapses between two consecutive outputs. This point of view reveals that specific types of output (e.g., excluding the non-reachable pairs

(u, v, \infty)

, or excluding the self-distances

(u, u, 0)

) and the order of enumeration (e.g., sorted by distance, sorted row-wise with respect to the distance matrix) have a huge impact on the complexity of APSD while they appear to have no effect on SSSD. In particular, we show for APSD that enumeration without output restrictions is possible with delay in the order of the average degree. Excluding non-reachable pairs, or requesting the output to be sorted by distance, increases this delay to the order of the maximum degree. Further, for weighted graphs, a delay in the order of the average degree is also not possible without preprocessing or considering self-distances as output. In contrast, for SSSD we find that a delay in the order of the maximum degree without preprocessing is attainable and unavoidable for any of these requirements.Comment: Updated version adds the study of space complexit

arXiv.org e-Print Archive

Inside the class of REGEX Languages

Author: Markus L. Schmid (7167803)
Publication venue
Publication date: 01/01/2012
Field of study

We study different possibilities of combining the concept of homomorphic replacement with regular expressions in order to investigate the class of languages given by extended regular expressions with backreferences (REGEX). It is shown in which regard existing and natural ways to do this fail to reach the expressive power of REGEX. Furthermore, the complexity of the membership problem for REGEX with a bounded number of backreferences is considered

Loughborough University Institutional Repository

On the membership problem for pattern languages and related topics

Author: Markus L. Schmid (7167803)
Publication venue
Publication date: 01/01/2012
Field of study

In this thesis, we investigate the complexity of the membership problem for pattern languages. A pattern is a string over the union of the alphabets A and X, where X := {x_1, x_2, x_3, ...} is a countable set of variables and A is a finite alphabet containing terminals (e.g., A := {a, b, c, d}). Every pattern, e.g., p := x_1 x_2 a b x_2 b x_1 c x_2, describes a pattern language, i.e., the set of all words that can be obtained by uniformly substituting the variables in the pattern by arbitrary strings over A. Hence, u := cacaaabaabcaccaa is a word of the pattern language of p, since substituting cac for x_1 and aa for x_2 yields u. On the other hand, there is no way to obtain the word u' := bbbababbacaaba by substituting the occurrences of x_1 and x_2 in p by words over A. The problem to decide for a given pattern q and a given word w whether or not w is in the pattern language of q is called the membership problem for pattern languages. Consequently, (p, u) is a positive instance and (p, u') is a negative instance of the membership problem for pattern languages. For the unrestricted case, i.e., for arbitrary patterns and words, the membership problem is NP-complete. In this thesis, we identify classes of patterns for which the membership problem can be solved efficiently. Our first main result in this regard is that the variable distance, i.e., the maximum number of different variables that separate two consecutive occurrences of the same variable, substantially contributes to the complexity of the membership problem for pattern languages. More precisely, for every class of patterns with a bounded variable distance the membership problem can be solved efficiently. The second main result is that the same holds for every class of patterns with a bounded scope coincidence degree, where the scope coincidence degree is the maximum number of intervals that cover a common position in the pattern, where each interval is given by the leftmost and rightmost occurrence of a variable in the pattern. The proof of our first main result is based on automata theory. More precisely, we introduce a new automata model that is used as an algorithmic framework in order to show that the membership problem for pattern languages can be solved in time that is exponential only in the variable distance of the corresponding pattern. We then take a closer look at this automata model and subject it to a sound theoretical analysis. The second main result is obtained in a completely different way. We encode patterns and words as relational structures and we then reduce the membership problem for pattern languages to the homomorphism problem of relational structures, which allows us to exploit the concept of the treewidth. This approach turns out be successful, and we show that it has potential to identify further classes of patterns with a polynomial time membership problem. Furthermore, we take a closer look at two aspects of pattern languages that are indirectly related to the membership problem. Firstly, we investigate the phenomenon that patterns can describe regular or context-free languages in an unexpected way, which implies that their membership problem can be solved efficiently. In this regard, we present several sufficient conditions and necessary conditions for the regularity and context-freeness of pattern languages. Secondly, we compare pattern languages with languages given by so-called extended regular expressions with backreferences (REGEX). The membership problem for REGEX languages is very important in practice and since REGEX are similar to pattern languages, it might be possible to improve algorithms for the membership problem for REGEX languages by investigating their relationship to patterns. In this regard, we investigate how patterns can be extended in order to describe large classes of REGEX languages

Loughborough University Institutional Repository

A Purely Regular Approach to Non-Regular Core Spanners

Author: Schmid Markus L.
Schweikardt Nicole
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Database Theory (ICDT 2021)
Publication date: 01/01/2021
Field of study

The regular spanners (characterised by vset-automata) are closed under the algebraic operations of union, join and projection, and have desirable algorithmic properties. The core spanners (introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015) as a formalisation of the core functionality of the query language AQL used in IBM's SystemT) additionally need string equality selections and it has been shown by Freydenberger and Holldack (ICDT 2016, Theory of Computing Systems 2018) that this leads to high complexity and even undecidability of the typical problems in static analysis and query evaluation. We propose an alternative approach to core spanners: by incorporating the string-equality selections directly into the regular language that represents the underlying regular spanner (instead of treating it as an algebraic operation on the table extracted by the regular spanner), we obtain a fragment of core spanners that, while having slightly weaker expressive power than the full class of core spanners, arguably still covers the intuitive applications of string equality selections for information extraction and has much better upper complexity bounds of the typical problems in static analysis and query evaluation

arXiv.org e-Print Archive

DROPS Dagstuhl Research Online Publication Server

Fine-Grained Complexity of Regular Path Queries

Author: Casel Katrin
Schmid Markus L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Database Theory (ICDT 2021)
Publication date: 01/01/2021
Field of study

A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called PG-approach), i. e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs

arXiv.org e-Print Archive

Episciences.org

Directory of Open Access Journals

DROPS Dagstuhl Research Online Publication Server

Consensus Strings with Small Maximum Distance and Small Distance Sum

Author: Bulteau Laurent
Schmid Markus L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)
Publication date: 01/01/2018
Field of study

The parameterised complexity of consensus string problems (Closest String, Closest Substring, Closest String with Outliers) is investigated in a more general setting, i. e., with a bound on the maximum Hamming distance and a bound on the sum of Hamming distances between solution and input strings. We completely settle the parameterised complexity of these generalised variants of Closest String and Closest Substring, and partly for Closest String with Outliers; in addition, we answer some open questions from the literature regarding the classical problem variants with only one distance bound. Finally, we investigate the question of polynomial kernels and respective lower bounds

HAL Descartes

DROPS Dagstuhl Research Online Publication Server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Deterministic Regular Expressions with Back-References

Author: Freydenberger Dominik D.
Schmid Markus L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017)
Publication date: 01/01/2017
Field of study

Most modern libraries for regular expression matching allow back-references (i.e. repetition operators) that substantially increase expressive power, but also lead to intractability. In order to find a better balance between expressiveness and tractability, we combine these with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model, and a generalization of the Glushkov construction

arXiv.org e-Print Archive

Loughborough University Institutional Repository

DROPS Dagstuhl Research Online Publication Server