72 research outputs found

Joining Extractions of Regular Expressions

Author: Dominik D. Freydenberger, Benny Kimelfeld, Liat Peterfreund
Publication date: 30/03/2017

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness hits already single-character text! Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ
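The span relations described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: `matching_spans` and `sequential_tuples` are hypothetical helper names, spans are written as 0-based half-open pairs (i, j), and the second function hard-codes one assumed formula shape, .* x1{.*} .* x2{.*} ... .*, in which k variables capture non-overlapping spans from left to right. The only point is that materializing such a relation becomes expensive quickly as variables are added.

```python
import re
from itertools import combinations_with_replacement


def matching_spans(doc, pattern):
    """Return every span (i, j) of doc whose substring doc[i:j] fully matches pattern."""
    rx = re.compile(pattern)
    n = len(doc)
    return [
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        if rx.fullmatch(doc, i, j)
    ]


def sequential_tuples(doc, k):
    """Enumerate the relation of the assumed formula .* x1{.*} .* x2{.*} ... .*

    Each tuple assigns k spans that occur left to right without overlapping,
    i.e. positions a1 <= b1 <= a2 <= b2 <= ... <= ak <= bk.
    """
    n = len(doc)
    tuples = []
    # Every non-decreasing sequence of 2k cut positions gives exactly one tuple.
    for cuts in combinations_with_replacement(range(n + 1), 2 * k):
        tuples.append(tuple((cuts[2 * t], cuts[2 * t + 1]) for t in range(k)))
    return tuples


if __name__ == "__main__":
    print(matching_spans("ab", "a*"))
    for k in (1, 2, 3):
        print(k, len(sequential_tuples("a" * 10, k)))
```

For a document of length n this particular formula has C(n + 2k, 2k) tuples, so 10 characters and 2 variables already give 1001 of them.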

arXiv.org e-Print Archive

Crossref

Loughborough University Institutional Repository

A Logic for Document Spanners

Author: Dominik D. Freydenberger
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 20th International Conference on Database Theory (ICDT 2017)
Publication date: 01/01/2017

Document spanners are a formal framework for information extraction that was introduced by [Fagin, Kimelfeld, Reiss, and Vansummeren, J.ACM, 2015]. One of the central models in this framework are core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by [Freydenberger and Holldack, ICDT, 2016], there is a connection between core spanners and EC^{reg}, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of EC^{reg} that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between this fragment and core spanners. This even holds for variants of core spanners that are based on automata instead of regular expressions. Applications of this approach include an alternative way of defining relations for spanners, insights into the relative succinctness of various classes of spanner representations, and a pumping lemma for core spanners

Loughborough University Institutional Repository

Dagstuhl Research Online Publication Server

A logic for document spanners

Author: Dominik Freydenberger
Publication date: 11/09/2018

Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which formalize the query language AQL that is used in IBM’s SystemT. As shown by Freydenberger and Holldack (ICDT 2016, ToCS 2018), there is a connection between core spanners and ECreg, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of ECreg that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between SpLog and core spanners. Consequences and applications include an alternative way of defining relations for spanners, a pumping lemma for core spanners, and insights into the relative succinctness of various classes of spanner representations and their connection to graph querying languages. We also briefly discuss the connection between SpLog with negation and core spanners with a difference operator

Loughborough University Institutional Repository

A logic for document spanners

Author: Dominik Freydenberger
Publication date: 01/01/2017

Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by Freydenberger and Holldack (ICDT 2016), there is a connection between core spanners and ECreg, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of ECreg that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between this fragment and core spanners. This even holds for variants of core spanners that are based on automata instead of regular expressions. Applications of this approach include an alternative way of defining relations for spanners, insights into the relative succinctness of various classes of spanner representations, and a pumping lemma for core spanners

Loughborough University Institutional Repository

Extended Regular Expressions: Succinctness and Decidability

Author: Dominik D. Freydenberger
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Theoretical Aspects of Computer Science (STACS 2011)
Publication date: 01/01/2011

Most modern implementations of regular expression engines allow the use of variables (also called back references). The resulting extended regular expressions (which, in the literature, are also called practical regular expressions, rewbr, or regex) are able to express non-regular languages. The present paper demonstrates that extended regular expressions cannot be minimized effectively (neither with respect to length, nor number of variables), and that the tradeoff in size between extended and "classical" regular expressions is not bounded by any recursive function. In addition to this, we prove the undecidability of several decision problems (universality, equivalence, inclusion, regularity, and cofiniteness) for extended regular expressions. Furthermore, we show that all these results hold even if the extended regular expressions contain only a single variable.
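A standard illustration of the non-regularity claim, sketched here with Python's `re` module (the choice of engine and the helper name `is_square` are assumptions, not taken from the paper): a single back reference is enough to match the copy language { ww : w in {a,b}+ }, which no classical regular expression can describe.

```python
import re

# Group 1 captures some non-empty word w over {a, b};
# the back reference \1 then demands an exact second copy of w.
SQUARE = re.compile(r"([ab]+)\1")


def is_square(s):
    """Return True if s has the form ww for a non-empty w over {a, b}."""
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ("abab", "aabaab", "aba", "abba"):
        print(candidate, is_square(candidate))
```

The pattern uses only one variable (one capture group), which matches the single-variable setting mentioned at the end of the abstract.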

Loughborough University Institutional Repository

Dagstuhl Research Online Publication Server

Unambiguous 1-Uniform Morphisms

Author: A. Mateescu, A. Thue, C. Choffrut, D. Reidenbach, D.D. Freydenberger, Daniel Reidenbach, F. Levé, Hossein Nevisi, J.C. Schneider, Petr Ambrož, S. Holub, T. Harju, Zuzana Masáková, Štěpán Holub
Publication venue: Open Publishing Association
Publication date: 01/01/2011

A morphism h is unambiguous with respect to a word w if there is no other morphism g that maps w to the same image as h. In the present paper we study the question of whether, for any given word, there exists an unambiguous 1-uniform morphism, i.e., a morphism that maps every letter in the word to an image of length 1. Comment: In Proceedings WORDS 2011, arXiv:1108.341
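The definition of unambiguity above can be checked by brute force on small words. The following Python sketch is an illustration only, with hypothetical helper names `morphisms_onto` and `is_unambiguous`. It assumes that a competing morphism g may map letters to the empty word; if g had to be non-erasing, every letter would be forced onto exactly one symbol of a 1-uniform image, so g could never differ from h on the letters of w.

```python
def morphisms_onto(word, image):
    """Yield every map g (letter -> possibly empty string) with g(word) == image."""

    def extend(pos_w, pos_i, g):
        if pos_w == len(word):
            if pos_i == len(image):
                yield dict(g)  # copy, because g is mutated while backtracking
            return
        x = word[pos_w]
        if x in g:
            # Letter already assigned: its image must occur at this position.
            if image.startswith(g[x], pos_i):
                yield from extend(pos_w + 1, pos_i + len(g[x]), g)
            return
        # Try every factor of the image starting at pos_i, including the empty one.
        for end in range(pos_i, len(image) + 1):
            g[x] = image[pos_i:end]
            yield from extend(pos_w + 1, end, g)
        del g[x]

    yield from extend(0, 0, {})


def is_unambiguous(word, h):
    """Return True if no morphism other than h maps word to h(word)."""
    image = "".join(h[x] for x in word)
    h_on_word = {x: h[x] for x in set(word)}
    return all(g == h_on_word for g in morphisms_onto(word, image))


if __name__ == "__main__":
    identity = {"a": "a", "b": "b"}  # a 1-uniform morphism
    for w in ("abba", "abab"):
        print(w, is_unambiguous(w, identity))
```

On the word abab the identity morphism is ambiguous under this assumption, because mapping a to ab and b to the empty word produces the same image.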

arXiv.org e-Print Archive

CiteSeerX

Crossref

Loughborough University Institutional Repository

Directory of Open Access Journals

Extended regular expressions: succinctness and decidability

Author: Dominik Freydenberger
Publication date: 01/01/2013

Most modern implementations of regular expression engines allow the use of variables (also called backreferences). The resulting extended regular expressions (which, in the literature, are also called practical regular expressions, rewbr, or regex) are able to express non-regular languages. The present paper demonstrates that extended regular expressions cannot be minimized effectively (neither with respect to length, nor number of variables), and that the tradeoff in size between extended and "classical" regular expressions is not bounded by any recursive function. In addition to this, we prove the undecidability of several decision problems (universality, regularity, and cofiniteness) for extended regular expressions. Furthermore, we show that all these results hold even if the extended regular expressions contain only a single variable.

Loughborough University Institutional Repository

Deterministic Regular Expressions with Back-References

Author: Dominik D. Freydenberger, Markus L. Schmid
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017)
Publication date: 01/01/2017

Most modern libraries for regular expression matching allow back-references (i.e., repetition operators) that substantially increase expressive power, but also lead to intractability. In order to find a better balance between expressiveness and tractability, we combine these with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model, and a generalization of the Glushkov construction.

arXiv.org e-Print Archive

Loughborough University Institutional Repository

Dagstuhl Research Online Publication Server

The unambiguity of segmented morphisms

Author: Daniel Reidenbach, Dominik Freydenberger
Publication date: 01/01/2007

Loughborough University Institutional Repository