Search CORE

9 research outputs found

On the Expressive Power of Regular Expressions with Backreferences

Author: Nogami Taisei
Terauchi Tachio
Publication venue
Publication date: 08/08/2023
Field of study

A rewb is a regular expression extended with a feature called backreference. It is broadly known that backreference is a practical extension of regular expressions, and is supported by most modern regular expression engines, such as those in the standard libraries of Java, Python, and more. Meanwhile, indexed languages are the languages generated by indexed grammars, a formal grammar class proposed by A.V.Aho. We show that these two models' expressive powers are related in the following way: every language described by a rewb is an indexed language. As the smallest formal grammar class previously known to contain rewbs is the class of context sensitive languages, our result strictly improves the known upper-bound. Moreover, we prove the following two claims: there exists a rewb whose language does not belong to the class of stack languages, which is a proper subclass of indexed languages, and the language described by a rewb without a captured reference is in the class of nonerasing stack languages, which is a proper subclass of stack languages. Finally, we show that the hierarchy investigated in a prior study, which separates the expressive power of rewbs by the notion of nested levels, is within the class of nonerasing stack languages.Comment: 20 pages, the full version of the paper to appear in MFCS 202

arXiv.org e-Print Archive

Characterising REGEX Languages by Regular Languages Equipped with Factor-Referencing

Author: B. Carle
C. Câmpeanu
C. Câmpeanu
C. Câmpeanu
D.D. Freydenberger
G.D. Penna
H. Bordihn
J. Albert
J. Dassow
J.E.F. Friedl
M.L. Schmid
R.W. Floyd
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Crossref

Detecting One-variable Patterns

Author: A Amir
A Ehrenfeucht
D Angluin
D Kosolobov
D Kosolobov
E Czeizler
F Manea
G Manacher
J Kärkkäinen
JEF Friedl
M Crochemore
M Crochemore
M Lothaire
M Rubinchik
ML Schmid
P Gawrychowski
Z Galil
Z Xu
Publication venue
Publication date: 01/01/2017
Field of study

Given a pattern

p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r

such that

x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}

, where

x

is a variable and

\overset{{}_{\leftarrow}}{x}

its reversal, and

s_1,s_2,\ldots,s_r

are strings that contain no variables, we describe an algorithm that constructs in

O(rn)

time a compact representation of all

P

instances of

p

in an input string of length

n

over a polynomially bounded integer alphabet, so that one can report those instances in

O(P)

time.Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Joining Extractions of Regular Expressions

Author: Freydenberger Dominik D.
Kimelfeld Benny
Peterfreund Liat
Publication venue
Publication date: 30/03/2017
Field of study

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness hits already single-character text! Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ

arXiv.org e-Print Archive

Crossref

Loughborough University Institutional Repository

Joining extractions of regular expressions

Author: Benny Kimelfeld (7169462)
Dominik Freydenberger (3718891)
Liat Peterfreund (7169465)
Publication venue
Publication date: 01/01/2018
Field of study

Regular expressions with capture variables, also known as “regex formulas,” extract relations of spans (interval positions) from text. These relations can be further manipulated via the relational Algebra as studied in the context of “document spanners,” Fagin et al.’s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. Such queries have been investigated in prior work on document spanners, but little is known about the (combined) complexity of their evaluation. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness hits already single-character text. Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ

Loughborough University Institutional Repository

26. Theorietag Automaten und Formale Sprachen 23. Jahrestagung Logik in der Informatik: Tagungsband

Author
Publication venue
Publication date: 01/01/2016
Field of study

Der Theorietag ist die Jahrestagung der Fachgruppe Automaten und Formale Sprachen der Gesellschaft für Informatik und fand erstmals 1991 in Magdeburg statt. Seit dem Jahr 1996 wird der Theorietag von einem eintägigen Workshop mit eingeladenen Vorträgen begleitet. Die Jahrestagung der Fachgruppe Logik in der Informatik der Gesellschaft für Informatik fand erstmals 1993 in Leipzig statt. Im Laufe beider Jahrestagungen finden auch die jährliche Fachgruppensitzungen statt. In diesem Jahr wird der Theorietag der Fachgruppe Automaten und Formale Sprachen erstmalig zusammen mit der Jahrestagung der Fachgruppe Logik in der Informatik abgehalten. Organisiert wurde die gemeinsame Veranstaltung von der Arbeitsgruppe Zuverlässige Systeme des Instituts für Informatik an der Christian-Albrechts-Universität Kiel vom 4. bis 7. Oktober im Tagungshotel Tannenfelde bei Neumünster. Während des Tre↵ens wird ein Workshop für alle Interessierten statt finden. In Tannenfelde werden • Christoph Löding (Aachen) • Tomás Masopust (Dresden) • Henning Schnoor (Kiel) • Nicole Schweikardt (Berlin) • Georg Zetzsche (Paris) eingeladene Vorträge zu ihrer aktuellen Arbeit halten. Darüber hinaus werden 26 Vorträge von Teilnehmern und Teilnehmerinnen gehalten, 17 auf dem Theorietag Automaten und formale Sprachen und neun auf der Jahrestagung Logik in der Informatik. Der vorliegende Band enthält Kurzfassungen aller Beiträge. Wir danken der Gesellschaft für Informatik, der Christian-Albrechts-Universität zu Kiel und dem Tagungshotel Tannenfelde für die Unterstützung dieses Theorietags. Ein besonderer Dank geht an das Organisationsteam: Maike Bradler, Philipp Sieweck, Joel Day. Kiel, Oktober 2016 Florin Manea, Dirk Nowotka und Thomas Wilk

MACAU: Open Access Repository of Kiel University

A logic for document spanners

Author: Dominik Freydenberger (3718891)
Publication venue
Publication date: 11/09/2018
Field of study

Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which formalize the query language AQL that is used in IBM’s SystemT. As shown by Freydenberger and Holldack (ICDT 2016, ToCS 2018), there is a connection between core spanners and ECreg, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of ECreg that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between SpLog and core spanners. Consequences and applications include an alternative way of defining relations for spanners, a pumping lemma for core spanners, and insights into the relative succinctness of various classes of spanner representations and their connection to graph querying languages. We also briefly discuss the connection between SpLog with negation and core spanners with a difference operator

Loughborough University Institutional Repository