Regular Languages meet Prefix Sorting

Author: Alanko Jarno
D'Agostino Giovanna
Policriti Alberto
Prezza Nicola
Publication date: 09/07/2019

Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with $n$ states admits an equivalent Wheeler DFA (WDFA) with at most $2n-1-|\Sigma|$ states that can be computed in $O(n^3)$ time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an $O(n\log n)$-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in $O(n\log n)$ time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time.

Comment: added minimization theorems; uploaded submitted version; New version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostino

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Faster Compression of Deterministic Finite Automata

Author: Bille Philip
Gørtz Inge Li
Pedersen Max Rishøj
Publication date: 22/06/2023

Deterministic finite automata (DFAs) are a classic tool for high throughput matching of regular expressions, both in theory and practice. Due to their high space consumption, extensive research has been devoted to compressed representations of DFAs that still support efficient pattern matching queries. Kumar et al. [SIGCOMM 2006] introduced the delayed deterministic finite automaton (D²FA), which exploits the large redundancy between inter-state transitions in the automaton. They showed it to obtain up to two orders of magnitude compression of real-world DFAs, and their work formed the basis of numerous subsequent results. Their algorithm, as well as later algorithms based on their idea, have an inherent quadratic-time bottleneck, as they consider every pair of states to compute the optimal compression. In this work we present a simple, general framework based on locality-sensitive hashing for speeding up these algorithms to achieve sub-quadratic construction times for D²FAs. We apply the framework to speed up several algorithms to near-linear time, and experimentally evaluate their performance on real-world regular expression sets extracted from modern intrusion detection systems. We find an order of magnitude improvement in compression times, with either little or no loss of compression, or even significantly better compression in some cases.
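To illustrate the redundancy that a delayed DFA exploits, the following is a minimal sketch of matching with default transitions: a state stores only the transitions that differ from those of its default state, and a lookup follows defaults (consuming no input) until a labeled transition is found. The class name `DelayedDFA`, its fields, and the three-state example are assumptions for illustration. The sketch covers only matching, not the compression algorithm or the locality-sensitive hashing framework of the paper.

```python
class DelayedDFA:
    """Toy delayed DFA: labeled transitions plus one default per state."""

    def __init__(self, start, labeled, default, accepting):
        self.start = start          # initial state
        self.labeled = labeled      # state -> {char: next state}
        self.default = default      # state -> fallback state (may be absent)
        self.accepting = accepting  # set of accepting states

    def step(self, state, ch):
        # Follow default transitions without consuming input until some
        # state on the default path has a labeled transition for ch.
        while ch not in self.labeled[state]:
            state = self.default[state]
        return self.labeled[state][ch]

    def accepts(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return state in self.accepting


# Full DFA for "strings over {a, b} ending in ab" has 6 transitions:
#   0: a->1, b->0    1: a->1, b->2    2: a->1, b->0
# States 1 and 2 mostly agree with state 0, so they default to it.
d2fa = DelayedDFA(
    start=0,
    labeled={0: {"a": 1, "b": 0}, 1: {"b": 2}, 2: {}},
    default={1: 0, 2: 0},
    accepting={2},
)

print(d2fa.accepts("aab"))  # True
print(d2fa.accepts("aba"))  # False
```

Here three labeled transitions and two default transitions replace the six transitions of the full DFA, at the cost of extra default hops during matching.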

arXiv.org e-Print Archive

Maximal Sharing in the Lambda Calculus with letrec

Author: Asperti A.
Blom S.
Chitil O.
Danvy O.
de Medeiros Santos A. L.
Grabmayer C.
Hopcroft J.
Johnsson T.
Milner R.
Norton D. A.
Oostrom V. v.
Peyton Jones S. L.
Plump D.
Wadsworth C. P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Increasing sharing in programs is desirable to compactify the code, and to avoid duplication of reduction work at run-time, thereby speeding up execution. We show how a maximal degree of sharing can be obtained for programs expressed as terms in the lambda calculus with letrec. We introduce a notion of 'maximal compactness' for lambda-letrec-terms among all terms with the same infinite unfolding. Instead of being defined purely syntactically, this notion is based on a graph semantics. Lambda-letrec-terms are interpreted as first-order term graphs so that unfolding equivalence between terms is preserved and reflected through bisimilarity of the term graph interpretations. Compactness of the term graphs can then be compared via functional bisimulation. We describe practical and efficient methods for the following two problems: transforming a lambda-letrec-term into a maximally compact form; and deciding whether two lambda-letrec-terms are unfolding-equivalent. The transformation of a lambda-letrec-term $L$ into maximally compact form $L_0$ proceeds in three steps: (i) translate $L$ into its term graph $G = [[ L ]]$; (ii) compute the maximally shared form of $G$ as its bisimulation collapse $G_0$; (iii) read back a lambda-letrec-term $L_0$ from the term graph $G_0$ with the property $[[ L_0 ]] = G_0$. This guarantees that $L_0$ and $L$ have the same unfolding, and that $L_0$ exhibits maximal sharing. The procedure for deciding whether two given lambda-letrec-terms $L_1$ and $L_2$ are unfolding-equivalent computes their term graph interpretations $[[ L_1 ]]$ and $[[ L_2 ]]$, and checks whether these term graphs are bisimilar. For illustration, we also provide a readily usable implementation.

Comment: 18 pages, plus 19 pages appendix
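Step (ii) above, the bisimulation collapse of a term graph, can be sketched with a naive partition refinement: nodes start out grouped by label and are split repeatedly until nodes in the same block have successors in the same blocks, position by position. This is an illustrative sketch under assumptions, not the efficient method of the paper. The function name `bisimulation_collapse`, the dictionary encoding of the graph, and the example are hypothetical.

```python
def bisimulation_collapse(labels, succs):
    """Map each node to a block id; bisimilar nodes share a block.

    labels: node -> label
    succs:  node -> tuple of successor nodes (ordered, as in a term graph)
    """
    # Initial partition: nodes are grouped by their label alone.
    block = dict(labels)
    while True:
        ids = {}
        new_block = {}
        for v in labels:
            # Signature: own label plus the current blocks of all successors.
            sig = (labels[v], tuple(block[w] for w in succs[v]))
            new_block[v] = ids.setdefault(sig, len(ids))
        # Every round refines the previous partition, so an unchanged
        # block count means the partition is stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


# Two 'f' nodes pointing at each other unfold to the same infinite term,
# so the collapse merges them; the 'g' node stays separate.
labels = {"n0": "f", "n1": "f", "m": "g"}
succs = {"n0": ("n1",), "n1": ("n0",), "m": ("n0", "n1")}
blocks = bisimulation_collapse(labels, succs)
print(blocks["n0"] == blocks["n1"])  # True
print(blocks["m"] == blocks["n0"])   # False
```

Merging all nodes of a block into one node yields the collapsed graph. Two term graphs can be compared in the same way by running the refinement on their disjoint union and checking that the roots fall into the same block.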

arXiv.org e-Print Archive

Crossref

VU Research Portal

Faster Prefix-Sorting Algorithms for Deterministic Finite Automata

Author: Francisco Olivares
Nicola Prezza
Sung-Hwan Kim
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2023

Sorting is a fundamental algorithmic pre-processing technique which often allows data to be represented more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in $O(m^2 + n^{5/2})$ time, $m$ being the number of transitions and $n$ the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in $O(mn)$ and $O(n^2 \log n)$ time, respectively, and an algorithm sorting acyclic DFAs in $O(m \log n)$ time. Within these running times, all algorithms also compute a smallest chain partition of the partial order (required to index the DFA). We present experimental results showing that an optimized implementation of the $O(n^2 \log n)$-time algorithm exhibits nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Dagstuhl Research Online Publication Server

Asymptotics of Minimal Deterministic Finite Automata Recognizing a Finite Binary Language

Author: Elvey Price Andrew
Fang Wenjie
Wallner Michael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2020)
Publication date: 01/01/2020

We show that the number of minimal deterministic finite automata with $n+1$ states recognizing a finite binary language grows asymptotically for $n \to \infty$ like $\Theta(n!\, 8^n e^{3 a_1 n^{1/3}} n^{7/8})$, where $a_1 \approx -2.338$ is the largest root of the Airy function. For this purpose, we use a new asymptotic enumeration method proposed by the same authors in a recent preprint (2019). We first derive a new two-parameter recurrence relation for the number of such automata up to a given size. Using this result, we prove by induction tight bounds that are sufficiently accurate for large $n$ to determine the asymptotic form using adapted Newton polygons.

Dagstuhl Research Online Publication Server

Inferring Symbolic Automata

Author: Fisman Dana
Frenkel Hadar
Zilles Sandra
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th EACSL Annual Conference on Computer Science Logic (CSL 2022)
Publication date: 01/01/2022

We study the learnability of symbolic finite state automata, a model shown useful in many applications in software verification. The state-of-the-art literature on this topic follows the query learning paradigm, and so far all obtained results are positive. We provide a necessary condition for efficient learnability of SFAs in this paradigm, from which we obtain the first negative result. The main focus of our work lies in the learnability of SFAs under the paradigm of identification in the limit using polynomial time and data. We provide a necessary condition and a sufficient condition for efficient learnability of SFAs in this paradigm, from which we derive a positive and a negative result

Dagstuhl Research Online Publication Server

Enumerating Regular Languages with Bounded Delay

Author: Amarilli Antoine
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Abstract Learning Frameworks for Synthesis

Author: A Blum
A Cheung
A Lal
A Vardhan
Armando Solar-Lezama
C Flanagan
E Kitzelmann
E Kneuss
G Higman
M Barnett
MJ Kearns
P Garg
P Černý
PM Domingos
R Sharma
S Saha
V Kuncak
Z Manna
Publication date: 01/01/2016

We develop abstract learning frameworks (ALFs) for synthesis that embody the principles of CEGIS (counter-example based inductive synthesis) strategies that have become widely applicable in recent years. Our framework defines a general abstract framework of iterative learning, based on a hypothesis space that captures the synthesized objects, a sample space that forms the space on which induction is performed, and a concept space that abstractly defines the semantics of the learning process. We show that a variety of synthesis algorithms in current literature can be embedded in this general framework. While studying these embeddings, we also generalize some of the synthesis problems these instances are of, resulting in new ways of looking at synthesis problems using learning. We also investigate convergence issues for the general framework, and exhibit three recipes for convergence in finite time. The first two recipes generalize current techniques for convergence used by existing synthesis engines. The third technique is a more involved technique of which we know of no existing instantiation, and we instantiate it to concrete synthesis problems

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Hybrid Compositional Reasoning for Reactive Synthesis from Finite-Horizon Specifications

Author: Bansal Suguman
Li Yong
Tabajara Lucas M.
Vardi Moshe Y.
Publication date: 17/02/2020

LTLf synthesis is the automated construction of a reactive system from a high-level description, expressed in LTLf, of its finite-horizon behavior. So far, the conversion of LTLf formulas to deterministic finite-state automata (DFAs) has been identified as the primary bottleneck to the scalability of synthesis. Recent investigations have also shown that the size of the DFA state space plays a critical role in synthesis as well. Therefore, effective resolution of the bottleneck for synthesis requires the conversion to be time and memory performant, and to prevent state-space explosion. Current conversion approaches, however, which are based either on explicit-state representation or symbolic-state representation, fail to address these necessities adequately at scale: explicit-state approaches generate minimal DFAs but are slow due to expensive DFA minimization. Symbolic-state representations can be succinct, but due to the lack of DFA minimization they generate such large state spaces that even their symbolic representations cannot compensate for the blow-up. This work proposes a hybrid representation approach for the conversion. Our approach utilizes both explicit and symbolic representations of the state space, and effectively leverages their complementary strengths. In doing so, we offer an LTLf to DFA conversion technique that addresses all three necessities, hence resolving the bottleneck. A comprehensive empirical evaluation on conversion and synthesis benchmarks supports the merits of our hybrid approach.

Comment: Accepted by AAAI 2020. The tool Lisa for (a) LTLf to DFA conversion and (b) LTLf synthesis can be found here: https://github.com/vardigroup/lis

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications