1,444 research outputs found

Incremental construction of minimal acyclic finite-state automata

Author: Daciuk Jan
Mihov Stoyan
Watson Bruce
Watson Richard
Publication date: 01/01/2000

In this paper, we describe a new method for constructing minimal, deterministic, acyclic finite-state automata from a set of strings. Traditional methods consist of two phases: the first constructs a trie, and the second minimizes it. Our approach is to construct a minimal automaton in a single phase by adding new strings one by one and minimizing the resulting automaton on the fly. We present a general algorithm as well as a specialization that relies upon the lexicographical ordering of the input strings. Comment: 14 pages, 7 figures
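The single-phase idea in this abstract can be sketched for the sorted-input case as follows. This is a minimal illustration written from the abstract's description, not the authors' code: the names `State`, `build_minimal_dafsa`, `accepts`, and `count_states` are hypothetical, and state equivalence is approximated by a signature made of the accepting flag and the outgoing edges.

```python
class State:
    """One automaton state: outgoing edges plus an accepting flag."""

    def __init__(self):
        self.edges = {}      # label -> State
        self.final = False

    def signature(self):
        # Assumption: two states are interchangeable when they agree on
        # finality and on every (label, target state) pair.
        return (self.final,
                tuple(sorted((c, id(t)) for c, t in self.edges.items())))


def build_minimal_dafsa(words):
    """Add words in lexicographic order, minimizing after each insertion."""
    root = State()
    register = {}            # signature -> canonical state

    def replace_or_register(state):
        # With sorted input, the most recently added child has the largest label.
        label = max(state.edges)
        child = state.edges[label]
        if child.edges:
            replace_or_register(child)
        sig = child.signature()
        if sig in register:
            state.edges[label] = register[sig]   # reuse an equivalent state
        else:
            register[sig] = child                # first state of its kind

    for word in sorted(set(words)):
        # Follow the prefix shared with the words added so far.
        node, i = root, 0
        while i < len(word) and word[i] in node.edges:
            node = node.edges[word[i]]
            i += 1
        # The branch hanging below this prefix is complete: minimize it.
        if node.edges:
            replace_or_register(node)
        # Append the remaining suffix as a fresh chain of states.
        for ch in word[i:]:
            node.edges[ch] = State()
            node = node.edges[ch]
        node.final = True

    if root.edges:
        replace_or_register(root)
    return root


def accepts(root, word):
    node = root
    for ch in word:
        if ch not in node.edges:
            return False
        node = node.edges[ch]
    return node.final


def count_states(root):
    seen, stack = {id(root)}, [root]
    while stack:
        for target in stack.pop().edges.values():
            if id(target) not in seen:
                seen.add(id(target))
                stack.append(target)
    return len(seen)
```

For the words tap, taps, top, and tops, this sketch ends with 5 states, where a plain trie would need 8, because the shared suffixes are merged as soon as each branch is finished.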

arXiv.org e-Print Archive

CiteSeerX

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Finite Automata for the Sub- and Superword Closure of CFLs: Descriptional and Computational Complexity

Author: A Okhotin
B Courcelle
C Brabrand
H Gruber
J Esparza
J Leeuwen van
M Mohri
MF Atig
N Rampersad
N Vasudevan
P Ganty
P Habermehl
R Axelsson
S Schmitz
Y Bar-Hillel
Z Long
Publication date: 23/10/2014

We answer two open questions by (Gruber, Holzer, Kutrib, 2009) on the state complexity of representing sub- or superword closures of context-free grammars (CFGs): (1) We prove a (tight) upper bound of $2^{\mathcal{O}(n)}$ on the size of nondeterministic finite automata (NFAs) representing the subword closure of a CFG of size $n$. (2) We present a family of CFGs for which the minimal deterministic finite automata representing their subword closure match the upper bound of $2^{2^{\mathcal{O}(n)}}$ following from (1). Furthermore, we prove that the inequivalence problem for NFAs representing sub- or superword-closed languages is only NP-complete, as opposed to PSPACE-complete for general NFAs. Finally, we extend our results into an approximation method to attack inequivalence problems for CFGs.
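To make the notion of a subword closure concrete, here is a small sketch for the much simpler regular case: given an NFA, every transition gets a parallel epsilon transition, so any letter may be skipped. This is a textbook construction chosen for illustration, not the paper's construction for CFGs. The encoding of the transition function as a dictionary and the names `EPS`, `subword_closure`, `eps_closure`, and `accepts` are assumptions of this example.

```python
EPS = None  # label used for epsilon transitions (assumption of this sketch)


def subword_closure(delta):
    """Return a copy of delta in which every transition can also be skipped."""
    closed = {key: set(targets) for key, targets in delta.items()}
    for (state, _symbol), targets in delta.items():
        closed.setdefault((state, EPS), set()).update(targets)
    return closed


def eps_closure(delta, states):
    """All states reachable from `states` using epsilon transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, EPS), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(delta, start, finals, word):
    """Standard subset simulation of an NFA with epsilon transitions."""
    current = eps_closure(delta, {start})
    for ch in word:
        step = set()
        for state in current:
            step |= delta.get((state, ch), set())
        current = eps_closure(delta, step)
    return bool(current & finals)


# NFA accepting exactly the word "abc".
delta = {(0, "a"): {1}, (1, "b"): {2}, (2, "c"): {3}}
closed = subword_closure(delta)
```

In this regular setting the closure keeps the same set of states, which contrasts with the exponential bounds the abstract reports when the language is given by a CFG.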

arXiv.org e-Print Archive

CiteSeerX

Crossref

DAFSA: a Python library for Deterministic Acyclic Finite State Automata [Software]

Author: Tresoldi T.
Publication venue: 'The Open Journal'
Publication date: 19/02/2020

This work describes dafsa, a Python library for computing graphs from lists of strings for identifying, visualizing, and inspecting patterns of substrings. The library is designed for usage by linguists in studies on morphology and formal grammars, and is intended for faster, easier, and simpler generation of visualizations. It collects frequency weights by default, it can condense structures, and it provides several export options. Figure 1 depicts a basic DAFSA, based upon five English words and generated with default settings

MPG.PuRe

Joining Extractions of Regular Expressions

Author: Freydenberger Dominik D.
Kimelfeld Benny
Peterfreund Liat
Publication date: 30/03/2017

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness already arises for single-character text. Yet, the upper bounds from the relational world do not carry over. Unlike in the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ.
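As a rough illustration of "relations of spans", the sketch below uses named groups in Python's `re` module as stand-ins for capture variables and returns one row of spans per match. Two caveats apply: `re.finditer` yields only non-overlapping leftmost matches, whereas a regex formula in the document-spanner framework relates all possible matches, and Python spans are 0-based and half-open. The function name `extract_spans` and the sample pattern are assumptions of this example.

```python
import re


def extract_spans(pattern, text):
    """Return one {variable: (start, end)} row per match of `pattern` in `text`."""
    regex = re.compile(pattern)
    rows = []
    for match in regex.finditer(text):
        # groupindex maps each named group (our "capture variable") to its number.
        rows.append({name: match.span(name) for name in regex.groupindex})
    return rows


# Two capture variables: a run of letters followed by a run of digits.
rows = extract_spans(r"(?P<word>[a-z]+)(?P<num>\d+)", "ab12 cd345")
```

Each row plays the role of one tuple in the extracted relation; joins and projections over such rows are where the complexity questions in the abstract arise.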

arXiv.org e-Print Archive

Crossref

Loughborough University Institutional Repository

A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

Author: Ahmed Abdelali
Ahmed Lehireche
Denis Maurel
Noureddine Doumi
Publication venue: IJIT
Publication date: 01/02/2016

This work presents a method that enables the Arabic NLP community to build scalable lexical resources. The proposed method is low-cost and time-efficient, as well as scalable and extensible. The latter is reflected in the method's ability to be incremental in both processing resources and generating lexicons. The method works on a corpus: first, tokens are drawn from the corpus and lemmatized; second, finite-state transducers (FSTs) are generated semi-automatically; finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One strength of the algorithm is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. A second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiments use a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The coverage of the resulting open lexical resources is high: they cover more than 70% of Arabic verbs. The built resources contain 16,855 verb lemmas and 11,080,355 fully, partially, and non-vocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license.

Crossref

Directory of Open Access Journals

HAL Descartes

HAL Université de Tours

Hal-Diderot

Efficient Implementation for Deterministic Finite Tree Automata Minimization

Author: Hadda Cherroun
Younes Guellouma
Publication venue: 'Faculty of Electrical Engineering and Computing, Univ. of Zagreb'

Crossref

CAIR: Using Formal Languages to Study Routing, Leaking, and Interception in BGP

Author: Biersack Ernst W.
Carle Georg
Schlamp Johann
Schmidt Thomas C.
Wählisch Matthias
Publication date: 01/01/2016

The Internet routing protocol BGP expresses topological reachability and policy-based decisions simultaneously in path vectors. A complete view on the Internet backbone routing is given by the collection of all valid routes, which is infeasible to obtain due to information hiding of BGP, the lack of omnipresent collection points, and data complexity. Commonly, graph-based data models are used to represent the Internet topology from a given set of BGP routing tables but fall short of explaining policy contexts. As a consequence, routing anomalies such as route leaks and interception attacks cannot be explained with graphs. In this paper, we use formal languages to represent the global routing system in a rigorous model. Our CAIR framework translates BGP announcements into a finite route language that allows for the incremental construction of minimal route automata. CAIR preserves route diversity, is highly efficient, and well-suited to monitor BGP path changes in real-time. We formally derive implementable search patterns for route leaks and interception attacks. In contrast to the state-of-the-art, we can detect these incidents. In practical experiments, we analyze public BGP data over the last seven years

arXiv.org e-Print Archive

REPOSIT