10 research outputs found
Weak factor automata: the failure of failure factor oracles?
In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a
sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic
finite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace
multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly
construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The
algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number
of transitions for many DNA sequences of lengths 4 to 512, showing gains of up to 10% in total number of transitions, with
failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as
well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching
algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically
compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural
language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using
an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.
http://www.journals.co.za/ej/ejour_comp.html
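As an illustration of the factor oracle that the abstract above builds on, here is a minimal sketch of the classical online FO construction. It is not the paper's failure factor oracle (FFO) algorithm, which the abstract does not spell out. The names `build_factor_oracle`, `oracle_accepts` and `supply` are hypothetical, chosen for this sketch only.

```python
def build_factor_oracle(word):
    """Classical online factor oracle construction (a sketch, not the FFO).

    Returns (trans, supply): trans[i] maps a symbol to a target state, and
    supply[i] is the supply link of state i (-1 for the start state).
    """
    n = len(word)
    trans = [dict() for _ in range(n + 1)]
    supply = [-1] * (n + 1)
    for i, symbol in enumerate(word):
        # Internal transition along the spine of the oracle.
        trans[i][symbol] = i + 1
        # Walk the supply links, adding external transitions where missing.
        k = supply[i]
        while k > -1 and symbol not in trans[k]:
            trans[k][symbol] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][symbol]
    return trans, supply


def oracle_accepts(trans, candidate):
    """Every state is accepting, so a string is accepted if it can be read."""
    state = 0
    for symbol in candidate:
        if symbol not in trans[state]:
            return False
        state = trans[state][symbol]
    return True
```

For a word of length m the oracle has m + 1 states and between m and 2m - 1 transitions. It accepts every factor of the word, but it may also accept some strings that are not factors: for "abbbaab" it accepts "abba".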
Failure deterministic finite automata
Inspired by failure functions found in classical pattern matching algorithms, a failure deterministic finite automaton (FDFA) is defined as a formalism to recognise a regular language. An algorithm, based on formal concept analysis, is proposed for deriving from a given deterministic finite automaton (DFA) a language-equivalent FDFA. The FDFA’s transition diagram has fewer arcs than that of the DFA. A small modification to the classical DFA algorithm for recognising language elements yields a corresponding algorithm for an FDFA.
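The modified recognition algorithm mentioned in the abstract above might look roughly like the following sketch. It assumes that a failure transition is taken without consuming input whenever the current state has no symbol transition on the next symbol. The function name `fdfa_accepts` and the dictionary encoding are hypothetical, and the example automaton (strings over {a, b} that end in "ab") is made up for illustration.

```python
def fdfa_accepts(delta, failure, start, finals, word):
    """Recognise `word` with an FDFA (a sketch under the stated assumptions).

    delta[q] maps symbols to target states; failure[q] is the target of the
    failure arc of state q, if it has one.
    """
    state = start
    for symbol in word:
        visited = set()
        # Follow failure arcs, without consuming input, until some state
        # has a symbol transition on the current symbol.
        while symbol not in delta.get(state, {}):
            if state not in failure or state in visited:
                return False  # stuck, or caught in a cycle of failure arcs
            visited.add(state)
            state = failure[state]
        state = delta[state][symbol]
    return state in finals


# Example: the complete DFA for "ends in ab" has 3 states and 6 arcs.
# The FDFA below keeps 3 symbol arcs and adds 2 failure arcs (5 in total).
DELTA = {0: {"a": 1, "b": 0}, 1: {"b": 2}, 2: {}}
FAILURE = {1: 0, 2: 0}
```

With these tables, `fdfa_accepts(DELTA, FAILURE, 0, {2}, "aab")` follows the failure arc from state 1 to state 0 on the second "a" and then proceeds as the DFA would.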
Failure Deterministic Finite Automata
Lettere en Wysbegeerte; Sentrum vir Kennisdinamika & Besluitneming
An assessment of algorithms for deriving failure deterministic finite automata
CITATION: Nxumalo, M., et al. 2017. An assessment of algorithms for deriving failure deterministic finite automata. South African Computer Journal, 29(1):43-68, doi:10.18489/sacj.v29i1.456.
The original publication is available at http://sacj.cs.uct.ac.za
Failure deterministic finite automata (FDFAs) represent regular languages more compactly than deterministic finite automata (DFAs). Four algorithms that convert arbitrary DFAs to language-equivalent FDFAs are empirically investigated. Three are concrete variants of a previously published abstract algorithm, the DFA-Homomorphic Algorithm (DHA). The fourth builds a maximal spanning tree from the DFA to derive what it calls a delayed input DFA. A first suite of test data consists of DFAs that recognise randomised sets of finite length keywords. Since the classical Aho-Corasick algorithm builds an optimal FDFA from such a set (and only from such a set), it provides benchmark FDFAs against which the performance of the general algorithms can be compared. A second suite of test data consists of random DFAs generated by a specially designed algorithm that also builds language-equivalent FDFAs, some of which may have non-divergent cycles. These random FDFAs provide (not necessarily tight) lower bounds for assessing the effectiveness of the four general FDFA generating algorithms.
http://sacj.cs.uct.ac.za/index.php/sacj/article/view/456
Publisher's version
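For readers unfamiliar with the benchmark, the following is a minimal sketch of the textbook Aho-Corasick construction: a keyword trie whose edges are the symbol transitions, plus one failure link per non-root state. It is not the experimental code of the paper, and the names `build_aho_corasick` and `find_matches` are hypothetical.

```python
from collections import deque


def build_aho_corasick(keywords):
    """Build goto (trie edges), fail (failure links) and out (keywords per state)."""
    goto, fail, out = [{}], [0], [set()]
    for word in keywords:
        state = 0
        for symbol in word:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        out[state].add(word)
    # Breadth-first pass: the failure link of a state points to the state of
    # its longest proper suffix that is also a prefix of some keyword.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for symbol, target in goto[state].items():
            queue.append(target)
            f = fail[state]
            while f and symbol not in goto[f]:
                f = fail[f]
            fail[target] = goto[f].get(symbol, 0)
            out[target] |= out[fail[target]]
    return goto, fail, out


def find_matches(text, goto, fail, out):
    """Return (start_index, keyword) pairs for all keyword occurrences in text."""
    state, hits = 0, []
    for i, symbol in enumerate(text):
        while state and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        hits.extend((i - len(word) + 1, word) for word in out[state])
    return hits
```

In this encoding the number of symbol transitions is `sum(len(edges) for edges in goto)`, which is the kind of arc count on which the general DFA-to-FDFA algorithms can be compared.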
An assessment of selected algorithms for generating failure deterministic finite automata
A Failure Deterministic Finite Automaton (FDFA) offers a deterministic
and compact representation of an automaton that is used by various algorithms
to solve pattern matching problems efficiently. An abstract, concept
lattice based algorithm called the DFA-Homomorphic Algorithm (DHA) was
proposed to convert a deterministic finite automaton (DFA) into an FDFA.
The abstract DHA has several nondeterministic choices. The DHA is tuned
into four decisive and specialized variants that may remove an optimal
number of symbol transitions from the DFA while adding
failure transitions. The resulting specialized FDFAs are: MaxIntent FDFA,
MinExtent FDFA, MaxIntent-MaxExtent FDFA and MaxArcRedundancy
FDFA. Furthermore, two output-based investigations are conducted in which
two specific types of DFA-to-FDFA algorithms are compared with the DHA variants.
Firstly, the well-known Aho-Corasick algorithm is considered, and its DFA is
converted into DHA FDFA variants. Empirical and comparative results show
that when heuristics for DHA variants are suitably chosen, the minimality
attained by the Aho-Corasick algorithm in its output FDFAs can be closely
approximated by DHA FDFAs. Secondly, DHA FDFAs are tested in the general
case, in which random DFAs and language-equivalent FDFAs are carefully
constructed. The random DFAs are converted into DHA FDFA types and
the random FDFAs are compared with DHA FDFAs. A published non-concept-lattice-based
algorithm producing an FDFA called D2FA is also shown
to perform well in all experiments. In the general context, DHA performed
well, though not as well as the D2FA algorithm. As a by-product of the general
case FDFA tests, an algorithm for generating random FDFAs and
language-equivalent DFAs was proposed.
Dissertation (MSc)--University of Pretoria, 2016. Computer Science. MSc. Unrestricted
An Aho-Corasick based assessment of algorithms generating failure deterministic finite automata
The Aho-Corasick algorithm derives a failure deterministic finite automaton for finding matches of a finite set of keywords in a text. It has the minimum number of transitions needed for this task. The DFA-Homomorphic Algorithm (DHA) is more general, deriving from an arbitrary complete deterministic finite automaton a language-equivalent failure deterministic finite automaton. DHA takes formal concepts of a lattice as input. This lattice is built from a state/out-transition formal context that is derived from the complete deterministic finite automaton. In this paper, three general variants of the abstract DHA are benchmarked against the specialised Aho-Corasick algorithm. It is shown that when heuristics for these variants are suitably chosen, the minimality attained by the Aho-Corasick algorithm can be closely approximated. A published non-lattice-based algorithm is also shown to perform well in experiments.
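The formal context described in the abstract above can be pictured with a small sketch. It assumes that the objects of the context are the DFA states and that the attributes are (symbol, target state) pairs, so that a state has an attribute exactly when it has that outgoing arc. This encoding, the function names and the three-state example DFA are assumptions made for illustration. The derivation operators are the standard ones from formal concept analysis, not the DHA itself.

```python
def out_transition_context(delta):
    """Map each state to its set of (symbol, target) out-transitions."""
    return {state: frozenset(arcs.items()) for state, arcs in delta.items()}


def common_intent(context, states):
    """Out-transitions shared by every state in a non-empty set of states."""
    return frozenset.intersection(*(context[q] for q in states))


def extent_of(context, attributes):
    """All states that have every out-transition in `attributes`."""
    return frozenset(q for q, arcs in context.items() if attributes <= arcs)


def concept_from_states(context, states):
    """Close a set of states into a formal concept (extent, intent)."""
    intent = common_intent(context, states)
    return extent_of(context, intent), intent


# Hypothetical complete DFA over {a, b} accepting strings that end in "ab".
EXAMPLE_DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
```

In this example, closing the state set {0, 2} gives a concept whose intent holds both shared arcs, while closing {0, 1} grows the extent to all three states with the single shared arc ("a", 1). Arcs shared by several states in this way are natural candidates for replacement by failure transitions.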