7 research outputs found

    Linear-time Minimization of Wheeler DFAs

    Get PDF
    Wheeler DFAs (WDFAs) are a sub-class of finite-state automata which is playing an important role in the emerging field of compressed data structures: as opposed to general automata, WDFAs can be stored in just log s + O(1) bits per edge, s being the alphabet's size, and support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step to achieve further compression is minimization. When the input A is a general deterministic finite-state automaton (DFA), the state-of-the-art is represented by the classic Hopcroft's algorithm, which runs in O(vertical bar A vertical bar log vertical bar A vertical bar) time. This algorithm stands at the core of the only existing minimization algorithm for Wheeler DFAs, which inherits its complexity. In this work, we show that the minimum WDFA equivalent to a given input WDFA can be computed in linear O(vertical bar A vertical bar) time. When run on de Bruijn WDFAs built from real DNA datasets, an implementation of our algorithm reduces the number of nodes from 14% to 51% at a speed of more than 1 million nodes per second.Peer reviewe

    Minimization of Reactive Probabilistic Automata

    Get PDF
    The problem of finite automata minimization is important for software and hardware designing. Different types of automata are used for modeling systems or machines with finite number of states. The limitation of number of states gives savings in resources and time. In this article we show specific type of probabilistic automata: the reactive probabilistic finite automata with accepting states (in brief the reactive probabilistic automata), and definitions of languages accepted by it. We present definition of bisimulation relation for automata's states and define relation of indistinguishableness of automata states, on base of which we could effectuate automata minimization. Next we present detailed algorithm reactive probabilistic automata’s minimization with determination of its complexity and analyse example solved with help of this algorithm

    Analysing the Efficiency of Algorithms for Compiling Finite-State Morphologies

    Get PDF
    Äärellistilaiset morfologiat ovat tietokoneohjelmia, jotka mallintavat kielen sanojen rakennetta (morfologiaa) merkkijonopareja sisältävillä tietorakenteilla (äärellistilaisilla transduktoreilla). Äärellistilaisia morfologioita voidaan käyttää esimerkiksi hakuohjelmissa, jotka löytävät tekstistä kaikki annetun perusmuotoisen sanan esiintymät eri taivutusmuodoissaan. Äärellistilaiset morfologiat ovat myös hyödyllisiä, kun tekstistä tehdään tilastoja siitä kuinka usein kukin sana esiintyy ja missä taivutusmuodoissa. Äärellistilaisten morfologioiden rakentaminen on monimutkainen prosessi, johon kuuluu useita tehtäviä, joista yksi on transduktorin minimointi. Yleisiä minimointialgoritmeja ovat Brzozowskin (BRZ) ja Hopcroftin algoritmit (HOP). Kirjallisuudessa esiintyy väitteitä, joiden mukaan BRZ:n ja HOP:n välinen ero on merkityksettömän pieni morfologioita käännettäessä. Kuitenkaan BRZ:n suorituskykyä ei ole järjestelmällisesti testattu tai verrattu HOP:iin missään tutkimuksessa. Tässä diplomityössä käännettiin HFST-ohjelmistolla kaksi avoimen lähdekoodin morfologiaa, suomelle kirjoitettu OMorFi ja saksalle kirjoitettu Morphisto. HFST perustuu kahteen avoimen lähdekoodin transduktoriohjelmistopakettiin, SFST:hen ja OpenFst:hen, joista edellinen käyttää BRZ:ia ja jälkimmäinen HOP:ia minimointialgoritmina. BRZ osoittautui paljon hitaammaksi kuin HOP sekä suomen että saksan morfologioilla. BRZ:n hitaus oli ilmeistä transduktoreissa, jotka sisälsivät suuren mittakaavan syklisyyttä eli niissä oli siirtymiä, jotka johtivat lopputilojen läheisyydestä alkutilan läheisyyteen. Tällaisia transduktoreita esiintyy usein morfologioissa, joissa on yhdyssanamekanismi. Jos HOP:n ja BRZ:n välillä on valittava, edellinen on parempi vaihtoehto minimointi-algoritmiksi. BRZ on joskus nopeampi kuin HOP, mutta siinä tapauksessa algoritmien ero on melko pieni. Niissä tapauksissa joissa BRZ on hitaampi kuin HOP, ero on huomattavasti suurempi: BRZ on joskus jopa 50 kertaa hitaampi kuin HOP. BRZ on kuitenkin paljon helpompi toteuttaa, koska se perustuu kahteen perusoperaatioon, determinisointiin ja reversioon. Jos HOP:n toteuttaminen on liian vaativa tehtävä, avoimen lähdekoodin transduktorikirjaston kehittäjät voivat käyttää OpenFst:n minimointialgoritmia. Transduktorit voidaan muuntaa OpenFst:n muotoon, minimoida OpenFst:llä ja muuntaa takaisin alkuperäiseen muotoon. Tätä ratkaisua on tarkoitus käyttää myös HFST:n tulevissa versioissa.Finite-state morphologies (FSMs) are computer programs that model the structure of words in a language (morphology) with networks containing a number of string pairs (finite-state transducers). FSMs can be used e.g. to implement search programs that can find all forms of a word in a document if they are given only the base form. FSMs are also useful in compiling statistics on a text, i.e. finding out how often a word occurs and in which forms. Constructing FSMs is a complex process involving many tasks, one of which is transducer minimisation. Common minimisation algorithms include Brzozowski's (BRZ) and Hopcroft's algorithm (HOP). There have been claims in the literature that often the difference between BRZ and HOP is insignificant when compiling FSMs. However, no studies have been carried out where the performance of BRZ would have been systematically tested or compared with HOP. In this thesis, we compiled two open-source morphologies, OMorFi for Finnish and Morphisto for German, with the HFST software. HFST is based on two open-source transducer software packages, SFST and OpenFst, the former using BRZ and the latter HOP as a minimisation algorithm. BRZ turned out to be much slower than HOP both on Finnish and German morphologies. The slowness of BRZ was evident in transducers that contained large-scale cyclicity, i.e. had transitions leading from the nearness of the final states to the nearness of initial states. These kinds of transducers often occur in morphologies that have a compounding mechanism. If a choice must be made between HOP and BRZ, the previous is a better choice for a minimisation algorithm. BRZ is sometimes faster than HOP, but in that case their difference is quite small. In the cases where BRZ is slower than HOP, their difference is much bigger, BRZ sometimes being 50 times slower than HOP. Of course, BRZ is much easier to implement since it uses two basic operations, determinisation and reversion. If the implementation of HOP is considered too demanding a task, the developers of free-source transducer libraries can use OpenFst's minimisation algorithm. The transducers can be converted to OpenFst format, minimised with OpenFst and converted back to the original format. This solution will also be used in future versions of HFST

    Discrete Event Systems: Models and Applications; Proceedings of an IIASA Conference, Sopron, Hungary, August 3-7, 1987

    Get PDF
    Work in discrete event systems has just begun. There is a great deal of activity now, and much enthusiasm. There is considerable diversity reflecting differences in the intellectual formation of workers in the field and in the applications that guide their effort. This diversity is manifested in a proliferation of DEM formalisms. Some of the formalisms are essentially different. Some of the "new" formalisms are reinventions of existing formalisms presented in new terms. These "duplications" reveal both the new domains of intended application as well as the difficulty in keeping up with work that is published in journals on computer science, communications, signal processing, automatic control, and mathematical systems theory - to name the main disciplines with active research programs in discrete event systems. The first eight papers deal with models at the logical level, the next four are at the temporal level and the last six are at the stochastic level. Of these eighteen papers, three focus on manufacturing, four on communication networks, one on digital signal processing, the remaining ten papers address methodological issues ranging from simulation to computational complexity of some synthesis problems. The authors have made good efforts to make their contributions self-contained and to provide a representative bibliography. The volume should therefore be both accessible and useful to those who are just getting interested in discrete event systems

    On the complexity of Hopcroft's state minimization algorithm

    No full text
    International audienc

    Standard Sturmian words and automata minimization algorithms

    No full text
    The study of some close connections between the combinatorial properties of words and the performance of the automata minimization process constitutes the main focus of this paper. These relationships have been, in fact, the basis of the study of the tightness and the extremal cases of Hopcroft's algorithm, that is, up to now, the most efficient minimization method for deterministic finite state automata. Recently, increasing attention has been paid to another minimization method that, unlike the approach proposed by Hopcroft, is not based on refinement of the set of states of the automaton, but on automata operations such as determinization and reverse, and is also applicable to non-deterministic finite automata. However, even for deterministic automata, it was proved that the method incurs, almost surely, in an explosion of the number of the states. Very recently, some polynomial variants of Brzozowski's method have been introduced. In this paper, by using some combinatorial properties of words, we analyze the performance of one of such algorithms when applied to a particular infinite family of automata, called standard word automata, constructed by using standard sturmian words. In particular, Θ( nlog n) is the worst case time complexity when the algorithm is applied to the infinite family of word automata associated to Fibonacci words representing also the worst case of Hopcroft's minimization algorithm
    corecore