9 research outputs found

    Analysing the Efficiency of Algorithms for Compiling Finite-State Morphologies

    Finite-state morphologies (FSMs) are computer programs that model the structure of words in a language (morphology) with networks containing a number of string pairs (finite-state transducers). FSMs can be used, for example, to implement search programs that find all inflected forms of a word in a document given only its base form. They are also useful for compiling statistics on a text, i.e. finding out how often each word occurs and in which forms. Constructing FSMs is a complex process involving many tasks, one of which is transducer minimisation. Common minimisation algorithms include Brzozowski's (BRZ) and Hopcroft's (HOP). There have been claims in the literature that the difference between BRZ and HOP is often insignificant when compiling FSMs. However, no study has systematically tested the performance of BRZ or compared it with HOP. In this thesis, we compiled two open-source morphologies, OMorFi for Finnish and Morphisto for German, with the HFST software. HFST is based on two open-source transducer software packages, SFST and OpenFst, the former using BRZ and the latter HOP as its minimisation algorithm. BRZ turned out to be much slower than HOP on both the Finnish and the German morphologies. The slowness of BRZ was evident in transducers that contained large-scale cyclicity, i.e. had transitions leading from the vicinity of the final states to the vicinity of the initial states. Such transducers often occur in morphologies that have a compounding mechanism. If a choice must be made between HOP and BRZ, the former is the better choice of minimisation algorithm. BRZ is sometimes faster than HOP, but in those cases the difference is quite small. In the cases where BRZ is slower than HOP, the difference is much bigger, with BRZ sometimes being 50 times slower. BRZ is, however, much easier to implement, since it uses two basic operations, determinisation and reversal. If implementing HOP is considered too demanding a task, the developers of open-source transducer libraries can use OpenFst's minimisation algorithm: the transducers can be converted to OpenFst format, minimised with OpenFst and converted back to the original format. This solution will also be used in future versions of HFST.
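
The abstract's remark that BRZ rests on only two basic operations can be illustrated with a minimal Python sketch: reverse the automaton, determinise it, then do both again. The sketch works on plain automata rather than transducers, and the tuple layout `(states, alphabet, delta, initials, finals)` and all function names are assumptions made for illustration; none of this is taken from SFST, OpenFst or HFST.

```python
from itertools import chain


def reverse(nfa):
    """Flip every transition and swap the initial and final state sets."""
    states, alphabet, delta, initials, finals = nfa
    rev = {}
    for (src, sym), targets in delta.items():
        for dst in targets:
            rev.setdefault((dst, sym), set()).add(src)
    return states, alphabet, rev, set(finals), set(initials)


def determinise(nfa):
    """Subset construction, keeping only subsets reachable from the start."""
    _, alphabet, delta, initials, finals = nfa
    start = frozenset(initials)
    seen = {start}
    todo = [start]
    dfa_delta = {}
    while todo:
        subset = todo.pop()
        for sym in alphabet:
            target = frozenset(
                chain.from_iterable(delta.get((q, sym), ()) for q in subset)
            )
            if not target:
                continue  # the dead state is left implicit (partial DFA)
            dfa_delta[(subset, sym)] = {target}
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & set(finals)}
    return seen, alphabet, dfa_delta, {start}, dfa_finals


def brzozowski(nfa):
    """Minimise by reversing and determinising twice."""
    return determinise(reverse(determinise(reverse(nfa))))


def accepts(dfa, word):
    """Run a deterministic automaton produced by determinise() on a word."""
    _, _, delta, initials, finals = dfa
    (state,) = initials
    for sym in word:
        nxt = delta.get((state, sym))
        if nxt is None:
            return False
        (state,) = nxt
    return state in finals


# A redundant 4-state DFA for "even number of a's"; two states suffice.
even_a = (
    {0, 1, 2, 3},
    {"a"},
    {(0, "a"): {1}, (1, "a"): {2}, (2, "a"): {3}, (3, "a"): {0}},
    {0},
    {0, 2},
)
minimal = brzozowski(even_a)
print(len(minimal[0]))  # 2
```

The brevity is the point: each step is a textbook operation. The cost is that the intermediate subset construction can produce far more states than the final result, which is consistent with the slowdowns the thesis reports.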

    Efficient deterministic finite automata split-minimization derived from Brzozowski's algorithm

    Minimization of deterministic finite automata is a classic problem in computer science that is still studied today. In this paper, we relate the different split-minimization methods proposed to date, or yet to be proposed, to the algorithm due to Brzozowski, which has usually been set aside in classifications of DFA minimization algorithms. We first propose a polynomial minimization method derived from a paper by Champarnaud et al. We also show how some efficiency improvements to this algorithm lead to an algorithm similar to Hopcroft's classic algorithm. The results obtained lead us to propose a characterization of the set of possible splitters.
    García Gómez, P.; López Rodríguez, D.; Vázquez-De-Parga Andrade, M. (2014). Efficient deterministic finite automata split-minimization derived from Brzozowski's algorithm. International Journal of Foundations of Computer Science. 25(6):679-696. doi:10.1142/S0129054114500282
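
To make the notion of split-minimization concrete, here is a naive Python sketch of partition refinement in the style of Moore's algorithm: blocks of states are split repeatedly until all states in a block agree on which block each symbol leads to. It is not the method proposed in the paper, nor Hopcroft's O(n log n) algorithm, which chooses its splitters more carefully. The function name and the dictionary-based DFA encoding are assumptions for illustration, and the DFA is assumed to be complete.

```python
def split_minimise(states, alphabet, delta, finals):
    """Return the coarsest partition of `states` into equivalent classes.

    Assumes a complete DFA: delta[(q, a)] is a single state for every
    state q and symbol a.
    """
    # Start from the final / non-final split, dropping an empty side.
    partition = [
        block for block in (set(finals), set(states) - set(finals)) if block
    ]
    symbols = sorted(alphabet)
    changed = True
    while changed:
        changed = False
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            # Group states by the tuple of blocks their transitions reach.
            groups = {}
            for q in block:
                signature = tuple(block_of[delta[(q, a)]] for a in symbols)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition


# A 4-state cycle accepting words with an even number of a's.
cycle = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0}
classes = split_minimise({0, 1, 2, 3}, {"a"}, cycle, {0, 2})
print(sorted(sorted(block) for block in classes))  # [[0, 2], [1, 3]]
```

Each resulting block becomes one state of the minimal DFA. The naive loop re-examines every block on every pass, which is where more refined splitter selection saves work.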

    Automates codéterministes et automates acycliques : analyse d'algorithmes et génération aléatoire

    The general context of this thesis is the quantitative analysis of objects arising from rational language theory. We adapt techniques from the analysis of algorithms (average-case complexity, generic complexity, random generation, ...) to objects and algorithms that involve particular classes of automata. In the first part, we study the complexity of Brzozowski's minimisation algorithm. Although the worst-case complexity of this algorithm is poor, it has a reputation for being efficient in practice. Using typical properties of random mappings and random permutations, we show that the generic complexity of Brzozowski's algorithm applied to a deterministic automaton grows faster than any polynomial in n, where n is the number of states of the automaton. In the second part, we study the random generation of acyclic automata. These automata recognize the finite sets of words, and for this reason they are widely used in applications, especially in natural language processing. We present two random generators, one using a Markov chain model, the other a "recursive method" based on a combinatorial decomposition of structures. The first method is flexible but very difficult to calibrate; the second is more efficient. Once implemented, the second method allowed us to observe typical properties of large random acyclic automata.

    Temporal constraint reasoning in microprocessor systems diagnosis.

    by Yuen Siu Ming.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1995.
    Includes bibliographical references (leaves 104-110).
    Chapter 1 --- Introduction
    Chapter 2 --- Background
    Chapter 2.1 --- Approaches in Formal Hardware Verification
    Chapter 2.1.1 --- Theorem Proving
    Chapter 2.1.2 --- Symbolic Simulation
    Chapter 2.1.3 --- Model Checking
    Chapter 2.2 --- Temporal Theories
    Chapter 2.3 --- Related Works
    Chapter 2.3.1 --- Consistency and Satisfiability of Timing Specifications
    Chapter 2.3.2 --- Symbolic Constraint Satisfaction
    Chapter 3 --- Problem Domain
    Chapter 3.1 --- Basics of MC68000 Read Cycle
    Chapter 4 --- Knowledge-based System Structure
    Chapter 4.1 --- Diagnostic Reasoning Mechanisms
    Chapter 4.2 --- Occurring Event Sequence
    Chapter 4.3 --- Equivalent Goals
    Chapter 4.4 --- CPU Databus Setup Time
    Chapter 4.5 --- Assertion of CPU AS Signal
    Chapter 5 --- Time Range Approach
    Chapter 5.1 --- Time Range Representation
    Chapter 5.2 --- Time Ranges Reasoning Techniques
    Chapter 5.2.1 --- Constraint Satisfaction of Time Ranges
    Chapter 5.2.2 --- Constraint Propagation of Time Ranges
    Chapter 5.3 --- Worst-Case Timing Analysis
    Chapter 5.4 --- System Implementation
    Chapter 5.4.1 --- CPU Databus Setup Time
    Chapter 5.4.2 --- Assertion of CPU AS Signal
    Chapter 5.5 --- Implementation Results
    Chapter 5.5.1 --- CPU Databus Setup Time
    Chapter 5.5.2 --- Assertion of CPU AS Signal
    Chapter 5.6 --- Conclusion
    Chapter 6 --- Fuzzy Time Point Approach
    Chapter 6.1 --- Fuzzy Time Point Models
    Chapter 6.1.1 --- Concept of Fuzzy Numbers
    Chapter 6.1.2 --- Definition of Fuzzy Time Points
    Chapter 6.1.3 --- Semi-bounded Fuzzy Time Points
    Chapter 6.2 --- Fuzzy Time Point Reasoning Techniques
    Chapter 6.2.1 --- Constraint Propagation of Fuzzy Time Points
    Chapter 6.2.2 --- Constraint Satisfaction of Fuzzy Time Points
    Chapter 6.3 --- System Implementation
    Chapter 6.3.1 --- Representation of Fuzzy Time Point
    Chapter 6.3.2 --- Fuzzy Time Point Satisfaction
    Chapter 6.3.3 --- Fuzzy Time Point Propagation
    Chapter 6.4 --- Implementation Results
    Chapter 6.4.1 --- CPU Databus Setup Time
    Chapter 6.4.2 --- Assertion of CPU AS Signal
    Chapter 6.5 --- Fuzzy Time Point Model Parameters
    Chapter 6.5.1 --- Variation of Semi-bounded ftps' Membership Function
    Chapter 6.5.2 --- Variation of μftp
    Chapter 6.5.3 --- Variation of K
    Chapter 6.6 --- Conclusion
    Chapter 7 --- Constraint Compatibility Reasoning
    Chapter 7.1 --- Abstract Timing Parameters
    Chapter 7.2 --- MC68000 Read Cycle: Wait States Insertion
    Chapter 7.3 --- Constraint Compatibility of Fuzzy Time Point
    Chapter 7.3.1 --- Crisp Threshold Value
    Chapter 7.3.2 --- Possibility Quantification for the Number of Wait States
    Chapter 7.3.3 --- Threshold Beyond Fuzzy Time Point
    Chapter 7.3.4 --- Fuzzy Time Point Beyond Threshold
    Chapter 7.3.5 --- Threshold Within Fuzzy Time Point
    Chapter 7.4 --- Determine When CPU Clock State is S5
    Chapter 7.5 --- System Implementation
    Chapter 7.5.1 --- Expert's Heuristic Rule
    Chapter 7.5.2 --- Constraint Compatibility
    Chapter 7.5.3 --- Wait States Insertion
    Chapter 7.6 --- Implementation Results
    Chapter 7.7 --- Conclusion
    Chapter 8 --- Conclusion
    Chapter 8.1 --- Applications in Other Domains
    Chapter 8.2 --- Future Directions and Recommendations
    Chapter A --- Constraint Compatibility Reasoning Output
    Chapter A.1 --- No Wait Cycle Insertion
    Chapter A.2 --- Single Wait Cycle Insertion
    Chapter A.3 --- Two Wait Cycle Insertions
    Chapter B --- MC68020 Read Cycle Problem
    Chapter B.1 --- Basics of MC68020 Read Cycle
    Chapter B.2 --- MC68020 Databus Setup Time
    Chapter B.3 --- Implementation Results
    Bibliography

    Synchronous Programming of Reactive Systems


    Políticas de Copyright de Publicações Científicas em Repositórios Institucionais: O Caso do INESC TEC

    The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), has made it possible to increase access to information, gradually moving towards an opening of the research cycle. In the long term, this will help resolve a difficulty that researchers have faced: the existence of barriers, whether geographical or financial, that limit access. Although scientific production is dominated mostly by large commercial publishers and is subject to the rules they impose, the Open Access movement, whose first public declaration, the Budapest Declaration (BOAI), dates from 2002, proposes significant changes that benefit authors and readers. This movement has been gaining importance in Portugal since 2003, with the creation of the first institutional repository at the national level. Institutional repositories emerged as a tool for disseminating the scientific production of an institution, with the aim of opening up research results both before publication and peer review (preprint) and after (postprint), and consequently of increasing the visibility of the work of a researcher and of the respective institution. The present study, which analysed the copyright policies of the most relevant scientific publications of INESC TEC, showed not only that publishers are increasingly adopting policies that allow self-archiving of publications in institutional repositories, but also that there is still awareness-raising work to be done, not only among researchers but also within the institution and society as a whole. The resulting set of recommendations, which include implementing an institutional policy that encourages self-archiving in the repository of publications produced within the institution, serves as a starting point for a greater appreciation of the scientific production of INESC TEC.

    Average Case Analysis of Brzozowski's Algorithm

    We analyze the average complexity of Brzozowski's minimization algorithm for distributions of deterministic automata with a small number of final states. We show that, as in the case of the uniform distribution, the average complexity is super-polynomial even if we consider random deterministic automata with only one final state. Such results were previously known only for distributions where the expected number of final states is linear in the number of states.

    Codeterministic automata and acyclic automata: analysis of algorithms and random generation

    The general context of this thesis is the quantitative analysis of objects arising from rational language theory. We adapt techniques from the analysis of algorithms (average-case complexity, generic complexity, random generation, ...) to objects and algorithms that involve particular classes of automata. In the first part, we study the complexity of Brzozowski's minimisation algorithm. Although the worst-case complexity of this algorithm is poor, it has a reputation for being efficient in practice. Using typical properties of random mappings and random permutations, we show that the generic complexity of Brzozowski's algorithm applied to a deterministic automaton grows faster than any polynomial in n, where n is the number of states of the automaton. In the second part, we study the random generation of acyclic automata. These automata recognize the finite sets of words, and for this reason they are widely used in applications, especially in natural language processing. We present two random generators, one using a Markov chain model, the other a "recursive method" based on a combinatorial decomposition of structures. The first method is flexible but very difficult to calibrate; the second is more efficient. Once implemented, the second method allowed us to observe typical properties of large random acyclic automata.