351 research outputs found

    A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

    This work presents a method that enables the Arabic NLP community to build scalable lexical resources. The proposed method is low-cost, time-efficient, scalable, and extensible: it is incremental both in processing resources and in generating lexicons. Starting from a corpus, tokens are first drawn from the corpus and lemmatized. Second, finite state transducers (FSTs) are generated semi-automatically. Finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One strength of the algorithm is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. A second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiments use a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The resulting open lexical resources have high coverage: they cover more than 70% of Arabic verbs. The built resources contain 16,855 verb lemmas and 11,080,355 fully, partially, and non-vocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license.

    Open Source Natural Language Processing

    Our MQP aimed to introduce finite-state-machine-based techniques for natural language processing into Hunspell, the world's premier open source spell checker, used in several prominent projects such as Firefox and Open Office. We created compact machine-readable finite state transducer representations of 26 of the most commonly used languages on Wikipedia. We then created an automata-based spell checker. In addition, we implemented a transducer-based stemmer, which will be used in future work on transducer-based morphological analysis.
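
    As a rough illustration of what an automata-based spell checker does at its core, the sketch below builds a trie, which is a simple deterministic acceptor, from a word list and checks whether a word is accepted. This is not Hunspell's API or the project's implementation; the names `build_acceptor` and `accepts` are hypothetical, and a real system would minimize the automaton and handle affixes.

```python
# Hypothetical sketch: a trie used as a deterministic finite acceptor for a lexicon.
# The key None marks an accepting state; it can never collide with a character.

def build_acceptor(words):
    """Build a trie whose paths from the root spell the words in the lexicon."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})  # follow or create the transition
        state[None] = True  # mark the state reached at end of word as accepting
    return root


def accepts(acceptor, word):
    """Return True if the automaton ends in an accepting state after reading word."""
    state = acceptor
    for ch in word:
        state = state.get(ch)
        if state is None:  # no transition on this character: reject
            return False
    return None in state


lexicon = build_acceptor(["spell", "spelling", "check"])
print(accepts(lexicon, "spelling"))
print(accepts(lexicon, "spel"))
```

    A prefix such as "spel" is rejected because the state it reaches is not marked as accepting, even though every transition along the way exists.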

    Fast and Compact Regular Expression Matching

    We study four problems in string matching, namely regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, using either an improved tabulation technique for an existing algorithm or a new combination of known algorithms.
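
    For readers unfamiliar with the string edit distance problem named above, the following is a minimal sketch of the classic quadratic-time dynamic program (Wagner-Fischer) with unit costs. It is only the textbook baseline, not the tabulation-based word RAM algorithm the abstract refers to; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the textbook dynamic program.

    Runs in O(len(a) * len(b)) time and keeps only two rows,
    so it uses O(len(b)) space.
    """
    # prev[j] = distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1      # drop ca from a
            insertion = curr[j - 1] + 1  # insert cb into a
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```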

    State Estimation of Timed Discrete Event Systems and Its Applications

    Many industrial control systems can be described as discrete event systems (DES), whose state space is a discrete set where event occurrences cause transitions from one state to another. Timing introduces an additional dimension to DES modeling and control. This dissertation provides two models of timed DES endowed with a single clock, namely timed finite automata (TFA) and generalized timed finite automata (GTFA). In addition, a timing function is defined to associate each transition with a time interval specifying at which clock values it may occur. While the clock of a TFA is reset to zero after each event occurs and the time semantics constrain the dwell time at each discrete state, there is an additional clock resetting function associated with a GTFA to denote whether the clock is reset to a value in a given closed time interval. We assume that the logical and time structure of a partially observable TFA/GTFA is known. The main results are summarized as follows.
    1. The notion of a zone automaton is introduced as a finite automaton providing a purely discrete event description of the behaviour of a TFA/GTFA of interest. Each state of a zone automaton contains a discrete state of the timed DES and a zone that is a time interval denoting a range of possible clock values. We investigate the dynamics of a zone automaton and show that one can reduce the problem of investigating the reachability of a given timed DES to the reachability analysis of a zone automaton.
    2. We present a formal approach that allows one to construct offline an observer for TFA/GTFA, i.e., a finite structure that describes the state estimation for all possible evolutions. During the online phase to estimate the current discrete state according to each measurement of an observable event, one can determine which is the state of the observer reached by the current observation and check to which interval (among a finite number of time intervals) the time elapsed since the last observed event occurrence belongs. We prove that the discrete states consistent with a timed observation and the range of clock values associated with each estimated discrete state can be inferred following a certain number of runs in the zone automaton. In particular, the state estimation of timed DES under multiple clocks can be investigated in the framework of GTFA. We model such a system as a GTFA with multiple clocks, which generalizes the timing function and the clock resetting function to multiple clocks.
    3. As an application of the state estimation approach for TFA, we assume that a given TFA may be affected by a set of faults described using timed transitions and aim at diagnosing a fault behaviour based on a timed observation. The problem of fault diagnosis is solved by constructing a zone automaton of the TFA with faults and a fault recognizer as the parallel composition of the zone automaton and a fault monitor that recognizes the occurrence of faults. We conclude that the occurrence of faults can be analyzed by exploring runs in the fault recognizer that are consistent with a given timed observation.
    4. We also study the problem of attack detection in the context of DESs, assuming that a system may be subject to multiple types of attacks, each described by its own attack dictionary. Furthermore, we distinguish between constant attacks, which corrupt observations using only one of the attack dictionaries, and switching attacks, which may use different attack dictionaries at different steps. The problem we address is detecting whether a system has been attacked and, if so, which attack dictionaries have been used. To solve it in the framework of untimed DES, we construct a new structure that describes the observations generated by a system under attack. We show that the attack detection problem can be transformed into a classical state estimation/diagnosis problem for these new structures.
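
    The classical state estimation problem mentioned above can be illustrated for the untimed case with a small sketch. It assumes a finite automaton given as a dictionary from (state, event) pairs to sets of successor states, plus a set of unobservable events; after each observed event, the estimate is the set of states consistent with the observation so far. Zones, clocks, and attack dictionaries are deliberately left out, and the names `unobservable_reach` and `estimate_states` are hypothetical rather than taken from the dissertation.

```python
from collections import deque


def unobservable_reach(states, transitions, unobservable):
    """States reachable from `states` by firing unobservable events only."""
    seen = set(states)
    queue = deque(states)
    while queue:
        s = queue.popleft()
        for e in unobservable:
            for t in transitions.get((s, e), ()):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return frozenset(seen)


def estimate_states(initial, transitions, unobservable, observation):
    """Set of states consistent with a sequence of observed events."""
    current = unobservable_reach({initial}, transitions, unobservable)
    for event in observation:
        successors = set()
        for s in current:
            successors.update(transitions.get((s, event), ()))
        current = unobservable_reach(successors, transitions, unobservable)
    return current


# Toy system (an assumption for illustration): 'u' is unobservable.
transitions = {
    (0, "a"): {1},
    (0, "u"): {2},
    (2, "a"): {3},
    (1, "b"): {0},
}
print(sorted(estimate_states(0, transitions, {"u"}, ["a"])))  # prints [1, 3]
```

    Precomputing these estimates for every reachable set of states, rather than on demand, yields the usual offline observer for an untimed system.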

    Proceedings of the Eindhoven FASTAR Days 2004 : Eindhoven, The Netherlands, September 3-4, 2004

    The Eindhoven FASTAR Days (EFD) 2004 were organized by the Software Construction group of the Department of Mathematics and Computer Science at the Technische Universiteit Eindhoven. On September 3rd and 4th 2004, over thirty participants, hailing from the Czech Republic, Finland, France, The Netherlands, Poland and South Africa, gathered at the Department to attend the EFD. The EFD were organized in connection with the research on finite automata by the FASTAR Research Group, which is centered in Eindhoven and at the University of Pretoria, South Africa. FASTAR (Finite Automata Systems: Theoretical and Applied Research) is an international research group that aims to lead in all areas related to finite state systems. The work in FASTAR includes both core and applied parts of this field. The EFD therefore focused on the field of finite automata, with an emphasis on practical aspects and applications. Eighteen presentations, mostly on subjects within this field, were given, by researchers as well as students from participating universities and industrial research facilities. This report contains the proceedings of the conference, in the form of papers for twelve of the presentations at the EFD. Most of them were initially reviewed and distributed as handouts during the EFD. After the EFD took place, the papers were revised for publication in these proceedings. We would like to thank the participants for their attendance and presentations, making the EFD 2004 as successful as they were. Based on this success, it is our intention to make the EFD into a recurring event. Eindhoven, December 2004 Loek Cleophas Bruce W. Watso