7,169 research outputs found

    Efficient Online Timed Pattern Matching by Automata-Based Skipping

    Full text link
    The timed pattern matching problem is an actively studied topic because of its relevance in monitoring of real-time systems. There one is given a log ww and a specification A\mathcal{A} (given by a timed word and a timed automaton in this paper), and one wishes to return the set of intervals for which the log ww, when restricted to the interval, satisfies the specification A\mathcal{A}. In our previous work we presented an efficient timed pattern matching algorithm: it adopts a skipping mechanism inspired by the classic Boyer--Moore (BM) string matching algorithm. In this work we tackle the problem of online timed pattern matching, towards embedded applications where it is vital to process a vast amount of incoming data in a timely manner. Specifically, we start with the Franek-Jennings-Smyth (FJS) string matching algorithm---a recent variant of the BM algorithm---and extend it to timed pattern matching. Our experiments indicate the efficiency of our FJS-type algorithm in online and offline timed pattern matching

    Analyzing large-scale DNA Sequences on Multi-core Architectures

    Full text link
    Rapid analysis of DNA sequences is important in preventing the evolution of different viruses and bacteria during an early phase, early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and in DNA forensics. However, real-world DNA sequences may comprise several Gigabytes and the process of DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on Finite Automata, and which is suitable for analyzing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog (2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel approach that uses the RE2 library.Comment: The 18th IEEE International Conference on Computational Science and Engineering (CSE 2015), Porto, Portugal, 20 - 23 October 201

    Temiar Reduplication in One-Level Prosodic Morphology

    Full text link
    Temiar reduplication is a difficult piece of prosodic morphology. This paper presents the first computational analysis of Temiar reduplication, using the novel finite-state approach of One-Level Prosodic Morphology originally developed by Walther (1999b, 2000). After reviewing both the data and the basic tenets of One-level Prosodic Morphology, the analysis is laid out in some detail, using the notation of the FSA Utilities finite-state toolkit (van Noord 1997). One important discovery is that in this approach one can easily define a regular expression operator which ambiguously scans a string in the left- or rightward direction for a certain prosodic property. This yields an elegant account of base-length-dependent triggering of reduplication as found in Temiar.Comment: 9 pages, 2 figures. Finite-State Phonology: SIGPHON-2000, Proceedings of the Fifth Workshop of the ACL Special Interest Group in Computational Phonology, pp.13-21. Aug. 6, 2000. Luxembour

    Regular Languages meet Prefix Sorting

    Full text link
    Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017]-which extends naturally the concept of prefix sorting to labeled graphs-we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with nn states admits an equivalent Wheeler DFA (WDFA) with at most 2n1Σ2n-1-|\Sigma| states that can be computed in O(n3)O(n^3) time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, a O(nlogn)O(n\log n)-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language of any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in O(nlogn)O(n\log n) time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time.Comment: added minimization theorems; uploaded submitted version; New version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostin
    corecore