Efficient Online Timed Pattern Matching by Automata-Based Skipping
The timed pattern matching problem is an actively studied topic because of
its relevance in monitoring of real-time systems. There, one is given a log
and a specification (given by a timed word and a timed automaton, respectively,
in this paper), and one wishes to return the set of intervals for which the log,
when restricted to the interval, satisfies the specification.
In our previous work we presented an efficient timed pattern
matching algorithm: it adopts a skipping mechanism inspired by the classic
Boyer--Moore (BM) string matching algorithm. In this work we tackle the problem
of online timed pattern matching, towards embedded applications where it is
vital to process a vast amount of incoming data in a timely manner.
Specifically, we start with the Franek-Jennings-Smyth (FJS) string matching
algorithm---a recent variant of the BM algorithm---and extend it to timed
pattern matching. Our experiments indicate the efficiency of our FJS-type
algorithm in online and offline timed pattern matching.
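The skipping idea that the abstract above borrows from Boyer-Moore can be illustrated on ordinary (untimed) strings. The sketch below is not the paper's timed algorithm and not the FJS algorithm itself; it is the simpler Horspool variant of the bad-character rule, and the function name `horspool_search` is an assumption made for illustration.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text` (untimed strings only)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For each character of the pattern except the last one, record how far its
    # rightmost occurrence is from the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Skip ahead based on the text character aligned with the pattern's
        # last position; characters absent from the table allow a full skip.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(horspool_search("abracadabra", "abra"))  # [0, 7]
```

The window can jump over several positions at once because the shift table rules them out in advance; the abstract describes extending this kind of skipping from strings to timed words.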
Analyzing large-scale DNA Sequences on Multi-core Architectures
Rapid analysis of DNA sequences is important in preventing the evolution of
different viruses and bacteria during an early phase, in the early diagnosis of
genetic predispositions to certain diseases (cancer, cardiovascular diseases),
and in DNA forensics. However, real-world DNA sequences may comprise several
gigabytes, and the process of DNA analysis demands adequate computational
resources to be completed within a reasonable time. In this paper we present a
scalable approach for parallel DNA analysis that is based on Finite Automata,
and which is suitable for analyzing very large DNA segments. We evaluate our
approach for real-world DNA segments of mouse (2.7 GB), cat (2.4 GB), dog
(2.4 GB), chicken (1 GB), human (3.2 GB) and turkey (0.2 GB). Experimental results
on a dual-socket shared-memory system with 24 physical cores show speed-ups of
up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel
approach that uses the RE2 library.
Comment: The 18th IEEE International Conference on Computational Science and
Engineering (CSE 2015), Porto, Portugal, 20-23 October 2015
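As a rough illustration of automaton-based parallel scanning of the kind the abstract above describes, the sketch below builds a KMP-style DFA for a single motif and counts its occurrences chunk by chunk. It is not the paper's method: the single-motif restriction, the `ACGT` alphabet, the chunking scheme, and all function names are assumptions. Threads are used only to show the decomposition; in CPython they would not give the speed-ups reported above.

```python
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"  # assumed alphabet; any other symbol resets the automaton


def build_dfa(pattern: str) -> list[dict[str, int]]:
    """KMP-style DFA for a non-empty pattern over ALPHABET; state len(pattern) accepts."""
    m = len(pattern)
    dfa = [dict.fromkeys(ALPHABET, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    fallback = 0  # state reached after reading the pattern minus its first char
    for state in range(1, m + 1):
        for ch in ALPHABET:
            dfa[state][ch] = dfa[fallback][ch]  # mismatch: behave like fallback
        if state < m:
            dfa[state][pattern[state]] = state + 1  # match: advance
            fallback = dfa[fallback][pattern[state]]
    return dfa


def count_in_chunk(dfa, accept, seq, start, end, overlap):
    """Count matches ending in [start, end), warming up on `overlap` earlier chars."""
    state, count = 0, 0
    for i in range(max(0, start - overlap), end):
        state = dfa[state].get(seq[i], 0)
        if state == accept and i >= start:
            count += 1
    return count


def count_matches(seq: str, pattern: str, workers: int = 4) -> int:
    """Count (possibly overlapping) occurrences of `pattern` in `seq`."""
    dfa, m = build_dfa(pattern), len(pattern)
    size = max(1, -(-len(seq) // workers))  # ceiling division
    bounds = [(s, min(s + size, len(seq))) for s in range(0, len(seq), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: count_in_chunk(dfa, m, seq, b[0], b[1], m - 1), bounds
        )
        return sum(parts)


print(count_matches("GATTACAGATTACA", "GATTACA", workers=3))  # 2
```

Each chunk re-reads the `len(pattern) - 1` characters before its start so that matches straddling a chunk boundary are counted exactly once, by the chunk in which they end.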
Temiar Reduplication in One-Level Prosodic Morphology
Temiar reduplication is a difficult piece of prosodic morphology. This paper
presents the first computational analysis of Temiar reduplication, using the
novel finite-state approach of One-Level Prosodic Morphology originally
developed by Walther (1999b, 2000). After reviewing both the data and the basic
tenets of One-Level Prosodic Morphology, the analysis is laid out in some
detail, using the notation of the FSA Utilities finite-state toolkit (van Noord
1997). One important discovery is that in this approach one can easily define a
regular expression operator which ambiguously scans a string in the left- or
rightward direction for a certain prosodic property. This yields an elegant
account of base-length-dependent triggering of reduplication as found in
Temiar.
Comment: 9 pages, 2 figures. Finite-State Phonology: SIGPHON-2000, Proceedings
of the Fifth Workshop of the ACL Special Interest Group in Computational
Phonology, pp. 13-21. Aug. 6, 2000. Luxembourg
Regular Languages meet Prefix Sorting
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most
successful algorithmic techniques developed in the last decades. Can indexing
be extended to languages? The main contribution of this paper is to initiate
the study of the sub-class of regular languages accepted by an automaton whose
states can be prefix-sorted. Starting from the recent notion of Wheeler graph
[Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting
to labeled graphs, we investigate the properties of Wheeler languages, that is,
regular languages admitting an accepting Wheeler finite automaton.
Interestingly, we characterize this family as the natural extension of regular
languages endowed with the co-lexicographic ordering: when sorted, the strings
belonging to a Wheeler language are partitioned into a finite number of
co-lexicographic intervals, each formed by elements from a single Myhill-Nerode
equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with
states admits an equivalent Wheeler DFA (WDFA) with at most
states that can be computed in time. This is in sharp contrast with
general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper
superset of the WDFAs, a -time online algorithm to sort acyclic
WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By
contribution (i), our algorithms can also be used to index any WNFA at the
moderate price of doubling the automaton's size. (iii) We provide a
minimization theorem that characterizes the smallest WDFA recognizing the same
language as any input WDFA. The corresponding constructive algorithm runs in
optimal linear time in the acyclic case, and in time in the
general case. (iv) We show how to compute the smallest WDFA equivalent to any
acyclic DFA in nearly-optimal time.
Comment: added minimization theorems; uploaded submitted version; new version
with new results (W-MH theorem, linear determinization); added author:
Giovanna D'Agostino
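The co-lexicographic ordering mentioned in the abstract above compares strings from their last character backwards. The sketch below is only a toy illustration of the "intervals of equivalence classes" picture, not anything taken from the paper: the language (strings over `ab` ending in `a`), the helper names, and the use of the last character as a stand-in for the Myhill-Nerode class of a non-empty word are all assumptions.

```python
from itertools import groupby, product


def colex_key(word: str) -> str:
    """Sorting by the reversed word gives co-lexicographic order."""
    return word[::-1]


def colex_runs(words, classify):
    """Sort words co-lexicographically and group maximal runs with equal class label."""
    ordered = sorted(words, key=colex_key)
    return [(label, list(group)) for label, group in groupby(ordered, key=classify)]


# All non-empty words over {a, b} of length at most 3.
words = ["".join(p) for n in range(1, 4) for p in product("ab", repeat=n)]

# For the toy language "ends in a", two non-empty words are equivalent exactly
# when they share their last character, so that character serves as the label.
runs = colex_runs(words, classify=lambda w: w[-1])
for label, members in runs:
    print(label, members)
```

In this toy case the sorted list splits into two contiguous runs, one per class, which is the shape of the partition that the abstract describes for Wheeler languages.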