3,590 research outputs found
Simultaneous Finite Automata: An Efficient Data-Parallel Model for Regular Expression Matching
Automata play important roles in wide area of computing and the growth of
multicores calls for their efficient parallel implementation. Though it is
known in theory that we can perform the computation of a finite automaton in
parallel by simulating transitions, its implementation has a large overhead due
to the simulation. In this paper we propose a new automaton called simultaneous
finite automaton (SFA) for efficient parallel computation of an automaton. The
key idea is to extend an automaton so that it involves the simulation of
transitions. Since an SFA itself has a good property of parallelism, we can
develop easily a parallel implementation without overheads. We have implemented
a regular expression matcher based on SFA, and it has achieved over 10-times
speedups on an environment with dual hexa-core CPUs in a typical case.Comment: This paper has been accepted at the following conference: 2013
International Conference on Parallel Processing (ICPP- 2013), October 1-4,
2013 Ecole Normale Suprieure de Lyon, Lyon, Franc
Decision Problems for Deterministic Pushdown Automata on Infinite Words
The article surveys some decidability results for DPDAs on infinite words
(omega-DPDA). We summarize some recent results on the decidability of the
regularity and the equivalence problem for the class of weak omega-DPDAs.
Furthermore, we present some new results on the parity index problem for
omega-DPDAs. For the specification of a parity condition, the states of the
omega-DPDA are assigned priorities (natural numbers), and a run is accepting if
the highest priority that appears infinitely often during a run is even. The
basic simplification question asks whether one can determine the minimal number
of priorities that are needed to accept the language of a given omega-DPDA. We
provide some decidability results on variations of this question for some
classes of omega-DPDAs.Comment: In Proceedings AFL 2014, arXiv:1405.527
Analyzing large-scale DNA Sequences on Multi-core Architectures
Rapid analysis of DNA sequences is important in preventing the evolution of
different viruses and bacteria during an early phase, early diagnosis of
genetic predispositions to certain diseases (cancer, cardiovascular diseases),
and in DNA forensics. However, real-world DNA sequences may comprise several
Gigabytes and the process of DNA analysis demands adequate computational
resources to be completed within a reasonable time. In this paper we present a
scalable approach for parallel DNA analysis that is based on Finite Automata,
and which is suitable for analyzing very large DNA segments. We evaluate our
approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog
(2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results
on a dual-socket shared-memory system with 24 physical cores show speed-ups of
up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel
approach that uses the RE2 library.Comment: The 18th IEEE International Conference on Computational Science and
Engineering (CSE 2015), Porto, Portugal, 20 - 23 October 201
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
Streamability of nested word transductions
We consider the problem of evaluating in streaming (i.e., in a single
left-to-right pass) a nested word transduction with a limited amount of memory.
A transduction T is said to be height bounded memory (HBM) if it can be
evaluated with a memory that depends only on the size of T and on the height of
the input word. We show that it is decidable in coNPTime for a nested word
transduction defined by a visibly pushdown transducer (VPT), if it is HBM. In
this case, the required amount of memory may depend exponentially on the height
of the word. We exhibit a sufficient, decidable condition for a VPT to be
evaluated with a memory that depends quadratically on the height of the word.
This condition defines a class of transductions that strictly contains all
determinizable VPTs
Memoization for Unary Logic Programming: Characterizing PTIME
We give a characterization of deterministic polynomial time computation based
on an algebraic structure called the resolution semiring, whose elements can be
understood as logic programs or sets of rewriting rules over first-order terms.
More precisely, we study the restriction of this framework to terms (and logic
programs, rewriting rules) using only unary symbols. We prove it is complete
for polynomial time computation, using an encoding of pushdown automata. We
then introduce an algebraic counterpart of the memoization technique in order
to show its PTIME soundness. We finally relate our approach and complexity
results to complexity of logic programming. As an application of our
techniques, we show a PTIME-completeness result for a class of logic
programming queries which use only unary function symbols.Comment: Soumis {\`a} LICS 201
Automata learning algorithms and processes for providing more complete systems requirements specification by scenario generation, CSP-based syntax-oriented model construction, and R2D2C system requirements transformation
Systems, methods and apparatus are provided through which in some embodiments, automata learning algorithms and techniques are implemented to generate a more complete set of scenarios for requirements based programming. More specifically, a CSP-based, syntax-oriented model construction, which requires the support of a theorem prover, is complemented by model extrapolation, via automata learning. This may support the systematic completion of the requirements, the nature of the requirement being partial, which provides focus on the most prominent scenarios. This may generalize requirement skeletons by extrapolation and may indicate by way of automatically generated traces where the requirement specification is too loose and additional information is required
- …