Streamability of nested word transductions
We consider the problem of evaluating in streaming (i.e., in a single
left-to-right pass) a nested word transduction with a limited amount of memory.
A transduction T is said to be height bounded memory (HBM) if it can be
evaluated with a memory that depends only on the size of T and on the height of
the input word. We show that it is decidable in coNPTime whether a nested word
transduction defined by a visibly pushdown transducer (VPT) is HBM. In
this case, the required amount of memory may depend exponentially on the height
of the word. We exhibit a sufficient, decidable condition for a VPT to be
evaluated with a memory that depends quadratically on the height of the word.
This condition defines a class of transductions that strictly contains all
determinizable VPTs.
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
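To make the idea of stream validation concrete, here is a minimal sketch of a stack-based validator in the visibly pushdown style: opening tags push, closing tags pop, and text content is checked against an abstract datatype. The event format, the `ALLOWED_CHILDREN` and `ALLOWED_TEXT` tables, and the `validate` function are all hypothetical names chosen for illustration; they stand in for a learned model and are not the algorithm or datatype system of the paper.

```python
from typing import Iterable, Tuple

# Hypothetical event stream item: ("open", tag), ("close", tag) or ("text", datatype).
Event = Tuple[str, str]

# Hypothetical model standing in for a learned automaton:
# which child elements and which text datatypes each element may contain.
ALLOWED_CHILDREN = {
    "#root": {"order"},
    "order": {"id", "item"},
    "id": set(),
    "item": set(),
}
ALLOWED_TEXT = {"id": {"integer"}, "item": {"string"}}


def validate(events: Iterable[Event]) -> bool:
    """Validate a stream of events in one pass, using memory bounded by nesting depth."""
    stack = ["#root"]  # the stack top is the element currently open
    for kind, value in events:
        top = stack[-1]
        if kind == "open":
            # Call symbol: check the child is allowed here, then push it.
            if value not in ALLOWED_CHILDREN.get(top, set()):
                return False
            stack.append(value)
        elif kind == "close":
            # Return symbol: it must match the element on top of the stack.
            if len(stack) == 1 or top != value:
                return False
            stack.pop()
        elif kind == "text":
            # Internal symbol: check the abstracted datatype, stack unchanged.
            if value not in ALLOWED_TEXT.get(top, set()):
                return False
        else:
            return False
    # Accept only if every opened element has been closed.
    return len(stack) == 1
```

Because the kind of each event alone decides whether the stack grows, shrinks, or stays the same, the validator never needs the tree representation of the document, only the path from the root to the current element.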
Streaming Property Testing of Visibly Pushdown Languages
In the context of language recognition, we demonstrate the superiority of
streaming property testers over streaming algorithms and property testers,
when they are not combined. Initiated by Feigenbaum et al., a streaming
property tester is a streaming algorithm recognizing a language under the
property testing approximation: it must distinguish inputs of the language from
those that are $\epsilon$-far from it, while using the smallest possible
memory (rather than limiting its number of input queries).
Our main result is a streaming $\epsilon$-property tester for visibly
pushdown languages (VPL) with one-sided error using memory space
$\mathrm{poly}(\log n / \epsilon)$.
This construction relies on a (non-streaming) property tester for weighted
regular languages based on a previous tester by Alon et al. We provide a simple
application of this tester for streaming testing special cases of instances of
VPL that are already hard for both streaming algorithms and property testers.
Our main algorithm starts from an original simulation of visibly pushdown
automata using a stack of small height whose items may be of linear
size. In a second step, those items are replaced by small sketches. Those
sketches rely on a notion of suffix-sampling we introduce. This sampling is
the key idea connecting our streaming tester algorithm to property testers.
Comment: 23 pages. Major modifications in the presentatio
Recognizing well-parenthesized expressions in the streaming model
Motivated by a concrete problem and with the goal of understanding the sense
in which the complexity of streaming algorithms is related to the complexity of
formal languages, we investigate the problem Dyck(s) of checking matching
parentheses, with different types of parenthesis.
We present a one-pass randomized streaming algorithm for Dyck(2) with space
$O(\sqrt{n}\log n)$, time per letter $\mathrm{polylog}(n)$, and one-sided error.
We prove that this one-pass algorithm is optimal, up to a $\mathrm{polylog}(n)$ factor,
even when two-sided error is allowed. For the lower bound, we prove a direct
sum result on hard instances by following the "information cost" approach, but
with a few twists. Indeed, we play a subtle game between public and private
coins. This mixture between public and private coins results from a balancing
act between the direct sum result and a combinatorial lower bound for the base
case.
Surprisingly, the space requirement shrinks drastically if we have access to
the input stream in reverse. We present a two-pass randomized streaming
algorithm for Dyck(2) with space $O((\log n)^2)$, time $\mathrm{polylog}(n)$ and
one-sided error, where the second pass is in the reverse direction. Both
algorithms can be extended to Dyck(s) since this problem is reducible to
Dyck(2) for a suitable notion of reduction in the streaming model.
Comment: 20 pages, 5 figure
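For reference, the problem Dyck(2) itself can be stated as a short one-pass check with an explicit stack. This is only the textbook baseline, whose stack can grow linearly with the input; it is not the sublinear-space randomized algorithm of the paper. The names `PAIRS`, `OPENERS` and `is_dyck` are assumptions made for this sketch.

```python
# Two parenthesis types, as in Dyck(2): each closer maps to its opener.
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())


def is_dyck(stream) -> bool:
    """One left-to-right pass; the stack holds the currently unmatched openers."""
    stack = []
    for ch in stream:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            raise ValueError(f"unexpected symbol: {ch!r}")
    # Well-parenthesized only if nothing is left open.
    return not stack
```

On an input such as `n/2` openers followed by `n/2` closers, this check stores `n/2` symbols, which is the cost that the streaming algorithms described above are designed to avoid.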
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees.
An MPEG-7 scheme for semantic content modelling and filtering of digital video
Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS); that is, the format multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme which can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the preferred content requirements of the user. We present details of the scheme, the front-end systems used for content modelling and filtering, and experiences with a number of users.
Streaming Property Testing of Visibly Pushdown Languages
In the context of formal language recognition, we demonstrate the superiority of streaming property testers over streaming algorithms and property testers, when they are not combined. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are $\epsilon$-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Our main result is a streaming $\epsilon$-property tester for visibly pushdown languages (VPL) with memory space $\mathrm{poly}(\log n / \epsilon)$.
Our construction is done in three steps. First, we simulate a visibly pushdown automaton in one pass using a stack of small height but whose items can be of linear size. In a second step, those items are replaced by small sketches. Those sketches rely on a notion of suffix-sampling we introduce. This sampling is the key idea for benefiting from both streaming algorithms and property testers in the third step. Indeed, the last step relies on a (non-streaming) property tester for weighted regular languages based on a previous tester by Alon et al. This tester can directly be used for streaming testing special cases of instances of VPL that are already hard for both streaming algorithms and property testers. We then use it to decide the correctness of completed items, given their sketches, before removing them from the stack.
Streaming Tree Automata
Streaming validation and querying of XML documents are often based on automata for tree-like structures. We propose a new notion of streaming tree automata in order to unify the two main approaches, which have not been linked so far: automata for nested words, or equivalently visibly pushdown automata, and pushdown forest automata.