A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) from a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
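The stream validation step described in this abstract can be illustrated with a small sketch. It is not the paper's learning algorithm and not its datatype system: it only shows, under simplifying assumptions, how a deterministic visibly pushdown automaton can check a stream of open/close tag events with a stack and without building a tree. The class name `StreamValidator`, the helper `build_list_vpa`, the state names, and the toy `list`/`item` vocabulary are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# An event is (kind, tag): kind is "open" (call symbol) or "close" (return symbol).
Event = Tuple[str, str]


class StreamValidator:
    """Toy deterministic visibly pushdown automaton over tag events (illustrative only)."""

    def __init__(
        self,
        start: str,
        call: Dict[Tuple[str, str], str],
        ret: Dict[Tuple[str, str, str], str],
        accepting: Set[str],
    ) -> None:
        self.start = start
        self.call = call            # (state, tag) -> next state; pushes the current state
        self.ret = ret              # (state, tag, popped state) -> next state
        self.accepting = accepting

    def accepts(self, events: Iterable[Event]) -> bool:
        state = self.start
        stack: List[Tuple[str, str]] = []
        for kind, tag in events:
            if kind == "open":
                nxt = self.call.get((state, tag))
                if nxt is None:
                    return False    # no transition: structure not covered by the automaton
                stack.append((state, tag))
                state = nxt
            elif kind == "close":
                if not stack:
                    return False    # closing tag without a matching opening tag
                prev_state, open_tag = stack.pop()
                if open_tag != tag:
                    return False    # mismatched nesting
                nxt = self.ret.get((state, tag, prev_state))
                if nxt is None:
                    return False
                state = nxt
            else:
                return False        # unknown event kind
        return not stack and state in self.accepting


def build_list_vpa() -> StreamValidator:
    """Hypothetical automaton: one <list> element holding zero or more <item> elements."""
    call = {("q0", "list"): "in_list", ("in_list", "item"): "in_item"}
    ret = {
        ("in_item", "item", "in_list"): "in_list",
        ("in_list", "list", "q0"): "done",
    }
    return StreamValidator("q0", call, ret, {"done"})
```

Because each event is consumed once and only the stack of open elements is kept, memory use in this sketch grows with nesting depth rather than document size, which is the property that makes stream processing of large documents plausible.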
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end users has become more and more common in
SPARQL endpoints. In this paper, we conduct an in-depth analytical study of
the queries formulated by end users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, spanning
several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, which already exhibits
interesting results on this generalized corpus, we drill deeper into the
structural characteristics related to the graph and hypergraph
representation of queries. We outline the most common shapes of queries when
visually displayed as pseudographs, and characterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel concept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and brings us to draw a number of conclusions and pinpoint
future directions for SPARQL query evaluation, query optimization, tuning,
and benchmarking.
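To make the notion of a streak more concrete, here is a minimal sketch of how a chronologically ordered log could be split into runs of closely related queries. The abstract does not say how "modification" is measured, so everything here is an assumption: the function name `split_into_streaks`, the use of `difflib.SequenceMatcher` as the similarity measure, the comparison against the immediately preceding query rather than the seed, and the threshold of 0.8 are all illustrative choices.

```python
from difflib import SequenceMatcher
from typing import List


def split_into_streaks(queries: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Group an ordered query log into streaks.

    Assumption (not from the paper): a query continues the current streak
    when its similarity ratio to the previous query is at least `threshold`;
    otherwise it becomes the seed of a new streak.
    """
    streaks: List[List[str]] = []
    for query in queries:
        if streaks:
            previous = streaks[-1][-1]
            similarity = SequenceMatcher(None, previous, query).ratio()
            if similarity >= threshold:
                streaks[-1].append(query)
                continue
        streaks.append([query])  # this query seeds a new streak
    return streaks
```

With this heuristic, a query followed by the same query plus a `LIMIT` clause would fall into one streak, while an unrelated short `ASK` query would start a new one. A real analysis would also need to account for interleaved users and time gaps, which this sketch ignores.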
Entanglement and Symmetry: A Case Study in Superselection Rules, Reference Frames, and Beyond
This paper concentrates on a particular example of a constraint imposed by
superselection rules (SSRs): that which applies when the parties (Alice and
Bob) cannot distinguish among certain quantum objects they have. This arises
naturally in the context of ensemble quantum information processing such as in
liquid NMR. We discuss how a SSR for the symmetric group can be applied, and
show how the extractable entanglement can be calculated analytically in certain
cases, with a maximum bipartite entanglement in an ensemble of N Bell-state
pairs scaling as log(N) as N goes to infinity. We discuss the apparent
disparity with the asymptotic (N >> 1) recovery of unconstrained entanglement
for other sorts of superselection rules, and show that the disparity disappears
when the correct notion of applying the symmetric group SSR to multiple copies
is used. Next we discuss reference frames in the context of this SSR, showing
the relation to the work of von Korff and Kempe [Phys. Rev. Lett. 93, 260502
(2004)]. The action of a reference frame can be regarded as the analog of
activation in mixed-state entanglement. We also discuss the analog of
distillation: there exist states such that one copy can act as an imperfect
reference frame for another copy. Finally we present an example of a stronger
operational constraint, that operations must be non-collective as well as
symmetric. Even under this stronger constraint we nevertheless show that
Bell-nonlocality (and hence entanglement) can be demonstrated for an ensemble
of N Bell-state pairs no matter how large N is. This last work is a
generalization of that of Mermin [Phys. Rev. D 22, 356 (1980)].
Comment: 16 pages, 6 figures. v2 updated version published in Phys Rev
Calibrating Generative Models: The Probabilistic Chomsky-Schützenberger Hierarchy
A probabilistic Chomsky–Schützenberger hierarchy of grammars is introduced and studied, with the aim of understanding the expressive power of generative models. We offer characterizations of the distributions definable at each level of the hierarchy, including probabilistic regular, context-free, (linear) indexed, context-sensitive, and unrestricted grammars, each corresponding to familiar probabilistic machine classes. Special attention is given to distributions on (unary notations for) positive integers. Unlike in the classical case, where the "semi-linear" languages all collapse into the regular languages, we show, using analytic tools adapted from the classical setting, that there is no collapse in the probabilistic hierarchy: more distributions become definable at each level. We also address related issues such as closure under probabilistic conditioning.
The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Over the past years, there has been a resurgence of Datalog-based systems in
the database community as well as in industry. In this context, it has been
recognized that to handle the complex knowledge-based scenarios encountered
today, such as reasoning over large knowledge graphs, Datalog has to be
extended with features such as existential quantification. Yet, Datalog-based
reasoning in the presence of existential quantification is in general
undecidable. Many efforts have been made to define decidable fragments. Warded
Datalog+/- is a very promising one, as it captures PTIME complexity while
allowing ontological reasoning. Yet so far, no implementation of Warded
Datalog+/- was available. In this paper we present the Vadalog system, a
Datalog-based system for performing complex logic reasoning tasks, such as
those required in advanced knowledge graphs. The Vadalog system is Oxford's
contribution to the VADA research programme, a joint effort of the universities
of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the
main contribution of this paper, we illustrate the first implementation of
Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive
termination control strategy. We also provide a comprehensive experimental
evaluation.
Comment: Extended version of VLDB paper
<https://doi.org/10.14778/3213880.3213888>
Games for Active XML Revisited
The paper studies the rewriting mechanisms for intensional documents in the
Active XML framework, abstracted in the form of active context-free games. The
safe rewriting problem studied in this paper is to decide whether the first
player, Juliet, has a winning strategy for a given game and (nested) word; this
corresponds to a successful rewriting strategy for a given intensional
document. The paper examines several extensions to active context-free games.
The primary extension allows more expressive schemas (namely XML schemas and
regular nested word languages) for both target and replacement languages and
has the effect that games are played on nested words instead of (flat) words as
in previous studies. Other extensions consider validation of input parameters
of web services, and an alternative semantics based on insertion of service
call results.
In general, the complexity of the safe rewriting problem is highly
intractable (doubly exponential time), but the paper identifies interesting
tractable cases.
Comment: To be published in ICDT 201