4,447 research outputs found
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Hypothesis Testing Interpretations and Renyi Differential Privacy
Differential privacy is a de facto standard in data privacy, with
applications in the public and private sectors. A way to explain differential
privacy, which is particularly appealing to statistician and social scientists
is by means of its statistical hypothesis testing interpretation. Informally,
one cannot effectively test whether a specific individual has contributed her
data by observing the output of a private mechanism---any test cannot have both
high significance and high power.
In this paper, we identify some conditions under which a privacy definition
given in terms of a statistical divergence satisfies a similar interpretation.
These conditions are useful to analyze the distinguishability power of
divergences and we use them to study the hypothesis testing interpretation of
some relaxations of differential privacy based on Renyi divergence. This
analysis also results in an improved conversion rule between these definitions
and differential privacy
Declarative process modeling in BPMN
Traditional business process modeling notations, including the standard Business Process Model and Notation (BPMN), rely on an imperative paradigm wherein the process model captures all allowed activity flows. In other words, every flow that is not specified is implicitly disallowed. In the past decade, several researchers have exposed the limitations of this paradigm in the context of business processes with high variability. As an alternative, declarative process modeling notations have been proposed (e.g., Declare). These notations allow modelers to capture constraints on the allowed activity flows, meaning that all flows are allowed provided that they do not violate the specified constraints. Recently, it has been recognized that the boundary between imperative and declarative process modeling is not crisp. Instead, mixtures of declarative and imperative process modeling styles are sometimes preferable, leading to proposals for hybrid process modeling notations. These developments raise the question of whether completely new notations are needed to support hybrid process modeling. This paper answers this question negatively. The paper presents a conservative extension of BPMN for declarative process modeling, namely BPMN-D, and shows that Declare models can be transformed into readable BPMN-D models. © Springer International Publishing Switzerland 2015
Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching
We develop a formal perspective on how regular expression matching works in
Java, a popular representative of the category of regex-directed matching
engines. In particular, we define an automata model which captures all the
aspects needed to study such matching engines in a formal way. Based on this,
we propose two types of static analysis, which take a regular expression and
tell whether there exists a family of strings which makes Java-style matching
run in exponential time.Comment: In Proceedings AFL 2014, arXiv:1405.527
- …