42,003 research outputs found
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
XOR-Sampling for Network Design with Correlated Stochastic Events
Many network optimization problems can be formulated as stochastic network
design problems in which edges are present or absent stochastically.
Furthermore, protective actions can guarantee that edges will remain present.
We consider the problem of finding the optimal protection strategy under a
budget limit in order to maximize some connectivity measurements of the
network. Previous approaches rely on the assumption that edges are independent.
In this paper, we consider a more realistic setting where multiple edges are
not independent due to natural disasters or regional events that make the
states of multiple edges stochastically correlated. We use Markov Random Fields
to model the correlation and define a new stochastic network design framework.
We provide a novel algorithm based on Sample Average Approximation (SAA)
coupled with a Gibbs or XOR sampler. The experimental results on real road
network data show that the policies produced by SAA with the XOR sampler have
higher quality and lower variance compared to SAA with Gibbs sampler.Comment: In Proceedings of the Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17). The first two authors contribute equall
A Random Structure for Optimum Cache Size Distributed hash table (DHT) Peer-to-Peer design
We propose a new and easily-realizable distributed hash table (DHT)
peer-to-peer structure, incorporating a random caching strategy that allows for
{\em polylogarithmic search time} while having only a {\em constant cache}
size. We also show that a very large class of deterministic caching strategies,
which covers almost all previously proposed DHT systems, can not achieve
polylog search time with constant cache size. In general, the new scheme is the
first known DHT structure with the following highly-desired properties: (a)
Random caching strategy with constant cache size; (b) Average search time of
; (c) Guaranteed search time of ; (d) Truly local
cache dynamics with constant overhead for node deletions and additions; (e)
Self-organization from any initial network state towards the desired structure;
and (f) Allows a seamless means for various trade-offs, e.g., search speed or
anonymity at the expense of larger cache size.Comment: 13 pages, 2 figures, preprint versio
Discriminating word senses with tourist walks in complex networks
Patterns of topological arrangement are widely used for both animal and human
brains in the learning process. Nevertheless, automatic learning techniques
frequently overlook these patterns. In this paper, we apply a learning
technique based on the structural organization of the data in the attribute
space to the problem of discriminating the senses of 10 polysemous words. Using
two types of characterization of meanings, namely semantical and topological
approaches, we have observed significative accuracy rates in identifying the
suitable meanings in both techniques. Most importantly, we have found that the
characterization based on the deterministic tourist walk improves the
disambiguation process when one compares with the discrimination achieved with
traditional complex networks measurements such as assortativity and clustering
coefficient. To our knowledge, this is the first time that such deterministic
walk has been applied to such a kind of problem. Therefore, our finding
suggests that the tourist walk characterization may be useful in other related
applications
- …