6 research outputs found
XML stream processing using tree-edit distance embeddings
We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing a (worst-case) upper bound of O(log 2 n log ∗ n)onthe distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worstcase distortion bounds. To the best of our knowledge, these are the first algorithmic results on lowdistortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query processing; G.2.1 [Discrete Mathematics]: Combinatorics—Combinatorial algorithm
XML stream processing using tree-edit distance embeddings
Summarization: We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing a (worst-case) upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst-case distortion bounds. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.Presented on: ACM Transactions on Database System
Online Analysis of Dynamic Streaming Data
Die Arbeit zum Thema "Online Analysis of Dynamic Streaming Data" beschäftigt sich mit der Distanzmessung dynamischer, semistrukturierter Daten in kontinuierlichen Datenströmen um Analysen auf diesen Datenstrukturen bereits zur Laufzeit zu ermöglichen. Hierzu wird eine Formalisierung zur Distanzberechnung für statische und dynamische Bäume eingeführt und durch eine explizite Betrachtung der Dynamik von Attributen einzelner Knoten der Bäume ergänzt. Die Echtzeitanalyse basierend auf der Distanzmessung wird durch ein dichte-basiertes Clustering ergänzt, um eine Anwendung des Clustering, einer Klassifikation, aber auch einer Anomalieerkennung zu demonstrieren.
Die Ergebnisse dieser Arbeit basieren auf einer theoretischen Analyse der eingeführten Formalisierung von Distanzmessungen für dynamische Bäume. Diese Analysen werden unterlegt mit empirischen Messungen auf Basis von Monitoring-Daten von Batchjobs aus dem Batchsystem des GridKa Daten- und Rechenzentrums. Die Evaluation der vorgeschlagenen Formalisierung sowie der darauf aufbauenden Echtzeitanalysemethoden zeigen die Effizienz und Skalierbarkeit des Verfahrens. Zudem wird gezeigt, dass die Betrachtung von Attributen und Attribut-Statistiken von besonderer Bedeutung für die Qualität der Ergebnisse von Analysen dynamischer, semistrukturierter Daten ist. Außerdem zeigt die Evaluation, dass die Qualität der Ergebnisse durch eine unabhängige Kombination mehrerer Distanzen weiter verbessert werden kann. Insbesondere wird durch die Ergebnisse dieser Arbeit die Analyse sich über die Zeit verändernder Daten ermöglicht
A tree grammar-based visual password scheme
A thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, August 31, 2015.Visual password schemes can be considered as an alternative to alphanumeric
passwords. Studies have shown that alphanumeric passwords
can, amongst others, be eavesdropped, shoulder surfed, or
guessed, and are susceptible to brute force automated attacks. Visual
password schemes use images, in place of alphanumeric characters,
for authentication. For example, users of visual password schemes either
select images (Cognometric) or points on an image (Locimetric)
or attempt to redraw their password image (Drawmetric), in order
to gain authentication. Visual passwords are limited by the so-called
password space, i.e., by the size of the alphabet from which users can
draw to create a password and by susceptibility to stealing of passimages
by someone looking over your shoulders, referred to as shoulder
surfing in the literature. The use of automatically generated highly
similar abstract images defeats shoulder surfing and means that an almost
unlimited pool of images is available for use in a visual password
scheme, thus also overcoming the issue of limited potential password
space.
This research investigated visual password schemes. In particular,
this study looked at the possibility of using tree picture grammars to
generate abstract graphics for use in a visual password scheme. In this
work, we also took a look at how humans determine similarity of abstract
computer generated images, referred to as perceptual similarity
in the literature. We drew on the psychological idea of similarity and
matched that as closely as possible with a mathematical measure of
image similarity, using Content Based Image Retrieval (CBIR) and
tree edit distance measures. To this end, an online similarity survey
was conducted with respondents ordering answer images in order
of similarity to question images, involving 661 respondents and 50
images. The survey images were also compared with eight, state of
the art, computer based similarity measures to determine how closely
they model perceptual similarity. Since all the images were generated
with tree grammars, the most popular measure of tree similarity, the
tree edit distance, was also used to compare the images. Eight different
types of tree edit distance measures were used in order to cover
the broad range of tree edit distance and tree edit distance approximation
methods. All the computer based similarity methods were
then correlated with the online similarity survey results, to determine
which ones more closely model perceptual similarity. The results were
then analysed in the light of some modern psychological theories of
perceptual similarity.
This work represents a novel approach to the Passfaces type of visual
password schemes using dynamically generated pass-images and their
highly similar distractors, instead of static pictures stored in an online
database. The results of the online survey were then accurately
modelled using the most suitable tree edit distance measure, in order
to automate the determination of similarity of our generated distractor
images. The information gathered from our various experiments
was then used in the design of a prototype visual password scheme.
The generated images were similar, but not identical, in order to defeat
shoulder surfing. This approach overcomes the following problems
with this category of visual password schemes: shoulder surfing,
bias in image selection, selection of easy to guess pictures and infrastructural
limitations like large picture databases, network speed and
database security issues. The resulting prototype developed is highly
secure, resilient to shoulder surfing and easy for humans to use, and
overcomes the aforementioned limitations in this category of visual
password schemes