6 research outputs found

    XML stream processing using tree-edit distance embeddings

    No full text
    We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing a (worst-case) upper bound of O(log 2 n log ∗ n)onthe distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worstcase distortion bounds. To the best of our knowledge, these are the first algorithmic results on lowdistortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query processing; G.2.1 [Discrete Mathematics]: Combinatorics—Combinatorial algorithm

    XML stream processing using tree-edit distance embeddings

    No full text
    Summarization: We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing a (worst-case) upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst-case distortion bounds. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.Presented on: ACM Transactions on Database System

    Online Analysis of Dynamic Streaming Data

    Get PDF
    Die Arbeit zum Thema "Online Analysis of Dynamic Streaming Data" beschäftigt sich mit der Distanzmessung dynamischer, semistrukturierter Daten in kontinuierlichen Datenströmen um Analysen auf diesen Datenstrukturen bereits zur Laufzeit zu ermöglichen. Hierzu wird eine Formalisierung zur Distanzberechnung für statische und dynamische Bäume eingeführt und durch eine explizite Betrachtung der Dynamik von Attributen einzelner Knoten der Bäume ergänzt. Die Echtzeitanalyse basierend auf der Distanzmessung wird durch ein dichte-basiertes Clustering ergänzt, um eine Anwendung des Clustering, einer Klassifikation, aber auch einer Anomalieerkennung zu demonstrieren. Die Ergebnisse dieser Arbeit basieren auf einer theoretischen Analyse der eingeführten Formalisierung von Distanzmessungen für dynamische Bäume. Diese Analysen werden unterlegt mit empirischen Messungen auf Basis von Monitoring-Daten von Batchjobs aus dem Batchsystem des GridKa Daten- und Rechenzentrums. Die Evaluation der vorgeschlagenen Formalisierung sowie der darauf aufbauenden Echtzeitanalysemethoden zeigen die Effizienz und Skalierbarkeit des Verfahrens. Zudem wird gezeigt, dass die Betrachtung von Attributen und Attribut-Statistiken von besonderer Bedeutung für die Qualität der Ergebnisse von Analysen dynamischer, semistrukturierter Daten ist. Außerdem zeigt die Evaluation, dass die Qualität der Ergebnisse durch eine unabhängige Kombination mehrerer Distanzen weiter verbessert werden kann. Insbesondere wird durch die Ergebnisse dieser Arbeit die Analyse sich über die Zeit verändernder Daten ermöglicht

    A tree grammar-based visual password scheme

    Get PDF
    A thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, August 31, 2015.Visual password schemes can be considered as an alternative to alphanumeric passwords. Studies have shown that alphanumeric passwords can, amongst others, be eavesdropped, shoulder surfed, or guessed, and are susceptible to brute force automated attacks. Visual password schemes use images, in place of alphanumeric characters, for authentication. For example, users of visual password schemes either select images (Cognometric) or points on an image (Locimetric) or attempt to redraw their password image (Drawmetric), in order to gain authentication. Visual passwords are limited by the so-called password space, i.e., by the size of the alphabet from which users can draw to create a password and by susceptibility to stealing of passimages by someone looking over your shoulders, referred to as shoulder surfing in the literature. The use of automatically generated highly similar abstract images defeats shoulder surfing and means that an almost unlimited pool of images is available for use in a visual password scheme, thus also overcoming the issue of limited potential password space. This research investigated visual password schemes. In particular, this study looked at the possibility of using tree picture grammars to generate abstract graphics for use in a visual password scheme. In this work, we also took a look at how humans determine similarity of abstract computer generated images, referred to as perceptual similarity in the literature. We drew on the psychological idea of similarity and matched that as closely as possible with a mathematical measure of image similarity, using Content Based Image Retrieval (CBIR) and tree edit distance measures. To this end, an online similarity survey was conducted with respondents ordering answer images in order of similarity to question images, involving 661 respondents and 50 images. The survey images were also compared with eight, state of the art, computer based similarity measures to determine how closely they model perceptual similarity. Since all the images were generated with tree grammars, the most popular measure of tree similarity, the tree edit distance, was also used to compare the images. Eight different types of tree edit distance measures were used in order to cover the broad range of tree edit distance and tree edit distance approximation methods. All the computer based similarity methods were then correlated with the online similarity survey results, to determine which ones more closely model perceptual similarity. The results were then analysed in the light of some modern psychological theories of perceptual similarity. This work represents a novel approach to the Passfaces type of visual password schemes using dynamically generated pass-images and their highly similar distractors, instead of static pictures stored in an online database. The results of the online survey were then accurately modelled using the most suitable tree edit distance measure, in order to automate the determination of similarity of our generated distractor images. The information gathered from our various experiments was then used in the design of a prototype visual password scheme. The generated images were similar, but not identical, in order to defeat shoulder surfing. This approach overcomes the following problems with this category of visual password schemes: shoulder surfing, bias in image selection, selection of easy to guess pictures and infrastructural limitations like large picture databases, network speed and database security issues. The resulting prototype developed is highly secure, resilient to shoulder surfing and easy for humans to use, and overcomes the aforementioned limitations in this category of visual password schemes
    corecore