4,919 research outputs found

    HUDDL for description and archive of hydrographic binary data

    Get PDF
    Many of the attempts to introduce a universal hydrographic binary data format have failed or have been only partially successful. In essence, this is because such formats either have to simplify the data to such an extent that they only support the lowest common subset of all the formats covered, or they attempt to be a superset of all formats and quickly become cumbersome. Neither choice works well in practice. This paper presents a different approach: a standardized description of (past, present, and future) data formats using the Hydrographic Universal Data Description Language (HUDDL), a descriptive language implemented using the Extensible Markup Language (XML). That is, XML is used to provide a structural and physical description of a data format, rather than the content of a particular file. Done correctly, this opens the possibility of automatically generating both multi-language data parsers and documentation for format specification based on their HUDDL descriptions, as well as providing easy version control of them. This solution also provides a powerful approach for archiving a structural description of data along with the data, so that binary data will be easy to access in the future. Intending to provide a relatively low-effort solution to index the wide range of existing formats, we suggest the creation of a catalogue of format descriptions, each of them capturing the logical and physical specifications for a given data format (with its subsequent upgrades). A C/C++ parser code generator is used as an example prototype of one of the possible advantages of the adoption of such a hydrographic data format catalogue

    Consistently Updating XML Documents Using Incremental checks With XQueries

    Get PDF
    When updating a valid XML Data or Schema, an efficient yet light-weight mechanism is needed to determine if the update would invalidate the document. Towards this goal, we have developed a framework called SAXE. First, we analyzed the constraints expressed in XML schema specifications to establish constraint rules that must be observed when a schema or an XML data conforming to a given XML Schema is altered. We then classify the rules based on their relevancy for a given update case. That is, we show the minimal set of rules that must be checked to guarantee the safety for each update primitive. Next, we illustrate that this set of incremental constraint checks can be specified using generic XQuery expressions composed of three type of components. Safe updates for the XML data have the following components: (1) XML schema meta-queries to retrieve any con-straint knowledge potentially relevant to the given update from the schema or XMl data being altered, (2) retrieval of specific characteristics from the to-be-modified XML, and (3) lastly an analysis of information collected about the XML schema and the affected XML document to determine validity of the update. For the safe schema alteration, the components are: (1) XML schema meta-queries to retrieve relevant information from the schema (2)analysis and usage of retrieved information to update the schema, and lastly to (3) propagate the changes to the XML data when necessary. As a proof of concept, we have established a library of these generic XQuery constraint checks for the type-related XML constraints. The key idea of SAXE is to rewrite each XQuery update into a safe XML Query by extending it with appropriate constraint check subqueries. This en-hanced XML update query can then safely be executed using any existing XQuery engine that supports updates - thus turning any update engine automatically into an incremen-tal constraint-check engine. In order to verify the feasibility of our approach, we have implemented a prototype system SAXE that generates safe XQuery updates. Our experimental evaluation assesses the overhead of rewriting as well as the relative performance of our loosely-coupled incremental constraint check approach against the more traditional first-change-document and then revalidate-it approach

    IMAX: incremental maintenance of schema-based XML statistics

    Get PDF
    Journal ArticleCurrent approaches for estimating the cardinality of XML queries are applicable to a static scenario wherein the underlying XML data does not change subsequent to the collection of statistics on the repository. However, in practice, many XML-based applications are dynamic and involve frequent updates to the data. In this paper, we investigate efficient strategies for incrementally maintaining statistical summaries as and when updates are applied to the data. Specifically, we propose algorithms that handle both the addition of new documents as well as random insertions in the existing document trees. We also show, through a detailed performance evaluation, that our incremental techniques are significantly faster than the naive recomputation approach; and that estimation accuracy can be maintained even with a fixed memory budget

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    Full text link
    False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

    Huddl: the Hydrographic Universal Data Description Language

    Get PDF
    Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats, and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access

    Efficient storage of XML data

    Full text link
    We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditionallarge object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim

    XRound : A reversible template language and its application in model-based security analysis

    Get PDF
    Successful analysis of the models used in Model-Driven Development requires the ability to synthesise the results of analysis and automatically integrate these results with the models themselves. This paper presents a reversible template language called XRound which supports round-trip transformations between models and the logic used to encode system properties. A template processor that supports the language is described, and the use of the template language is illustrated by its application in an analysis workbench, designed to support analysis of security properties of UML and MOF-based models. As a result of using reversible templates, it is possible to seamlessly and automatically integrate the results of a security analysis with a model. (C) 2008 Elsevier B.V. All rights reserved
    corecore