4,065 research outputs found
Document Type De�nition (DTD) Metrics
In this paper, we present two complexity metrics for the assessment of schema quality written in Document Type De�finition (DTD) language. Both "Entropy (E) metric: E(DTD)" and "Distinct Structured Element Repetition Scale (DSERS) metric: DSERS(DTD)" are intended to measure the structural complexity of schemas in DTD language. These metrics exploit a directed graph representation of schema document and consider the complexity of schema due to its similar structured elements and the occurrences of these
elements. The empirical and theoretical validations of these metrics prove the robustness of the metrics
Entropy as a Measure of Quality of XML Schema Document
In this paper, a metric for the assessment of the structural complexity of eXtensible Markup Language schema
document is formulated. The present metric ‘Schema Entropy is based on entropy concept and intended to measure the
complexity of the schema documents written in W3C XML Schema Language due to diversity in the structures of its elements. The SE is useful in evaluating the efficiency of the design of Schemas. A good design reduces the maintainability efforts. Therefore, our metric provides valuable information about the reliability and maintainability of systems. In this respect, this
metric is believed to be a valuable contribution for improving the quality of XML-based systems. It is demonstrated with examples and validated empirically through actual test cases
Relational Approach to Logical Query Optimization of XPath
To be able to handle the ever growing volumes of XML documents, effective and efficient data management solutions are needed. Managing XML data in a relational DBMS has great potential. Recently, effective relational storage schemes and index structures have been proposed as well as special-purpose join operators to speed up querying of XML data using XPath/XQuery. In this paper, we address the topic of query plan construction and logical query optimization. The claim of this paper is that standard relational algebra extended with special-purpose join operators suffices for logical query optimization. We focus on the XPath accelerator storage scheme and associated staircase join operators, but the approach can be generalized easily
Projector - a partially typed language for querying XML
We describe Projector, a language that can be used to perform a mixture of typed and untyped computation against data represented in XML. For some problems, notably when the data is unstructured or semistructured, the most desirable programming model is against the tree structure underlying the document. When this tree structure has been used to model regular data structures, then these regular structures themselves are a more desirable programming model. The language Projector, described here in outline, gives both models within a single partially typed algebra and is well suited for hybrid applications, for example when fragments of a known structure are embedded in a document whose overall structure is unknown. Projector is an extension of ECMA-262 (aka JavaScript), and therefore inherits an untyped DOM interface. To this has been added some static typing and a dynamic projection primitive, which can be used to assert the presence of a regular structure modelled within the XML. If this structure does exist, the data is extracted and presented as a typed value within the programming language
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Global Numerical Constraints on Trees
We introduce a logical foundation to reason on tree structures with
constraints on the number of node occurrences. Related formalisms are limited
to express occurrence constraints on particular tree regions, as for instance
the children of a given node. By contrast, the logic introduced in the present
work can concisely express numerical bounds on any region, descendants or
ancestors for instance. We prove that the logic is decidable in single
exponential time even if the numerical constraints are in binary form. We also
illustrate the usage of the logic in the description of numerical constraints
on multi-directional path queries on XML documents. Furthermore, numerical
restrictions on regular languages (XML schemas) can also be concisely described
by the logic. This implies a characterization of decidable counting extensions
of XPath queries and XML schemas. Moreover, as the logic is closed under
negation, it can thus be used as an optimal reasoning framework for testing
emptiness, containment and equivalence
- …