    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201
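As a rough illustration of stack-based stream validation learned from examples, the sketch below records which element nestings occur in example documents and then rejects any document that contains an unseen nesting. It is a deliberately simplified stand-in and not the VPA learning algorithm the abstract describes: it ignores sibling order and datatypes, and the names `ROOT`, `events`, `learn` and `is_normal` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

ROOT = "#root"  # hypothetical marker for the bottom of the stack


def events(xml_text):
    """Yield ('start' | 'end', tag) events from an XML string."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        yield kind, elem.tag


def learn(examples):
    """Collect the (parent, child) nestings observed in the examples."""
    allowed = set()
    for doc in examples:
        stack = [ROOT]
        for kind, tag in events(doc):
            if kind == "start":
                allowed.add((stack[-1], tag))
                stack.append(tag)  # push on an opening tag
            else:
                stack.pop()        # pop on a closing tag
    return allowed


def is_normal(doc, allowed):
    """Return False as soon as an unseen nesting appears in the stream."""
    stack = [ROOT]
    for kind, tag in events(doc):
        if kind == "start":
            if (stack[-1], tag) not in allowed:
                return False
            stack.append(tag)
        else:
            stack.pop()
    return True


examples = ["<order><item>1</item><item>2</item></order>"]
allowed = learn(examples)
```

With this model, `is_normal("<order><item>3</item></order>", allowed)` accepts the document, while a document that nests an unseen element such as `<script>` under `<order>` is flagged. The push on opening tags and pop on closing tags mirrors how a visibly pushdown automaton lets the input symbol determine the stack operation.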

    Alternative representations for visual constraints specification in the layered view model

    Extensible Markup Language (XML), with its rich set of semantics and constraints, is becoming the dominant standard for storing, describing and interchanging data among various Enterprise Information Systems (EIS) and databases. With the increased reliance on such semi-structured data and schemas, there is a need to model, design, and constrain semi-structured data and the associated semantics at a higher level of abstraction than the instance or data level. However, most semi-structured schema languages lack the ability to provide higher levels of abstraction, such as visual constraints, that are easily understood by humans. Conversely, although Object-Oriented (OO) conceptual models offer the power to describe and model real-world data semantics, constraints and their inter-relationships in a form that is precise and comprehensible to users, they provide insufficient modelling constructs for utilizing XML-schema-like data descriptions and constraints. Therefore, it is interesting to investigate conceptual and schema formalisms as a means of providing higher-level semantics in the context of XML-related data engineering. In this paper, we present a visual constraint specification model for an XML layered view model. First we briefly outline the view model and then provide a detailed discussion of modelling issues related to view constraint specification using two OO modelling languages, namely OMG's UML/OCL and XML Semantics (XSemantic) nets. To demonstrate our concepts, we also provide an illustrative case study based on a real-world application.

    A layered view model for XML with conceptual and logical extensions, and its applications

    University of Technology, Sydney. Faculty of Information Technology. Extensible Markup Language (XML) is becoming the dominant standard for storing, describing and interchanging data among various Enterprise Information Systems (EIS), web repositories and databases. With this increasing reliance on XML as a self-describing, schema-based, semi-structured data language, there is a need to model, design, and manipulate XML and its associated semantics at a higher level of abstraction than the instance level. However, existing OO conceptual modelling languages provide insufficient modelling constructs for utilizing XML structures, descriptions and constraints, and XML and its associated schema languages lack the ability to provide higher levels of abstraction, such as conceptual models that are easily understood by humans. To this end, it is interesting to investigate conceptual and schema formalisms as a means of providing higher-level semantics in the context of XML-related data modelling. In particular, we note that there is a strong need to model views of XML repositories at the conceptual level. This is in contrast to views for the relational model, which are generally defined at the implementation level. In this research, we use XML views and introduce the Layered View Model (LVM, for short), a declarative conceptual framework for specifying and defining views at a higher level of abstraction. The views in the LVM are specified using explicit conceptual, logical and instance-level semantics and provide declarative transformation between these levels of abstraction. For this task, an elaborated and enhanced OO-based modelling and transformation methodology is employed. The LVM framework leads to a number of interesting problems that are studied in this research. First, we address the issue of conceptualizing the notion of views: the clear separation of conceptual concerns from the implementation and data language concerns. Here, the LVM views are considered first-class citizens of the conceptual model. Second, we provide formal semantics and definitions to enforce representation, specification and definition of such views at the highest level of abstraction, the conceptual level. Third, we address the issue of modelling and transformation of LVM views to the required level of abstraction, namely to the schema and instance levels. Finally, we apply the LVM to real-world data modelling scenarios to develop other architectural frameworks in domains such as dimensional XML data modelling, ontology views in the Semantic Web paradigm, and modelling user-centred websites and web portals.

    Modeling views in the layered view model for XML using UML

    In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Meanwhile, since its introduction, Extensible Markup Language (XML) has been fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques, for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, so supporting views for semi-structured data models proves to be a difficult task. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Our work is three-fold. First, we propose an approach that separates the implementation and conceptual aspects of views, providing a clear separation of concerns and allowing the analysis and design of views to be separated from their implementation. Second, we define representations to express and construct these views at the conceptual level. Third, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. In addition, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML at varying levels of abstraction.

    Perspectives for Electronic Books in the World Wide Web Age

    While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features that can be implemented by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computations are not. Then, enhancements of the WWW standards (specifically of XML, XHTML, of the style-sheet languages CSS and XSL, and of the linking language XLink) are proposed for better support of client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books, as well as their implementation in the WWW context, are described.

    Compressed materialised views of semi-structured data

    Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However, preserving the conventional representation of such views remains a significant limiting factor, especially in the context of mobile devices, where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and value-based predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show the advantages of using such structures.

    User Feedback in Probabilistic XML

    Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas, this is impractical or even prohibitive, for example, in an ambient environment where devices on an ad hoc basis have to exchange information autonomously. We have proposed a probabilistic XML approach that allows data integration without user involvement by storing semantic uncertainty and conflicts in the integrated XML data. As a consequence, the integrated information source represents all possible appearances of objects in the real world, the so-called possible worlds. In this paper, we show how user feedback on query results can resolve semantic uncertainty and conflicts in the integrated data. Hence, user involvement is effectively postponed to query time, when a user is already interacting actively with the system. The technique relates positive and negative statements on query answers to the possible worlds of the information source, thereby either reinforcing, penalizing, or eliminating possible worlds. We show that after repeated user feedback, an integrated information source better resembles the real world and may converge towards a non-probabilistic information source.
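The effect of feedback on possible worlds can be sketched as a conditioning step: worlds that contradict a user's statement are eliminated and the remaining probabilities are renormalised. This is a minimal sketch under assumed data structures, not the paper's technique, which also reinforces and penalizes worlds rather than only eliminating them. The function name `apply_feedback`, the world names and the example facts are hypothetical.

```python
def apply_feedback(worlds, answer, is_correct):
    """Condition a set of possible worlds on one user statement.

    worlds: dict mapping a world name to (probability, frozenset of facts).
    answer: a fact that appeared in a query result.
    is_correct: True for positive feedback, False for negative feedback.
    """
    # Keep only the worlds that agree with the user's statement.
    kept = {
        name: (p, facts)
        for name, (p, facts) in worlds.items()
        if (answer in facts) == is_correct
    }
    total = sum(p for p, _ in kept.values())
    if total == 0:
        raise ValueError("feedback contradicts every possible world")
    # Renormalise so the surviving probabilities sum to 1.
    return {name: (p / total, facts) for name, (p, facts) in kept.items()}


# Hypothetical integrated source: two conflicting phone numbers for one person.
worlds = {
    "w1": (0.5, frozenset({"John:1111"})),
    "w2": (0.3, frozenset({"John:2222"})),
    "w3": (0.2, frozenset({"John:1111", "John:2222"})),
}
after_negative = apply_feedback(worlds, "John:2222", is_correct=False)
```

Here negative feedback on `John:2222` leaves only `w1`, so the source becomes non-probabilistic for this fact, which matches the convergence behaviour the abstract mentions.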

    Query management in a sensor environment

    Traditional sensor network deployments consisted of fixed infrastructures and were relatively small in size. More and more, we see the deployment of ad-hoc sensor networks with heterogeneous devices on a larger scale, posing new challenges for device management and query processing. In this paper, we present our design and prototype implementation of XSense, an architecture supporting metadata and query services for an underlying large-scale dynamic P2P sensor network. We cluster sensor devices into manageable groupings to optimise the query process and automatically locate appropriate clusters based on keyword abstraction from queries. We present experimental analysis to show the benefits of our approach and demonstrate improved query performance and scalability.
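One way to picture locating clusters from query keywords is an inverted index from metadata keywords to cluster identifiers. The sketch below is an assumption-laden illustration and not the XSense design: the cluster names, the keyword sets, and the functions `build_index` and `locate_clusters` are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(clusters):
    """Map each metadata keyword to the clusters that advertise it."""
    index = defaultdict(set)
    for cluster_id, keywords in clusters.items():
        for keyword in keywords:
            index[keyword.lower()].add(cluster_id)
    return index


def locate_clusters(query, index):
    """Extract alphabetic tokens from a query and return matching clusters."""
    tokens = re.findall(r"[A-Za-z]+", query.lower())
    hits = set()
    for token in tokens:
        hits |= index.get(token, set())
    return hits


# Hypothetical clusters described by metadata keywords.
clusters = {
    "c1": {"temperature", "humidity"},
    "c2": {"traffic", "speed"},
}
index = build_index(clusters)
targets = locate_clusters("//sensor[type='temperature']/reading", index)
```

Only the clusters whose keywords appear in the query are contacted, so a query mentioning `temperature` is routed to `c1` alone instead of being broadcast to every device.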