
    A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

    The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained conveniently. This has led to the emergence of various data models, including relational, semi-structured, and graph models. Because mature relational databases built on the relational data model still dominate today's market, interest has grown in storing and processing semi-structured data and graph data in relational databases, so that their mature and powerful capabilities can be applied to these kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables. Peer reviewed

    XCSL: XML constraint specification language

    Once we are able to mark up text and validate its structure against a document type specification, it becomes natural to want to validate some nonstructural issues in the documents as well. This paper formally discusses such semantic-related aspects. In that context, we introduce a domain-specific language developed for this purpose: XCSL. XCSL is not just a language; it is also a processing model. Furthermore, we discuss the general philosophy underlying the proposed approach, present the architecture of our semantic validation system, and detail the respective processor. To illustrate the use of the XCSL language and the subsequent processing, we present two case studies. Other languages now exist for restricting XML documents to those that are semantically valid, namely Schematron and XML-Schema. So, before concluding the paper, we compare XCSL to those approaches.

    Multi-paradigm frameworks for scalable intrusion detection

    Research in network security and intrusion detection systems (IDSs) has typically focused on small or artificial data sets. Tools are developed that work well on these data sets but have trouble meeting the demands of real-world, large-scale network environments. In addressing this problem, improvements must be made to the foundations of intrusion detection systems, including data management, IDS accuracy, and alert volume. We address data management of network security and intrusion detection information by presenting a database mediator system that provides single-query access via a domain-specific query language. Results are returned in the form of XML using web services, allowing analysts to access information from remote networks in a uniform manner. The system also provides scalable capture of log data for multi-terabyte datasets. Next, we address IDS alert accuracy by building an agent-based framework that uses web services to make the system easy to deploy and capable of spanning network boundaries. Agents in the framework process IDS alerts managed by a central alert broker. The broker can define processing hierarchies by assigning dependencies among agents to achieve scalability. The framework can also be used for event correlation, that is, gathering information relevant to an IDS alert. Lastly, we address alert volume by presenting an approach to alert correlation that is IDS-independent. Using correlated events gathered in our agent framework, we build a feature vector for each IDS alert that represents the network traffic profile of the internal host at the time of the alert. This feature vector is used as a statistical fingerprint in a clustering algorithm that groups related alerts. We analyze our results with a combination of domain expert evaluation and feature selection.

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored, or too general. In this work-in-progress paper we describe a grammatical inference approach that learns an automaton from example XML documents in order to detect documents with anomalous syntax. We discuss the properties and expressiveness of XML to understand the limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

    ebXML: Global Standard for Electronic Business

    Business-to-business integration is transforming the market and has already begun to increase the efficiency of the companies involved. EDI (Electronic Data Interchange) became very popular during the 1970s; today EDI transactions total about $750 billion a year. EDI is used by 90% of Fortune 1000 companies. It has indeed become a dominant technology for the largest companies; on the other hand, it has been adopted by less than 5% of small and medium-sized companies in general and, of these, many use EDI only because their larger customers require it. The reason is that EDI is a difficult, complex technology to implement and usually comes with a high transactional cost. Hence it is suitable for large companies with a large volume of transactions. EDI uses a fixed, rigid, and compressed data format that is difficult to decipher and debug. Data exchange in EDI happens over a proprietary VAN (value-added network), which is an expensive solution. ebXML (Electronic Business XML) envisions a single global electronic marketplace where enterprises of any size and in any geographic location can meet and conduct business with each other through the exchange of XML-based messages. XML (the Extensible Markup Language) has rapidly established itself as a popular format for exchanging information on the web. XML is by nature a structured document format, in that it represents not only the information to be exchanged but also the metadata encapsulating its meaning. XML technology has the potential to solve the existing problems in current EDI systems. Using ebXML, companies have a standard method to exchange business messages, conduct trading relationships, communicate data in common terms, and define and register business processes. ebXML is designed to provide a simple way for companies to find one another and conduct business over the Web, allowing those with different platforms to speak a common language.
    ebXML aims to provide low-cost solutions for small and medium enterprises as well as complex solutions for large enterprises. This project attempts to implement a prototype of the ebXML messaging service, as per the ebXML specification, to gain insight into the feasibility and suitability of an XML solution for EDI.

    Query Induction with Schema-Guided Pruning Strategies

    Inference algorithms for tree automata that define node-selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice on XML information extraction.

    A first approach to combining ontologies and defeasible argumentation for the semantic web

    The Semantic Web is a project intended to create a universal medium for information exchange by giving semantics to the content of documents on the Web through the use of ontology definitions. Problems in modelling common-sense reasoning (such as reasoning with uncertainty or with incomplete and potentially inconsistent information) are also present when defining ontologies. In recent years, defeasible argumentation has succeeded as an approach to formalizing such common-sense reasoning. Agents operating in multi-agent systems in the context of the Semantic Web need to interact with each other in order to achieve the goals stated by their users. In this paper we propose an XML-based language named XDeLP for ontology interchange among agents on the web. Track: VI Workshop de Agentes y Sistemas Inteligentes (WASI). Red de Universidades con Carreras en Informática (RedUNCI)