A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data
The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained conveniently. This has led to the emergence of various data models, including relational, semi-structured, and graph models. Because mature relational databases built on the relational data model still dominate today's market, interest has grown in storing and processing semi-structured data and graph data in relational databases, so that the mature and powerful capabilities of those databases can be applied to these data as well. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches that draw lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables. Peer reviewed
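As a rough illustration of what mapping semi-structured data into relational tables can look like, the sketch below shreds a JSON-like document into a single generic "edge" table, with one row per node that points to its parent. This is only one simple scheme and is not claimed to be the classification or a specific method from the survey. The table name `edge`, its columns, and the helper `shred` are hypothetical names chosen for this example.

```python
import sqlite3


def shred(conn, doc):
    """Store a JSON-like value as rows of a generic edge table (illustrative only)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge ("
        "id INTEGER PRIMARY KEY, parent INTEGER, label TEXT, value TEXT)"
    )

    def visit(node, parent, label):
        # Scalars become leaf rows with a text value; containers get value NULL.
        is_leaf = not isinstance(node, (dict, list))
        cur = conn.execute(
            "INSERT INTO edge (parent, label, value) VALUES (?, ?, ?)",
            (parent, label, str(node) if is_leaf else None),
        )
        node_id = cur.lastrowid
        if isinstance(node, dict):
            for key, child in node.items():
                visit(child, node_id, key)
        elif isinstance(node, list):
            # List positions are used as edge labels.
            for index, child in enumerate(node):
                visit(child, node_id, str(index))
        return node_id

    return visit(doc, None, "$")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(conn, {"name": "Ada", "tags": ["x", "y"]})
    for row in conn.execute("SELECT id, parent, label, value FROM edge ORDER BY id"):
        print(row)
```

A generic table like this accepts any document shape, at the cost of self-joins when reassembling nested values; schema-aware mappings trade that flexibility for simpler queries.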
XCSL: XML constraint specification language
Once we can mark up text and validate its structure against a document type
specification, it is natural to want to validate non-structural
aspects of the documents as well. This paper formally discusses such semantic-related aspects.
In that context, we introduce a domain-specific language developed for this purpose: XCSL.
XCSL is not just a language; it is also a processing model. Furthermore, we discuss the general philosophy underlying the proposed approach, present the architecture of our semantic validation system, and detail the respective processor. To illustrate the use of the XCSL language and the subsequent processing, we present two case studies. Other languages now exist for restricting XML documents to those that are semantically valid, namely Schematron and XML Schema. So, before concluding the paper, we compare XCSL to those approaches.
Multi-paradigm frameworks for scalable intrusion detection
Research in network security and intrusion detection systems (IDSs) has typically focused on small or artificial data sets. Tools are developed that work well on these data sets but have trouble meeting the demands of real-world, large-scale network environments. In addressing this problem, improvements must be made to the foundations of intrusion detection systems, including data management, IDS accuracy and alert volume.
We address data management of network security and intrusion detection information by presenting a database mediator system that provides single query access via a domain specific query language. Results are returned in the form of XML using web services, allowing analysts to access information from remote networks in a uniform manner. The system also provides scalable data capture of log data for multi-terabyte datasets.
Next, we address IDS alert accuracy by building an agent-based framework that utilizes web services to make the system easy to deploy and capable of spanning network boundaries. Agents in the framework process IDS alerts managed by a central alert broker. The broker can define processing hierarchies by assigning dependencies on agents to achieve scalability. The framework can also be used for the task of event correlation, or gathering information relevant to an IDS alert.
Lastly, we address alert volume by presenting an approach to alert correlation that is IDS independent. Using correlated events gathered in our agent framework, we build a feature vector for each IDS alert representing the network traffic profile of the internal host at the time of the alert. This feature vector is used as a statistical fingerprint in a clustering algorithm that groups related alerts. We analyze our results with a combination of domain expert evaluation and feature selection.
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201
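The stream validation idea in the abstract above can be sketched as follows. In a visibly pushdown automaton the input symbol alone decides the stack action: an opening tag pushes, a closing tag pops, and text leaves the stack untouched, so no tree has to be built. The automaton here is written by hand, not learned from examples, and is not the paper's inference algorithm. The class `TinyVPA`, the event tuples, and the `allowed_children` table are hypothetical names for this illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# An event is (kind, name) where kind is "open", "close" or "text".
Event = Tuple[str, str]


class TinyVPA:
    """Hand-written visibly pushdown acceptor over a stream of XML-like events."""

    def __init__(self, allowed_children: Dict[str, Set[str]], root: str):
        self.allowed_children = allowed_children
        self.root = root

    def accepts(self, events: Iterable[Event]) -> bool:
        stack: List[str] = []
        seen_root = False
        for kind, name in events:
            if kind == "open":
                # Call symbol: check the transition, then push.
                if not stack:
                    if seen_root or name != self.root:
                        return False
                    seen_root = True
                elif name not in self.allowed_children.get(stack[-1], set()):
                    return False
                stack.append(name)
            elif kind == "close":
                # Return symbol: pop and require a matching open tag.
                if not stack or stack.pop() != name:
                    return False
            elif kind == "text":
                # Internal symbol: no stack change, but text must be inside the root.
                if not stack:
                    return False
            else:
                return False
        # Accept only if a root was seen and every element was closed.
        return seen_root and not stack


if __name__ == "__main__":
    vpa = TinyVPA({"order": {"item"}, "item": set()}, root="order")
    stream = iter([("open", "order"), ("open", "item"), ("text", "pen"),
                   ("close", "item"), ("close", "order")])
    print(vpa.accepts(stream))
```

Because `accepts` consumes the events in a single pass and keeps only the stack of open elements, memory use grows with nesting depth, not document size.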
ebXML: Global Standard for Electronic Business
Business-to-business integration is transforming the market and has already begun to increase the efficiency of the companies involved. EDI (Electronic Data Interchange) became very popular during the 1970s; today EDI transactions total about $750 billion a year. EDI is used by 90% of Fortune 1000 companies. It has indeed become a dominant technology for the largest companies. On the other hand, it has been adopted by less than 5% of small and medium-sized companies in general and, of these, many use EDI only because their larger customers require it. The reason is that EDI is a difficult, complex technology to implement and usually comes with high transaction costs. Hence it is suitable for large companies with a large volume of transactions. EDI uses a fixed, rigid and compressed data format that is difficult to decipher and debug. Data exchange in EDI happens over a proprietary VAN (value added network), which is an expensive solution.
ebXML (Electronic Business XML) envisions a single global electronic marketplace where enterprises of any size and in any geographic location can meet and conduct business with each other through the exchange of XML-based messages. XML (the Extensible Markup Language) has rapidly established itself as a popular format for exchanging information on the web. XML is by nature a structured document format: it represents not only the information to be exchanged, but also the metadata encapsulating its meaning. XML technology has the potential to solve the existing problems in current EDI systems. Using ebXML, companies have a standard method to exchange business messages, conduct trading relationships, communicate data in common terms, and define and register business processes. ebXML is designed to provide a simple way for companies to find one another and conduct business over the Web, allowing those with different platforms to speak a common language. ebXML aims to provide low-cost solutions for small and medium enterprises as well as complex solutions for large enterprises. This project attempts to implement a prototype of the ebXML messaging service as per the ebXML specification, to gain insight into the feasibility and suitability of an XML solution for EDI.
Query Induction with Schema-Guided Pruning Strategies
Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice for XML information extraction.
A first approach to combining ontologies and defeasible argumentation for the semantic web
The Semantic Web is a project intended to create a universal medium for information exchange by giving semantics to the content of documents on the Web through the use of ontology definitions.
Problems for modelling common-sense reasoning (such as reasoning with uncertainty or with incomplete and potentially inconsistent information) are also present when defining ontologies.
In recent years, defeasible argumentation has succeeded as an approach to formalize such common-sense reasoning. Agents operating in multi-agent systems in the context of the Semantic Web need to interact with each other in order to achieve the goals stated by their users. In this paper we propose an XML-based language named XDeLP for ontology interchange among agents on the web.
Track: VI Workshop de Agentes y Sistemas Inteligentes (WASI). Red de Universidades con Carreras en Informática (RedUNCI