277,939 research outputs found
Formal Properties of XML Grammars and Languages
XML documents are described by a document type definition (DTD). An
XML-grammar is a formal grammar that captures the syntactic features of a DTD.
We investigate properties of this family of grammars. We show that every
XML-language basically has a unique XML-grammar. We give two characterizations
of languages generated by XML-grammars, one is set-theoretic, the other is by a
kind of saturation property. We investigate decidability problems and prove
that some properties that are undecidable for general context-free languages
become decidable for XML-languages. We also characterize those XML-grammars
that generate regular XML-languages.Comment: 24 page
XML Matchers: approaches and challenges
Schema Matching, i.e. the process of discovering semantic correspondences
between concepts adopted in different data source schemas, has been a key topic
in Database and Artificial Intelligence research areas for many years. In the
past, it was largely investigated especially for classical database models
(e.g., E/R schemas, relational databases, etc.). However, in the latest years,
the widespread adoption of XML in the most disparate application fields pushed
a growing number of researchers to design XML-specific Schema Matching
approaches, called XML Matchers, aiming at finding semantic matchings between
concepts defined in DTDs and XSDs. XML Matchers do not just take well-known
techniques originally designed for other data models and apply them on
DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical
structure of a DTD/XSD) to improve the performance of the Schema Matching
process. The design of XML Matchers is currently a well-established research
area. The main goal of this paper is to provide a detailed description and
classification of XML Matchers. We first describe to what extent the
specificities of DTDs/XSDs impact on the Schema Matching task. Then we
introduce a template, called XML Matcher Template, that describes the main
components of an XML Matcher, their role and behavior. We illustrate how each
of these components has been implemented in some popular XML Matchers. We
consider our XML Matcher Template as the baseline for objectively comparing
approaches that, at first glance, might appear as unrelated. The introduction
of this template can be useful in the design of future XML Matchers. Finally,
we analyze commercial tools implementing XML Matchers and introduce two
challenging issues strictly related to this topic, namely XML source clustering
and uncertainty management in XML Matchers.Comment: 34 pages, 8 tables, 7 figure
Functional dependencies for XML : axiomatisation and normal form in the presence of frequencies and identifiers : a thesis presented in partial fulfilment of the requirements for the degree of Master of Sciences in Information Sciences at Massey University, Palmerston North, New Zealand
XML has gained popularity as a markup language for publishing and exchanging data on the web. Nowadays, there are also ongoing interests in using XML for representing and actually storing data. In particular, much effort has been directed towards turning XML into a real data model by improving the semantics that can be expressed about XML documents. Various works have addressed how to define different classes of integrity constraints and the development of a normalisation theory for XML. One area which received little to no attention from the research community up to five years ago is the study of functional dependencies in the context of XML [37]. Since then, there has been increasingly more research investigating functional dependencies in XML. Nevertheless, a comprehensive dependency theory and normalisation theory for XML have yet to emerge. Functional dependencies are an integral part of database theory in the relational data model (RDM). In particular, functional dependencies have been vital in the investigation of how to design "good" relational database schemas which avoid or minimise problems relating to data redundancy and data inconsistency. Since the same problems can be shown to exist in poorly designed XML schemas
1
, there is a need to investigate how these problems can be eliminated in the context of XML. We believe that the study of an analogy to relational functional dependencies in the context of XML is equally significant towards designing "good" XML schemas.
[FROM INTRODUCTION
XML data integrity based on concatenated hash function
Data integrity is the fundamental for data authentication. A major problem for XML data authentication is that signed XML data can be copied to another document but still keep signature valid. This is caused by XML data integrity protecting. Through investigation, the paper discovered that besides data content integrity, XML data integrity should also protect element location information, and context referential integrity under fine-grained security situation. The aim of this paper is to propose a model for XML data integrity considering XML data features. The paper presents an XML data integrity model named as CSR (content integrity, structure integrity, context referential integrity) based on a concatenated hash function. XML data content integrity is ensured using an iterative hash process, structure integrity is protected by hashing an absolute path string from root node, and context referential integrity is ensured by protecting context-related elements. Presented XML data integrity model can satisfy integrity requirements under situation of fine-grained security, and compatible with XML signature. Through evaluation, the integrity model presented has a higher efficiency on digest value-generation than the Merkle hash tree-based integrity model for XML data
Building XML data warehouse based on frequent patterns in user queries
[Abstract]: With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Due to the extremely large amount of XML data available on web, unguided warehousing of XML data turns out to be highly costly and usually cannot well accommodate the users’ needs in XML data acquirement. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns represented as Frequent Query Pattern Trees (FreqQPTs). Using hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that the overall processing of the same queries issued against the global schema become much efficient by using the XML data warehouse built than by directly searching the multiple data sources
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associate XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems
- …