1,885 research outputs found
Materialized View Selection in XML Databases
Materialized views, a rdbms silver bullet, demonstrate its
efficacy in many applications, especially as a data warehousing/decison support system tool. The pivot of playing materialized views efficiently is view selection. Though studied for over thirty years in rdbms, the
selection is hard to make in the context of xml databases, where both the semi-structured data and the expressiveness of xml query languages add challenges to the view selection problem. We start our discussion on producing minimal xml views (in terms of size) as candidates for a given workload (a query set). To facilitate intuitionistic view selection, we present a view graph (called vcube) to structurally maintain all generated views. By basing our selection on vcube for materialization, we propose two view selection strategies, targeting at space-optimized and space-time tradeoff, respectively. We built our implementation on
top of Berkeley DB XML, demonstrating that significant performance improvement could be obtained using our proposed approaches
XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme
Query evaluation in an XML database requires reconstructing XML subtrees
rooted at nodes found by an XML query. Since XML subtree reconstruction can be
expensive, one approach to improve query response time is to use reconstruction
views - materialized XML subtrees of an XML document, whose nodes are
frequently accessed by XML queries. For this approach to be efficient, the
principal requirement is a framework for view selection. In this work, we are
the first to formalize and study the problem of XML reconstruction view
selection. The input is a tree , in which every node has a size
and profit , and the size limitation . The target is to find a subset
of subtrees rooted at nodes respectively such that
, and is maximal.
Furthermore, there is no overlap between any two subtrees selected in the
solution. We prove that this problem is NP-hard and present a fully
polynomial-time approximation scheme (FPTAS) as a solution
XML for Domain Viewpoints
Within research institutions like CERN (European Organization for Nuclear
Research) there are often disparate databases (different in format, type and
structure) that users need to access in a domain-specific manner. Users may
want to access a simple unit of information without having to understand detail
of the underlying schema or they may want to access the same information from
several different sources. It is neither desirable nor feasible to require
users to have knowledge of these schemas. Instead it would be advantageous if a
user could query these sources using his or her own domain models and
abstractions of the data. This paper describes the basis of an XML (eXtended
Markup Language) framework that provides this functionality and is currently
being developed at CERN. The goal of the first prototype was to explore the
possibilities of XML for data integration and model management. It shows how
XML can be used to integrate data sources. The framework is not only applicable
to CERN data sources but other environments too.Comment: 9 pages, 6 figures, conference report from SCI'2001 Multiconference
on Systemics & Informatics, Florid
Knowledge and Metadata Integration for Warehousing Complex Data
With the ever-growing availability of so-called complex data, especially on
the Web, decision-support systems such as data warehouses must store and
process data that are not only numerical or symbolic. Warehousing and analyzing
such data requires the joint exploitation of metadata and domain-related
knowledge, which must thereby be integrated. In this paper, we survey the types
of knowledge and metadata that are needed for managing complex data, discuss
the issue of knowledge and metadata integration, and propose a CWM-compliant
integration solution that we incorporate into an XML complex data warehousing
framework we previously designed.Comment: 6th International Conference on Information Systems Technology and
its Applications (ISTA 07), Kharkiv : Ukraine (2007
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
- …