4 research outputs found

Recommended from our members

An effective data placement strategy for XML documents

Author: Lü K
Zhu Y
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2001

As XML is increasingly being used in Web applications, new technologies need to be investigated for processing XML documents with high performance. Parallelism is a promising solution for structured document processing, and data placement is a major factor for system performance improvement in parallel processing. This paper describes an effective XML document data placement strategy. The new strategy is based on a multilevel graph partitioning algorithm with the consideration of the unique features of XML documents and query distributions. A new algorithm, which is based on XML query schemas to derive the weighted graph from the labelled directed graph presentation of XML documents, is also proposed. Performance analysis on the algorithm presented in the paper shows that the new data placement strategy exhibits low workload skew and a high degree of parallelism.

Brunel University Research Archive

Querying Semistructured Data Based On Schema Matching

Author: André Bergholz
Publication date: 01/01/2000

Most of today's data is still stored in files rather than in databases. This fact has become even more evident with the growth of the World Wide Web in the 1990s. Because of that observation, the research area of semistructured data has evolved. Semistructured data is typically stored in documents and has an irregular, partial, and implicit structure. The thesis presents a new framework for querying semistructured data. Traditional database management requires design and ensures declarativity. The possibilities to design are limited in the field of semistructured data; thus, a more flexible approach is needed. We argue that semistructured data should be represented by a set of partial schemata rather than by one complete schema. Because of irregularities of the data, a complete schema would be very large and not representative. Instead, partial schemata can serve as good representations of parts of the data. While finding a complete schema turns out to be difficult, a database designer may be able to provide partial schemata for the database. Also, partial schemata can be extracted from user queries if the query language is designed appropriately. We suggest splitting the notion of query into a "What"-part and a "How"-part. Partial schemata represent the "What"-part. They cover semantically richer concepts than database schemata traditionally do. Among these concepts are predicates, variable definitions, and path descriptions. Schemata can be used for query optimization, but they also give users hints on the content of the database. Finding the occurrences (matches) of such a schema forms the most important part of query execution. All queries of our approach, such as the focus query or the transformation query, are based on this matching. Query execution can be optimized using kn..

CiteSeerX

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

Querying Semistructured Data Based on Schema Matching

Author: A. Zuendorf
G. Kondrak
J. McHugh
V. Kumar
W. T. Trotter
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Querying Semistructured Data based on Schema Matching

Author: André Bergholz
Johann Christoph Freytag

Traditional database management requires design and ensures declarativity. In the context of semistructured data a more flexible approach is appropriate due to missing schema information. In this paper we present a query language based on schema matching. Intuitively, a query is a pair consisting of what we want and how we want it. We propose that the former can be achieved by matching a (partial) schema and the latter by specifying additional operations. We describe in some detail our notion of schema which covers various concepts such as predicates, variables and paths. We outline the optimization potential that this modular approach offers and discuss how we use constraints for query processing.

1 Introduction

Traditional database management requires design and ensures declarativity. Semistructured data, "data that is neither raw data nor strictly typed", lacks a fixed and rigid schema [Abi97]. Often their structure is irregular and implicit. Examples for semistructured data includ..

CiteSeerX