Query processing using views in semistructured databases

Abstract

Since its introduction, XML, the eXtensible Markup Language, has quickly emerged as the universal format for publishing and exchanging data on the World Wide Web. As a result, data sources, including object-relational databases, are now faced with a new class of users: clients and customers who would like to deal directly with XML data rather than being forced to deal with the data source's particular schema and query languages. XML is also rapidly becoming popular for representing web data, as it brings a fine-grained structure to web information and exposes the semantics of web content. In all these web applications, including electronic commerce and intelligent agents, view mechanisms are recognized as critical and are widely employed to represent users' specific interests. Rewriting user queries using views is a powerful technique in these applications, which fall into three categories: data integration, data warehousing, and query optimization. In this study we identify some difficulties with currently known methods for using rewritings in XML-like "semistructured" databases. We study the problem in two realistic scenarios. The first is related to information integration systems such as the Information Manifold, in which the data sources are modelled as sound views over a global schema. The second scenario is query optimization using cached views. In this setting we propose two kinds of algebraic rewritings that focus on extracting as much information as possible from the views for the purpose of optimizing regular path queries, which are the building blocks of all query languages for semistructured data.
