Search CORE

15,263 research outputs found

Pattern based processing of XPath queries

Author: Marks Gerard
Roantree Mark
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

As the popularity of areas including document storage and distributed systems continues to grow, the demand for high performance XML databases is increasingly evident. This has led to a number of research eorts aimed at exploiting the maturity of relational database systems in order to in- crease XML query performance. In our approach, we use an index structure based on a metamodel for XML databases combined with relational database technology to facilitate fast access to XML document elements. The query process involves transforming XPath expressions to SQL which can be executed over our optimised query engine. As there are many dierent types of XPath queries, varying processing logic may be applied to boost performance not only to indi- vidual XPath axes, but across multiple axes simultaneously. This paper describes a pattern based approach to XPath query processing, which permits the execution of a group of XPath location steps in parallel

CiteSeerX

VAMANA : A High Performance, Scalable and Cost Driven XPath Engine

Author: Raghavan Venkatesh
Publication venue: Digital WPI
Publication date: 05/05/2004
Field of study

Many applications are migrating or beginning to make use native XML data. We anticipate that queries will emerge that emphasize the structural semantics of XML query languages like XPath and XQuery. This brings a need for an efficient query engine and database management system tailored for XML data similar to traditional relational engines. While mapping large XML documents into relational database systems while possible, poses difficulty in mapping XML queries to the less powerful relational query language SQL and creates a data model mismatch between relational tables and semi-structured XML data. Hence native solutions to efficiently store and query XML data are being developed recently. However, most of these systems thus far fail to demonstrate scalability with large document sizes, to provide robust support for the XPath query language nor to adequately address costing with respect to query optimization. In this thesis, we propose a novel cost-driven XPath engine to support the scalable evaluation of ad-hoc XPath expressions called VAMANA. VAMANA makes use of an efficient XML repository for storing and indexing large XML documents called the Multi-Axis Storage Structure (MASS) developed at WPI. VAMANA extensively uses indexes for query evaluation by considering index-only plans. To the best of our knowledge, it is the only XML query engine that supports an index plan approach for large XML documents. Our index-oriented query plans allow queries to be evaluated while reading only a fraction of the data, as all tuples for a particular context node are clustered together. The pipelined query framework minimizes the cost of handing intermediate data during query processing. Unlike other native solutions, VAMANA provides support for all 13 XPath axes. Our schema independent cost model provides dynamically calculated statistics that are then used for intelligent cost-based transformations, further improving performance. Our optimization strategy for increasing execution time performance is affirmed through our experimental studies on XMark benchmark data. VAMANA query execution is significantly faster than leading available XML query engines

A Clustering-based Scheme for Labeling XML Trees

Author: Masoud Rahgozar
Sadegh Soltan
Publication venue
Publication date: 11/04/2020
Field of study

Summary Tree labeling plays a key role in XML query processing. In this paper, we propose a new labeling scheme, called Clusteringbased Labeling. Unlike all previous labeling methods, In this labeling scheme elements are separated into various groups, and a label is assigned to a group of elements instead of one element. Based on Clustering-based Labeling we design a new relational schema, similar to OrdPath scheme, for storing XML documents in relational database. Grouping Sibling nodes into one record reduces number of relational records needed for XML document storage. Our experimental results shows that our storing scheme significantly is better than tree well-known relational XML storing methods in terms of number of stored records, document reconstruction time and query processing performance

CiteSeerX

The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes

Author: Apostol Elena-Simona
Darmont Jérôme
Pedersen Torben Bach
Truică Ciprian-Octavian
Publication venue: 'Elsevier BV'
Publication date: 03/02/2021
Field of study

In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible schema design. As current data generated by different sources and devices, especially from IoT sensors and actuators, use either XML or JSON format, depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized querying languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Databases Systems. Currently, the majority of these solutions have been replaced with the more modern JSON based Database Management Systems. However, we believe that XML-based solutions can still deliver performance in executing complex queries on heterogeneous collections. Unfortunately nowadays, research lacks a clear comparison of the scalability and performance for database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison for selected Document-Oriented Database Systems that either use the XML format to encode documents, i.e., BaseX, eXist-db, and Sedna, or the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus.Comment: 28 pages, 6 figures, 7 table

arXiv.org e-Print Archive

VBN

Hal-Diderot

Utilizing Structural Knowledge for Information Retrieval in XML Databases

Author: Apers Peter M.G.
Blok Henk Ernst
Hiemstra Djoerd
Mihajlovic Vojkan
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2005
Field of study

In this paper we address the problem of immediate translation of eXtensible Mark-up Language (XML) information retrieval (IR) queries to relational database expressions and stress the benefits of using an intermediate XML-specific algebra over relational algebra. We show how adding an XML-specific algebra at the logical level of a DBMS enables a level of abstraction from both query languages for information retrieval in XML and the underlying physical storage and manipulation. We picked a region algebra as a basis for defining the structure aware (SA) view on XML in which we can distinguish among different XML entities, such as element nodes, text nodes, words, and determine their containment relation. Region algebras are already well established in semi-structured document processing as shown in an extensive overview of region algebra approaches in this paper. Furthermore, we propose a variant of region algebra that can support ranking operators in an elegant way while staying algebraic. As relevance scores are computed for regions in our region algebra we named it score region algebra (SRA). The benefits of introducing score region algebra are explained on a set of query examples. Besides abstracting from the query language used and the physical implementation, SRA enables a certain degree of abstraction from the retrieval model used and the opportunity to use the query optimization at the logical level of a database. Various retrieval models can be instantiated at the physical level based on the abstract specification of SRA operators. We also discuss numerous region algebra operator properties that provide a firm ground for query rewriting and optimization at the SA level, which is an important premise for the existence of such a logical view on XML

University of Twente Research Information

XML Vectorization: A Column-Based XML Storage Model

Author: Buneman Peter
Choi Byron
Publication venue: ScholarlyCommons
Publication date: 01/01/2003
Field of study

The usual method for storing tables in a relational database is to store each tuple contiguously in secondary storage. A simple alternative is to store the columns contiguously, so that a table is represented as a set of vectors all of the same length. It has been shown that such a representation performs well on queries requiring few columns. This paper reviews the shredding scheme used in XMill, an XML compressor, which represents the document structure by using a set of files, consisting of a file describing the structure, and files describing the character data to be found on designated paths (corresponding to the column data). We consider such a shredding as a storage model –- XML vectorization –- by presenting an indexing scheme and a physical algebra associated with a detailed cost model. We study query processing on the XML vectorization, in particular the XML join queries. XML join queries are often translated into a few relational join operations in the relational-based XML storage systems. The use of columns enables us to develop a fast join algorithm for vectorized XML based on two hashbased join algorithms. The important feature of the join algorithm is that the disk access of the algorithm is mostly sequential and the data not needed are not read from disk. Experimental results demonstrate the effectiveness of the join algorithm for vectorized XML

CiteSeerX

Bridging the gap between the semantic web and big data: answering SPARQL queries over NoSQL databases

Author: El Massari Hakim
Gherabi Noreddine
Mhammedi Sajida
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2022
Field of study

Nowadays, the database field has gotten much more diverse, and as a result, a variety of non-relational (NoSQL) databases have been created, including JSON-document databases and key-value stores, as well as extensible markup language (XML) and graph databases. Due to the emergence of a new generation of data services, some of the problems associated with big data have been resolved. In addition, in the haste to address the challenges of big data, NoSQL abandoned several core databases features that make them extremely efficient and functional, for instance the global view, which enables users to access data regardless of how it is logically structured or physically stored in its sources. In this article, we propose a method that allows us to query non-relational databases based on the ontology-based access data (OBDA) framework by delegating SPARQL protocol and resource description framework (RDF) query language (SPARQL) queries from ontology to the NoSQL database. We applied the method on a popular database called Couchbase and we discussed the result obtained

ZENODO

Institute of Advanced Engineering and Science

A Framework for XML-based Integration of Data, Visualization and Analysis in a Biomedical Domain

Author: Bales Nathan
Brinkley James F
Lee E. Sally
Mathur Shobhit
Re Chris
Suciu Dan
Publication venue
Publication date: 01/01/2005
Field of study

Biomedical data are becoming increasingly complex and heterogeneous in nature. The data are stored in distributed information systems, using a variety of data models, and are processed by increasingly more complex tools that analyze and visualize them. We present in this paper our framework for integrating biomedical research data and tools into a unique Web front end. Our framework is applied to the University of Washington’s Human Brain Project. Speciﬁcally, we present solutions to four integration tasks: deﬁnition of complex mappings from relational sources to XML, distributed XQuery processing, generation of heterogeneous output formats, and the integration of heterogeneous data visualization and analysis tools