MIL primitives for querying a fragmented world
In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object oriented applications on top of such a fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet Interpreter Language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation of MIL
MonetDB/XQuery - Consistent & Efficient Updates on the Pre/Post Plane
Relational XQuery processors aim at leveraging mature relational DBMS query processing technology to provide scalability and efficiency. To achieve this goal, various storage schemes have been proposed to encode the tree structure of XML documents in flat relational tables. Basically, two classes can be identified: (1) encodings using fixed-length surrogates, like the preorder ranks in the pre/post encoding [5] or the equivalent pre/size/level encoding [8], and (2) encodings using variable-length surrogates, such as ORDPATH [9] or P-PBiTree [12]. Recent research [1] showed a clear advantage of the former for efficient evaluation of XPath location steps, exploiting techniques like cheap node order tests, positional lookup, and node skipping in staircase join [7]. However, once updates are involved, variable-length surrogates are often considered the better choice, mainly because a straightforward implementation of structural XML updates using fixed-length surrogates faces two performance bottlenecks: (i) high physical cost (the preorder ranks of all nodes following the update position must be modified, on average 50% of the document), and (ii) low transaction concurrency (updating the size of all ancestor nodes causes lock contention on the document root)
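As a rough illustration of the pre/size/level encoding mentioned in the abstract above, the following sketch encodes a small tree, runs a range-based descendant test, and counts how many preorder ranks a naive insert would shift. The `Node` class and all function names are hypothetical and chosen only for this example; this is not MonetDB/XQuery code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical in-memory XML element used only for this sketch."""
    tag: str
    children: List["Node"] = field(default_factory=list)


Row = Tuple[int, int, int, str]  # (pre, size, level, tag)


def encode(root: Node) -> List[Row]:
    """Flatten a tree into pre/size/level rows in document order."""
    rows: List[Row] = []

    def visit(node: Node, level: int) -> None:
        pre = len(rows)                    # preorder rank = arrival order
        rows.append((pre, 0, level, node.tag))
        for child in node.children:
            visit(child, level + 1)
        size = len(rows) - pre - 1         # number of descendants
        rows[pre] = (pre, size, level, node.tag)

    visit(root, 0)
    return rows


def is_descendant(rows: List[Row], v: int, c: int) -> bool:
    """True if the node with rank v lies in the subtree below rank c."""
    pre_c, size_c, _, _ = rows[c]
    return pre_c < v <= pre_c + size_c


def ranks_shifted_by_insert(rows: List[Row], parent: int) -> int:
    """Count existing nodes whose pre rank changes when a new last
    child is appended under `parent` (the naive update cost)."""
    pre, size, _, _ = rows[parent]
    insert_at = pre + size + 1
    return len(rows) - insert_at


# Document a(b(c, d), e(f))
doc = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e", [Node("f")])])
rows = encode(doc)
```

In this toy document, appending a child under `b` shifts the ranks of `e` and `f`, and the `size` of the ancestor `a` would also need updating; these are the two costs the abstract names.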
Moa and the multi-model architecture: a new perspective on XNF2
Advanced non-traditional application domains such as geographic information systems and digital library systems demand advanced data management support. In an effort to cope with this demand, we present the concept of a novel multi-model DBMS architecture which provides evaluation of queries on complexly structured data without sacrificing efficiency. A vital role in this architecture is played by the Moa language featuring a nested relational data model based on XNF2, in which we placed renewed interest. Furthermore, extensibility in Moa avoids optimization obstacles due to black-box treatment of ADTs. The combination of a mapping of queries on complexly structured data to an efficient physical algebra expression via a nested relational algebra, extensibility open to optimization, and the consequently better integration of domain-specific algorithms enables the Moa system to handle complex queries from non-traditional application domains efficiently and effectively
MonetDB/X100 - A DBMS in the CPU cache
X100 is a new execution engine for the MonetDB system, that improves execution speed and overcomes its main memory limitation. It introduces t
Navigating through a forest of quad trees to spot images in a database
This paper describes how we maintain color and spatial index information on more than 1,000,000 images and how we allow users to browse the spatial color feature space. We break down all our images in color-based quad trees and we store all quad trees in our main-memory database. We allow users to browse the quad trees directly, or they can pre-select images through our color bit vector, which acts as an index accelerator. A Java-based GUI is used to navigate through our image indexes
Bulkloading and Maintaining XML Documents
The popularity of XML as an exchange and storage format brings about massive amounts of documents to be stored, maintained and analyzed -- a challenge that traditionally has been tackled with Database Management Systems (DBMS). To open up the content of XML documents to analysis with declarative query languages, efficient bulk loading techniques are necessary.
Database technology has traditionally been offering support for these tasks but yet falls short of providing efficient automation techniques for the challenges that large collections of XML data raise. As storage back-end, many applications rely on relational databases, which are designed towards large data volumes. This paper studies the bulk load and update algorithms for XML data stored in relational format and outlines opportunities and problems. We investigate both (1) bulk insertion and deletion as well as (2) updates in the form of edit scripts which heavily use pointer-chasing techniques which often are considered orthogonal to the algebraic operations relational databases are optimized for. To get the most out of relational database systems, we show that one should make careful use of edit scripts and replace them with bulk operations if more than a very small portion of the database is updated.
We implemented our ideas on top of the Monet Database System and benchmarked their performance
Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented Processing
Recursive queries and recursive derived tables constitute an important part
of the SQL standard. Their efficient processing is important for many real-life
applications that rely on graph or hierarchy traversal. Position-enabled
column-stores offer a novel opportunity to improve run times for this type of
queries. Such systems allow the engine to explicitly use data positions (row
ids) inside its core and thus, enable novel efficient implementations of query
plan operators.
In this paper, we present an approach that significantly speeds up recursive
query processing inside RDBMSes. Its core idea is to employ a particular aspect
of column-store technology (late materialization) which enables the query
engine to manipulate data positions during query execution. Based on it, we
propose two sets of Volcano-style operators intended to process different query
cases.
In order to validate our ideas, we have implemented the proposed approach in
PosDB, an RDBMS column-store with SQL support. We experimentally demonstrate
the viability of our approach by providing a comparison with PostgreSQL.
Experiments show that for breadth-first search: 1) our position-based approach
yields up to 6x better results than PostgreSQL, 2) our tuple-based one results
in only 3x improvement when using a special rewriting technique, but it can
work in a larger number of cases, and 3) both approaches can't be emulated in
row-stores efficiently
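The idea of traversing by data positions and materializing values late can be sketched roughly as follows. This is a minimal illustration that assumes an edge table stored as separate Python lists, one per column; the function names are hypothetical, and it does not reproduce PosDB's actual operators.

```python
from typing import Dict, List


def bfs_positions(src: List[int], dst: List[int], start: int) -> List[int]:
    """Breadth-first traversal that returns edge positions (row ids)
    instead of materialized tuples."""
    # Assumed helper index: node id -> positions of its outgoing edges.
    index: Dict[int, List[int]] = {}
    for pos, node in enumerate(src):
        index.setdefault(node, []).append(pos)

    visited = {start}
    frontier = [start]
    result: List[int] = []
    while frontier:
        next_frontier: List[int] = []
        for node in frontier:
            for pos in index.get(node, []):
                result.append(pos)          # keep only the position
                target = dst[pos]
                if target not in visited:
                    visited.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return result


def materialize(column: List[str], positions: List[int]) -> List[str]:
    """Late materialization: fetch values of one column by position."""
    return [column[p] for p in positions]


# Edge table stored column-wise: (src, dst, label)
src = [1, 1, 2, 3, 4]
dst = [2, 3, 4, 4, 5]
label = ["a", "b", "c", "d", "e"]

positions = bfs_positions(src, dst, start=2)
labels = materialize(label, positions)
```

Only the `src` and `dst` columns are touched during traversal; the `label` column is read once at the end, for the positions that survived.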