Moa and the multi-model architecture: a new perspective on XNF2
Advanced non-traditional application domains such as geographic information systems and digital library systems demand advanced data management support. In an effort to cope with this demand, we present the concept of a novel multi-model DBMS architecture which provides evaluation of queries on complexly structured data without sacrificing efficiency. A vital role in this architecture is played by the Moa language featuring a nested relational data model based on XNF2, in which we placed renewed interest. Furthermore, extensibility in Moa avoids optimization obstacles due to black-box treatment of ADTs. The combination of a mapping of queries on complexly structured data to an efficient physical algebra expression via a nested relational algebra, extensibility open to optimization, and the consequently better integration of domain-specific algorithms enables the Moa system to handle complex queries from non-traditional application domains efficiently and effectively.
Moa and the Multi-model architecture: a new perspective on XNF 2
Advanced non-traditional application domains such as geographic information systems and digital library systems demand advanced data management support. In an effort to cope with this demand, we present the concept of a novel multi-model DBMS architecture which provides evaluation of queries on complexly structured data without sacrificing efficiency. A vital role in this architecture is played by the Moa language featuring a nested relational data model based on XNF2, in which we placed renewed interest. Furthermore, extensibility in Moa avoids optimization obstacles due to black-box treatment of ADTs. The combination of a mapping of queries on complexly structured data to an efficient physical algebra expression via a nested relational algebra, extensibility open to optimization, and the consequently better integration of domain-specific algorithms enables the Moa system to handle complex queries from non-traditional application domains efficiently.
From Nested-Loop to Join Queries in OODB
Most declarative SQL-like query languages for object-oriented database systems are orthogonal languages allowing for arbitrary nesting of expressions in the select-, from-, and where-clause. Expressions in the from-clause may be base tables as well as set-valued attributes. In this paper, we propose a general strategy for the optimization of nested OOSQL queries. As in the relational model, the translation/optimization goal is to move from tuple- to set-oriented query processing. Therefore, OOSQL is translated into the algebraic language ADL, and by means of algebraic rewriting nested queries are transformed into join queries as far as possible. Three different optimization options are described, and a strategy to assign priorities to options is proposed.
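The move from tuple-oriented to set-oriented processing described in this abstract can be illustrated with a small sketch. It is written in plain Python rather than OOSQL or ADL, and the tables, field names, and both function names are hypothetical; it only shows that a nested query evaluated by a nested loop and its join-based rewrite return the same result.

```python
from collections import defaultdict

# Hypothetical toy data (not taken from the paper).
departments = [
    {"dno": 1, "name": "R&D"},
    {"dno": 2, "name": "Sales"},
    {"dno": 3, "name": "Legal"},
]
employees = [
    {"ename": "Ann", "dno": 1},
    {"ename": "Bob", "dno": 2},
    {"ename": "Cy", "dno": 1},
]

def nested_loop(departments, employees):
    """Tuple-oriented: the inner subquery is re-evaluated per outer tuple."""
    return [
        (d["name"], [e["ename"] for e in employees if e["dno"] == d["dno"]])
        for d in departments
    ]

def join_based(departments, employees):
    """Set-oriented: group the inner collection once, then probe it."""
    by_dno = defaultdict(list)
    for e in employees:
        by_dno[e["dno"]].append(e["ename"])
    # The empty-list default keeps departments that have no employees,
    # so the rewrite does not lose outer tuples.
    return [(d["name"], by_dno.get(d["dno"], [])) for d in departments]
```

The nested loop scans all employees once per department, while the join-based version scans each collection once; the default for unmatched departments is the detail that keeps the two forms equivalent.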
Content And Multimedia Database Management Systems
A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the ‘database approach’ is that it increases the value of data by its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at the management of administrative data in the business domain. This thesis has investigated data management in multimedia digital libraries, and its implications for the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values. Thus, database technology has primarily focused on handling the objects’ logical structure. In the case of multimedia data, however, representation of content is far from trivial and is not supported by current database management systems.
Database Optimization Aspects for Information Retrieval
There is a growing need for systems that can process queries, combining both structured data and text. One way to provide such functionality is to integrate information retrieval (IR) techniques in a database management system (DBMS). However, both IR and database research have been separate research fields for decades, resulting in different - even conflicting - approaches to data management.
Each DBMS has a component called a "query optimizer", which plays a crucial role in the efficiency and flexibility of the system. So, for successful integration the IR techniques and data structures, as well as the DBMS query optimizer, should be adapted to enable mutual cooperation.
The author concentrates on top-N queries - a common class of IR queries. An IR top-N query asks for the N best documents given a set of keywords. The author proposes processing the data in batches as a compromise between IR and DBMS query processing. Experiments with this technique show that porting IR optimization techniques is (still) not a promising option due to the additional administrative overhead. Two new mathematical models are introduced to eliminate this overhead: a model that predicts selectivity, which is a crucial factor in the execution costs, and a model that predicts the quality of the top-N.
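As a rough illustration of a top-N query evaluated in batches, the sketch below scores documents against a keyword set one batch at a time and keeps only the N best seen so far. The term-count scoring function, the batch size, and all names are assumptions made for the example; they are not the author's technique or cost models.

```python
import heapq

def score(doc, keywords):
    """Assumed scoring: total number of keyword occurrences in the document."""
    words = doc.lower().split()
    return sum(words.count(k) for k in keywords)

def top_n_batched(docs, keywords, n, batch_size=2):
    """Process documents batch by batch, retaining a running top-N."""
    best = []  # list of (score, doc_id) pairs, at most n entries
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        scored = [(score(d, keywords), start + i) for i, d in enumerate(batch)]
        # Merge the new batch into the running result and cut back to n.
        best = heapq.nlargest(n, best + scored)
    return [(doc_id, s) for s, doc_id in best]

# Hypothetical documents for demonstration only.
docs = [
    "database query optimizer",
    "query query database retrieval",
    "information retrieval",
    "database",
    "query database query database",
]
result = top_n_batched(docs, ["query", "database"], n=2)
```

Only the running top-N and the current batch are held at any time, which is the kind of bookkeeping the abstract refers to as administrative overhead.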
The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis
Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement only one analysis/platform combination, leading to repeated implementation effort -- and a plethora of semi-compatible tools for data scientists.
In this paper, we propose the "Collection Virtual Machine" (or CVM) -- an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.
Comment: This paper is currently under review at DaMoN'2
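The idea of optimizing a program through successive rewritings until it is expressed in platform-specific operators can be sketched as follows. This is a toy under stated assumptions: the tuple encoding of programs and the operator names `Map`, `Scan`, and `CpuParallelMap` are hypothetical and do not come from CVM.

```python
def rewrite(node, rules):
    """Rewrite children first, then apply rules to this node until none fires."""
    if isinstance(node, tuple):
        node = tuple(rewrite(child, rules) for child in node)
    for rule in rules:
        new = rule(node)
        if new is not None:
            return rewrite(new, rules)
    return node

def fuse_maps(node):
    """High-level rule: Map(f, Map(g, xs)) becomes Map(g;f, xs)."""
    if (isinstance(node, tuple) and len(node) == 3 and node[0] == "Map"
            and isinstance(node[2], tuple) and node[2] and node[2][0] == "Map"):
        inner = node[2]
        return ("Map", inner[1] + ";" + node[1], inner[2])
    return None

def lower_map(node):
    """Lowering rule: replace the generic Map with an assumed CPU operator."""
    if isinstance(node, tuple) and node and node[0] == "Map":
        return ("CpuParallelMap",) + node[1:]
    return None

# A frontend-level program in the hypothetical high-level IR flavor.
program = ("Map", "f", ("Map", "g", ("Scan", "t")))

# Phase 1 optimizes within the high-level flavor; phase 2 changes the flavor.
optimized = rewrite(program, [fuse_maps])
lowered = rewrite(optimized, [lower_map])
```

The two phases are kept separate on purpose: if the lowering rule ran together with the fusion rule, the inner `Map` would be lowered before the outer one had a chance to fuse with it.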