52 research outputs found
The Mirror DBMS at TREC-8
The database group at University of Twente participates in TREC8 using the Mirror DBMS, a prototype database system especially designed for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of the global statistics on the ranking
Challenging Ubiquitous Inverted Files
Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach
Content And Multimedia Database Management Systems
A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the ‘database approach’ is that it increases the value of data by its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at the management of administrative data in the business domain. This thesis has investigated data management in multimedia digital libraries, and its implications on the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values. Thus, database technology has primarily focused on handling the objects’ logical structure. In the case of multimedia data, representation of content is far from trivial though, and not supported by current database management systems
Database Optimization Aspects for Information Retrieval
There is a growing need for systems that can process queries, combining both structured data and text. One way to provide such functionality is to integrate information retrieval (IR) techniques in a database management system (DBMS). However, both IR and database research have been separate research fields for decades, resulting in different - even conflicting - approaches to data management.
Each DBMS has a component called a "query optimizer", which plays a crucial role in the efficiency and flexibility of the system. So, for successful integration the IR techniques and data structures, as well as the DBMS query optimizer, should be adapted to enable mutual cooperation.
The author concentrates on top-N queries - a common class of IR queries. An IR top-N query asks for the N best documents given a set of keywords. The author proposes processing the data in batches as a compromise between IR and DBMS query processing. Experiments with this technique show that porting IR optimization techniques is (still) not a promising option due to the additional administrative overhead. Two new mathematical models are introduced to eliminate this overhead: a model that predicts selectivity, which is a crucial factor in the execution costs, and a model that predicts the quality of the top-N
- …