
Archiving scientific data

Author: Altinel M.
Avilla-Campillo I.
Buneman P.
Chawathe S. S.
Chien S.
Cobena G.
Diao Y.
Keishi Tajima
Marian A.
Peter Buneman
Sanjeev Khanna
Schmidt A. R.
Tufte K.
Wang-Chiew Tan
Publication date: 01/01/2002

We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once, along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data, such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.
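The timestamping idea in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: it assumes each version is a nested Python dict whose keys play the role of element keys, and the names `new_node`, `merge`, `retrieve` and `history` are hypothetical. Each element is stored once and annotated with the set of versions in which it appears.

```python
def new_node():
    # One archive node: the versions it appears in, keyed children,
    # and (for leaves) each distinct value with its own version set.
    return {"versions": set(), "children": {}, "values": {}}

def merge(node, doc, version):
    """Merge one version (a nested dict keyed by element key) into the archive."""
    node["versions"].add(version)
    for key, sub in doc.items():
        child = node["children"].setdefault(key, new_node())
        if isinstance(sub, dict):
            merge(child, sub, version)
        else:
            child["versions"].add(version)
            child["values"].setdefault(sub, set()).add(version)

def retrieve(node, version):
    """Rebuild the document as it looked at the given version."""
    doc = {}
    for key, child in node["children"].items():
        if version not in child["versions"]:
            continue
        leaf = [v for v, stamps in child["values"].items() if version in stamps]
        doc[key] = leaf[0] if leaf else retrieve(child, version)
    return doc

def history(node, path):
    """Return the sorted versions in which the element at `path` exists."""
    for key in path:
        node = node["children"][key]
    return sorted(node["versions"])

# Two illustrative versions (made-up data).
v1 = {"emp:alice": {"salary": "50", "dept": "db"}}
v2 = {"emp:alice": {"salary": "55", "dept": "db"}, "emp:bob": {"salary": "40"}}
archive = new_node()
merge(archive, v1, 1)
merge(archive, v2, 2)
```

Retrieving a version or an element's history is a single walk over the archive, with no deltas to undo, which is the contrast with diff-based approaches that the abstract draws.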

CiteSeerX

Crossref

Edinburgh Research Explorer

ScholarlyCommons@Penn

06472 Abstracts Collection - XQuery Implementation Paradigms

Author: Boncz Peter A.
Grust Torsten
Keulen Maurice van
Siméon Jerome
Publication venue: Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI)
Publication date: 01/01/2007

From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 "XQuery Implementation Paradigms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

CWI's Institutional Repository

Dagstuhl Research Online Publication Server

University of Twente Research Information

MonetDB/XQuery: a fast XQuery processor powered by a relational engine

Author: Boncz P.
Grust T.
Keulen M. van
Manegold S.
Rittinger J.
Teubner J.
Publication venue: ACM Press
Publication date: 01/01/2006

Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark up to data sizes of 11GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirms that the goal of purely relational XQuery processing, namely speed and scalability, was met.
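To make item (i) concrete, here is a minimal sketch of one common range-based encoding, in which each element becomes a row `(pre, size, level, tag)`. The abstract does not spell out the table layout, so this column choice is an assumption, and `encode` and `descendants` are hypothetical helper names. The point is that the descendant axis becomes a contiguous range scan over `pre`.

```python
import xml.etree.ElementTree as ET

def encode(xml_text):
    """Encode elements as (pre, size, level, tag) rows in document order."""
    rows = []

    def visit(elem, level):
        pre = len(rows)          # preorder rank of this element
        rows.append(None)        # reserve the slot; size is known only afterwards
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1   # number of descendants
        rows[pre] = (pre, size, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows

def descendants(rows, pre):
    """Descendants of the node with rank `pre` are the rows pre+1 .. pre+size."""
    size = rows[pre][1]
    return rows[pre + 1 : pre + 1 + size]

rows = encode("<a><b><c/></b><d/></a>")
```

Because the rows are stored in `pre` order, a descendant step needs no recursion or join on parent pointers, only a slice, which is the kind of property a relational engine can exploit.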

CiteSeerX

Crossref

CWI's Institutional Repository

University of Twente Research Information

T$^2$K$^2$: The Twitter Top-K Keywords Benchmark

Author: A Guille
AE Gattiker
CD Manning
D Kılınç
DD Lewis
F Ravat
J Darmont
J Ferrarons
J Gray
J O’Shea
JD Cooper
K Spärck Jones
L Wang
S Bringay
Publication venue: Springer Science and Business Media LLC
Publication date: 14/09/2017

Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are casually used for various purposes, are often computed on-the-fly, and thus must be efficiently computed. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T$^2$K$^2$, which features a real tweet dataset and queries with various complexities and selectivities. T$^2$K$^2$ helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T$^2$K$^2$'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
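As an illustration of what a top-k keywords query computes, the sketch below ranks terms by a simple TF-IDF weight summed over a small corpus. The formula variant (raw term frequency times `log(N / df)`), the whitespace tokenizer, and the name `top_k_keywords` are assumptions made for this example; the abstract does not state the benchmark's exact definitions.

```python
import math
from collections import Counter

def top_k_keywords(docs, k):
    """Return the k terms with the highest summed TF-IDF weight."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log(n / df[term])
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Made-up documents standing in for tweets.
docs = ["xml query engine", "xml index", "xml query benchmark benchmark"]
top2 = top_k_keywords(docs, 2)
```

A term that occurs in every document, such as `xml` here, gets weight zero under this variant, so it never outranks rarer terms.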

arXiv.org e-Print Archive

Crossref

HAL

Hal-Diderot

A Database Approach to Content-based XML retrieval

Author: Hiemstra D.
Publication venue: European Research Consortium for Informatics and Mathematics (ERCIM)
Publication date: 01/01/2002

This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments.

CiteSeerX

Radboud Repository

University of Twente Research Information

Challenging Ubiquitous Inverted Files

Author: Vries A.P. de
Publication venue: European Research Consortium for Informatics and Mathematics (ERCIM)
Publication date: 01/01/2000

Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue, however, that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach.

CWI's Institutional Repository

University of Twente Research Information