
7 research outputs found

Optimal Multidimensional Query Processing Using Tree Striping

Author: C. Faloutsos
I. Gargantini
J. Nievergelt
J.D. Ullman
K. Lin
R. Bayer
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2000

Abstract. In this paper, we propose a new technique for multidimensional query processing that can be widely applied in database systems. Our technique, called tree striping, generalizes the well-known inverted-list and multidimensional indexing approaches. A theoretical analysis of the generalized technique shows that both inverted lists and multidimensional indexing are far from optimal. A consequence of our analysis is that using a set of multidimensional indexes provides considerable improvements over one d-dimensional index (multidimensional indexing) or d one-dimensional indexes (inverted lists). The basic idea of tree striping is to use the optimal number k of lower-dimensional indexes, as determined by our theoretical analysis, for efficient query processing. We confirm our theoretical results with an experimental evaluation on large amounts of real and synthetic data. The results show a speed-up of up to 310% over the multidimensional indexing approach and a speed-up factor of up to 123 (12,300%) over the inverted-lists approach.
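As a rough illustration of the idea in the abstract above, the sketch below splits the d query dimensions into k groups ("stripes"), answers the range query on each group separately, and intersects the resulting ID sets. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `make_stripes`, `query_stripe`, and `striped_range_query` are hypothetical, each per-stripe "index" is replaced by a plain filter, and k is passed in by hand because the abstract does not give the formula for the optimal k.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # one (low, high) pair per dimension


def make_stripes(d: int, k: int) -> List[List[int]]:
    """Split dimensions 0..d-1 into k contiguous groups of near-equal size.

    Assumes 1 <= k <= d. k = 1 corresponds to one d-dimensional index,
    k = d to d one-dimensional indexes (inverted lists).
    """
    base, extra = divmod(d, k)
    stripes, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        stripes.append(list(range(start, start + size)))
        start += size
    return stripes


def query_stripe(points: Dict[int, Point], dims: List[int], box: Box) -> Set[int]:
    """Hypothetical stand-in for a lookup in one lower-dimensional index.

    A real system would consult an index built over `dims` only;
    here we simply filter on those dimensions.
    """
    return {
        pid
        for pid, p in points.items()
        if all(box[j][0] <= p[j] <= box[j][1] for j in dims)
    }


def striped_range_query(points: Dict[int, Point], box: Box, k: int) -> Set[int]:
    """Answer a d-dimensional range query by intersecting k stripe results."""
    result: Optional[Set[int]] = None
    for dims in make_stripes(len(box), k):
        ids = query_stripe(points, dims, box)
        result = ids if result is None else result & ids
    return result if result is not None else set()


points = {0: (1, 1, 1, 1), 1: (5, 5, 5, 5), 2: (1, 5, 1, 5)}
box = [(0, 2)] * 4
answer = striped_range_query(points, box, k=2)
```

Whatever k is chosen, the final answer is the same; only the amount of work done per stripe and the size of the intermediate ID sets change, which is the trade-off the paper's analysis is concerned with.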

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

Crossref

Browsing Digital Collections with Reconfigurable Faceted Thesauri

Author: Gayoso-Cabada Joaquín
Rodríguez-Cerezo Daniel
Sierra José-Luis
Publication venue: AIS Electronic Library (AISeL)
Publication date: 26/09/2016

Faceted thesauri group classification terms into hierarchically arranged facets. They enable faceted browsing, a well-known technique for navigating digital collections by recursively choosing terms in the facet hierarchy. In this paper we develop an approach to faceted browsing in live collections, in which not only the contents but also the thesauri can be constantly reorganized. We start by introducing a digital collection model that lets users reconfigure facet hierarchies. We then introduce navigation automata as an efficient way of supporting faceted browsing in these collections. Since, in the worst case, the number of states in these automata can grow exponentially, we propose two alternative indexing strategies to cope with this complexity: inverted indexes and navigation dendrograms. Finally, by comparing these strategies in the context of Clavy, a system for managing collections with reconfigurable structures in digital humanities and educational settings, we provide evidence that the navigation dendrogram organization outperforms the inverted-index-based one.
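To make the inverted-index strategy mentioned in the abstract above more concrete, here is a minimal sketch, assuming a flat set of terms per object rather than a full facet hierarchy. It does not implement navigation automata or navigation dendrograms, and the names `build_inverted_index`, `browse`, and `next_terms`, as well as the sample collection, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(collection: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each classification term to the set of objects tagged with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for obj, terms in collection.items():
        for term in terms:
            index[term].add(obj)
    return dict(index)


def browse(
    index: Dict[str, Set[str]],
    all_objects: Iterable[str],
    selected_terms: Iterable[str],
) -> Set[str]:
    """Return the objects that carry every selected term."""
    result = set(all_objects)
    for term in selected_terms:
        result &= index.get(term, set())
    return result


def next_terms(
    collection: Dict[str, List[str]],
    current: Set[str],
    selected_terms: Iterable[str],
) -> Dict[str, int]:
    """Count, for each not-yet-selected term, how many current objects carry it."""
    chosen = set(selected_terms)
    counts: Dict[str, int] = defaultdict(int)
    for obj in current:
        for term in collection[obj]:
            if term not in chosen:
                counts[term] += 1
    return dict(counts)


# Hypothetical sample collection: object name -> terms.
collection = {
    "vase": ["ceramic", "pre-columbian"],
    "bowl": ["ceramic", "modern"],
    "mask": ["gold", "pre-columbian"],
}
index = build_inverted_index(collection)
ceramics = browse(index, collection, ["ceramic"])
refinements = next_terms(collection, ceramics, ["ceramic"])
```

Because the index is rebuilt from the term assignments alone, reorganizing the thesaurus only requires re-running `build_inverted_index`; the cost is that every browsing step recomputes set intersections.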

AIS Electronic Library (AISeL)

Multidimensional Range Queries on Modern Hardware

Author: Leser Ulf
Schäfer Patrick
Sprenger Stefan
Publication date: 14/05/2018

Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and conventional wisdom has held that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data held on disk, applying I/O-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs, and data-parallel instruction sets. In this paper, we study whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers, and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.
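The scan baseline and the notion of selectivity discussed in the abstract above can be sketched as follows. This is a minimal, single-threaded illustration over column-wise data; it does not use SIMD instructions or multithreading, which the paper's scans rely on, and the function names `scan_range_query` and `selectivity` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Sequence[Tuple[float, float]]  # one (low, high) pair per dimension


def scan_range_query(columns: Sequence[Sequence[float]], box: Box) -> List[int]:
    """Return row indices whose value lies in [low, high] in every dimension.

    Data is stored column-wise: columns[j][i] is dimension j of row i.
    Each pass narrows the candidate list, so later dimensions touch fewer rows.
    """
    n = len(columns[0]) if columns else 0
    matches = list(range(n))
    for col, (low, high) in zip(columns, box):
        matches = [i for i in matches if low <= col[i] <= high]
    return matches


def selectivity(columns: Sequence[Sequence[float]], box: Box) -> float:
    """Fraction of rows returned by the query (0.0 for an empty dataset)."""
    n = len(columns[0]) if columns else 0
    return len(scan_range_query(columns, box)) / n if n else 0.0


columns = [[1, 5, 3, 8], [2, 6, 4, 1]]  # two dimensions, four rows
box = [(2, 8), (3, 7)]
rows = scan_range_query(columns, box)
fraction = selectivity(columns, box)
```

Under the older rule of thumb quoted in the abstract, a query with a selectivity like the one in this toy example would fall on the scan side of the 15%-20% threshold; the paper's finding is that, on current machines, scans are favored even well below it.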

arXiv.org e-Print Archive

Crossref

Region clustering based evaluation of multiple top-N selection queries

Author: Andrade
Bruno
Chakrabarti
Chaudhuri
Chunnian Liu
Hristidis
Ilyas
Liang Zhu
Marian
Meng
Motro
O’Neil
Sellis
Silberschatz
Stoica
Weiyi Meng
Wenzhu Yang
Zhu
Publication venue: Elsevier BV

Crossref

Efficient Processing of Range Queries in Main Memory

Author: Sprenger Stefan
Publication venue: Humboldt-Universität zu Berlin
Publication date: 11/03/2019
Field of study

Database systems employ index structures to accelerate search queries. In recent years, the research community has proposed many different in-memory approaches that, unlike disk-based systems, optimize for cache misses rather than disk I/O and exploit the increased parallel capabilities of modern CPUs. However, these techniques focus mainly on single-key lookups and neglect the equally important range queries. Range queries are a ubiquitous operator in data management, commonly used in numerous domains such as genomic analysis, sensor networks, and online analytical processing. The main goal of this dissertation is thus to improve the capabilities of main-memory database systems with regard to executing range queries. To this end, we first propose a cache-optimized, updateable main-memory index structure, the cache-sensitive skip list, which targets the execution of range queries on single database columns. Second, we study the performance of multidimensional range queries on modern hardware, where data are stored in main memory and processors support SIMD instructions and multi-threading. We re-evaluate a previous rule of thumb suggesting that, on disk-based systems, scans outperform index structures for selectivities of approximately 15-20% or more. To increase the practical relevance of our analysis, we also contribute a novel benchmark consisting of several realistic multidimensional range queries applied to real-world genomic data. Third, based on the outcomes of our experimental analysis, we devise a novel, fast, and space-efficient main-memory index structure, the BB-Tree, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

Gestión de colecciones digitales con esquemas de catalogación reconfigurables [Management of digital collections with reconfigurable cataloguing schemes]

Author: Gayoso Cabada Joaquín
Publication venue: Universidad Complutense de Madrid (UCM)
Publication date: 30/06/2017

I am grateful for the support received over these years from all the members of my research group, ILSA, at the Facultad de Informática of the Universidad Complutense de Madrid. I also thank the LEETHI and LOEP research groups, likewise part of the Universidad Complutense, and the Fundación El Caño of Panama, without whom I could not have carried out part of the experiments presented in these works. On a personal note, I wish to thank my supervisors José Luis Sierra, Ana Fernández-Pampillón, and Antonio Sarasa, and my research group colleagues Alfredo Fernández Valmayor, Daniel Rodríguez, Bryan Temprado, and César Ruiz, for giving me the opportunity to spend these years of research in this field with them, an effort that concludes with this thesis, and for everything they have taught me about how to be a good researcher. Within the university I also wish to thank my colleagues from "Aula16": Toni, Dan, Iván, Víctor, Jesús, Pablo, Cristina, and Marta, with whom I have shared many lunches and coffees over these years, musing about computer science. I also want to thank my current colleagues at "420bip": Susana, Vicky, Carlos, and Noelia, who have watched me put the finishing touches on this thesis over these past months and have helped me in every way they could.

Docta Complutense