Search CORE

8,098 research outputs found

Multidimensional Range Queries on Modern Hardware

Author: Leser Ulf
Schäfer Patrick
Sprenger Stefan
Publication venue
Publication date: 14/05/2018
Field of study

Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets. In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries

arXiv.org e-Print Archive

Crossref

Dynamic pipelining of multidimensional range queries

Author: Duch Brown Amalia
Lugosi Daniel
Pasarella Sánchez Ana Edelmira
Zoltan Torres Ana Cristina
Publication venue
Publication date: 01/01/2019
Field of study

The problem of evaluating orthogonal range queries efficiently has been studied widely in the data structures community. It has been common wisdom for several years that for queries containing more than 20% of the elements of the dataset a linear scanning of the data was the most efficient solution. In recent experimental works using modern hardware –with main memory and parallelism– the conclusion is that linear scan is preferable for almost every query configuration (even containing a 1% of the data). In this work we propose an alternative approach to evaluate multidimensional range queries based on the dynamic pipeline paradigm –using main memory and concurrency. Our aim is to prove that under this framework, it is possible to beat the performance of linear scanning by the one of hierarchical multidimensional data structures –such as kd trees, quad trees, Rtrees or similar.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Efficient Processing of Range Queries in Main Memory

Author: Sprenger Stefan
Publication venue: Humboldt-Universität zu Berlin
Publication date: 11/03/2019
Field of study

Datenbanksysteme verwenden Indexstrukturen, um Suchanfragen zu beschleunigen. Im Laufe der letzten Jahre haben Forscher verschiedene Ansätze zur Indexierung von Datenbanktabellen im Hauptspeicher entworfen. Hauptspeicherindexstrukturen versuchen möglichst häufig Daten zu verwenden, die bereits im Zwischenspeicher der CPU vorrätig sind, anstatt, wie bei traditionellen Datenbanksystemen, die Zugriffe auf den externen Speicher zu optimieren. Die meisten vorgeschlagenen Indexstrukturen für den Hauptspeicher beschränken sich jedoch auf Punktabfragen und vernachlässigen die ebenso wichtigen Bereichsabfragen, die in zahlreichen Anwendungen, wie in der Analyse von Genomdaten, Sensornetzwerken, oder analytischen Datenbanksystemen, zum Einsatz kommen. Diese Dissertation verfolgt als Hauptziel die Fähigkeiten von modernen Hauptspeicherdatenbanksystemen im Ausführen von Bereichsabfragen zu verbessern. Dazu schlagen wir zunächst die Cache-Sensitive Skip List, eine neue aktualisierbare Hauptspeicherindexstruktur, vor, die für die Zwischenspeicher moderner Prozessoren optimiert ist und das Ausführen von Bereichsabfragen auf einzelnen Datenbankspalten ermöglicht. Im zweiten Abschnitt analysieren wir die Performanz von multidimensionalen Bereichsabfragen auf modernen Serverarchitekturen, bei denen Daten im Hauptspeicher hinterlegt sind und Prozessoren über SIMD-Instruktionen und Multithreading verfügen. Um die Relevanz unserer Experimente für praktische Anwendungen zu erhöhen, schlagen wir zudem einen realistischen Benchmark für multidimensionale Bereichsabfragen vor, der auf echten Genomdaten ausgeführt wird. Im letzten Abschnitt der Dissertation präsentieren wir den BB-Tree als neue, hochperformante und speichereffziente Hauptspeicherindexstruktur. Der BB-Tree ermöglicht das Ausführen von multidimensionalen Bereichs- und Punktabfragen und verfügt über einen parallelen Suchoperator, der mehrere Threads verwenden kann, um die Performanz von Suchanfragen zu erhöhen.Database systems employ index structures as means to accelerate search queries. Over the last years, the research community has proposed many different in-memory approaches that optimize cache misses instead of disk I/O, as opposed to disk-based systems, and make use of the grown parallel capabilities of modern CPUs. However, these techniques mainly focus on single-key lookups, but neglect equally important range queries. Range queries are an ubiquitous operator in data management commonly used in numerous domains, such as genomic analysis, sensor networks, or online analytical processing. The main goal of this dissertation is thus to improve the capabilities of main-memory database systems with regard to executing range queries. To this end, we first propose a cache-optimized, updateable main-memory index structure, the cache-sensitive skip list, which targets the execution of range queries on single database columns. Second, we study the performance of multidimensional range queries on modern hardware, where data are stored in main memory and processors support SIMD instructions and multi-threading. We re-evaluate a previous rule of thumb suggesting that, on disk-based systems, scans outperform index structures for selectivities of approximately 15-20% or more. To increase the practical relevance of our analysis, we also contribute a novel benchmark consisting of several realistic multidimensional range queries applied to real- world genomic data. Third, based on the outcomes of our experimental analysis, we devise a novel, fast and space-effcient, main-memory based index structure, the BB- Tree, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

Development of Distributed Research Center for analysis of regional climatic and environmental changes

Author: Gordov E.
Okladnikov I.
Prusevich Alexander A.
Shiklomanov Alexander I.
Titov A.
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/01/2016
Field of study

We present an approach and first results of a collaborative project being carried out by a joint team of researchers from the Institute of Monitoring of Climatic and Ecological Systems, Russia and Earth Systems Research Center UNH, USA. Its main objective is development of a hardware and software platform prototype of a Distributed Research Center (DRC) for monitoring and projecting of regional climatic and environmental changes in the Northern extratropical areas. The DRC should provide the specialists working in climate related sciences and decision-makers with accurate and detailed climatic characteristics for the selected area and reliable and affordable tools for their in-depth statistical analysis and studies of the effects of climate change. Within the framework of the project, new approaches to cloud processing and analysis of large geospatial datasets (big geospatial data) inherent to climate change studies are developed and deployed on technical platforms of both institutions. We discuss here the state of the art in this domain, describe web based information-computational systems developed by the partners, justify the methods chosen to reach the project goal, and briefly list the results obtained so far

UNH Scholars' Repository

On the evaluation of exact-match and range queries over multidimensional data in distributed hash tables

Author: Malensek Matthew
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2012
Field of study

2012 Fall.Includes bibliographical references.The quantity and precision of geospatial and time series observational data being collected has increased alongside the steady expansion of processing and storage capabilities in modern computing hardware. The storage requirements for this information are vastly greater than the capabilities of a single computer, and are primarily met in a distributed manner. However, distributed solutions often impose strict constraints on retrieval semantics. In this thesis, we investigate the factors that influence storage and retrieval operations on large datasets in a cloud setting, and propose a lightweight data partitioning and indexing scheme to facilitate these operations. Our solution provides expressive retrieval support through range-based and exact-match queries and can be applied over massive quantities of multidimensional data. We provide benchmarks to illustrate the relative advantage of using our solution over a general-purpose cloud storage engine in a distributed network of heterogeneous computing resources

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Viewpoints: A high-performance high-dimensional exploratory data analysis tool

Author: C. Levit
Høg E.
M. J. Way
P. R. Gazis
Publication venue: 'University of Chicago Press'
Publication date: 08/11/2010
Field of study

Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as the price of commodity graphics hardware has dropped and its capability has increased, it is now possible, in principle, to view large complex data sets on a single workstation. To do this in practice, an investigator will need software that is written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large, high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) in comparison with similar packages today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in a variety of fields that range from astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, and finance to information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage.Comment: 18 pages, 3 figures, PASP in press, this version corresponds more closely to that to be publishe

arXiv.org e-Print Archive

Crossref

A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

Author: Alzeini H I
Habaebi M H
Hameed Sh A
Publication venue
Publication date: 01/12/2013
Field of study

The overwhelmingly increasing amount of stored data has spurred researchers seeking different methods in order to optimally take advantage of it which mostly have faced a response time problem as a result of this enormous size of data. Most of solutions have suggested materialization as a favourite solution. However, such a solution cannot attain Real- Time answers anyhow. In this paper we propose a framework illustrating the barriers and suggested solutions in the way of achieving Real-Time OLAP answers that are significantly used in decision support systems and data warehouses

arXiv.org e-Print Archive

The International Islamic University Malaysia Repository

Data Management and Mining in Astrophysical Databases

Author: De Angelis A.
Frailis M.
Roberto V.
Publication venue
Publication date: 01/01/2003
Field of study

We analyse the issues involved in the management and mining of astrophysical data. The traditional approach to data management in the astrophysical field is not able to keep up with the increasing size of the data gathered by modern detectors. An essential role in the astrophysical research will be assumed by automatic tools for information extraction from large datasets, i.e. data mining techniques, such as clustering and classification algorithms. This asks for an approach to data management based on data warehousing, emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods and simplicity is achieved by properly handling metadata. Clustering and classification techniques, on large datasets, pose additional requirements: computational and memory scalability with respect to the data size, interpretability and objectivity of clustering or classification results. In this study we address some possible solutions.Comment: 10 pages, Late

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine