Search CORE

865 research outputs found

Dynamic Physiological Partitioning on a Shared-nothing Database Cluster

Author: Härder Theo
Schall Daniel
Publication venue
Publication date: 02/07/2014
Field of study

Traditional DBMS servers are usually over-provisioned for most of their daily workloads and, because they do not show good-enough energy proportionality, waste a lot of energy while underutilized. A cluster of small (wimpy) servers, where its size can be dynamically adjusted to the current workload, offers better energy characteristics for these workloads. Yet, data migration, necessary to balance utilization among the nodes, is a non-trivial and time-consuming task that may consume the energy saved. For this reason, a sophisticated and easy to adjust partitioning scheme fostering dynamic reorganization is needed. In this paper, we adapt a technique originally created for SMP systems, called physiological partitioning, to distribute data among nodes, that allows to easily repartition data without interrupting transactions. We dynamically partition DB tables based on the nodes' utilization and given energy constraints and compare our approach with physical partitioning and logical partitioning methods. To quantify possible energy saving and its conceivable drawback on query runtimes, we evaluate our implementation on an experimental cluster and compare the results w.r.t. performance and energy consumption. Depending on the workload, we can substantially save energy without sacrificing too much performance

arXiv.org e-Print Archive

CiteSeerX

Crossref

Process algebra approach to parallel DBMS performance modelling

Author: Pua Chai Seng
Publication venue: Computing and Electrical Engineering
Publication date: 01/01/1999
Field of study

Abstract unavailable please refer to PD

ROS: The Research Output Service. Heriot-Watt University Edinburgh

OpenGrey Repository

On Flexible Allocation of Index and Temporary Data in Parallel Database Systems

Author: Märtens Holger
Rahm Erhard
Stöhr Thomas
Publication venue
Publication date: 23/10/2018
Field of study

Qucosa - Publikationsserver der Universität Leipzig

Lustre, Hadoop, Accumulo

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Edwards Lauren
Gadepally Vijay
Hubbell Matthew
Kepner Jeremy
Michaleas Peter
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Yee Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/07/2015
Field of study

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There are a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and the most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to loose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10,000x lower latency on random lookups than either Lustre or Hadoop but Accumulo's bulk bandwidth is 10x less. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.Comment: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 201

arXiv.org e-Print Archive

Crossref

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication venue
Publication date: 19/02/2013
Field of study

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

arXiv.org e-Print Archive

CiteSeerX

Controlling Disk Contention for Parallel Query Processing in Shared Disk Database Systems

Author: Rahm Erhard
Stöhr Thomas
Publication venue: Universität Leipzig
Publication date: 08/07/2019
Field of study

Shared Disk database systems offer a high flexibility for parallel transaction and query processing. This is because each node can process any transaction, query or subquery because it has access to the entire database. Compared to Shared Nothing, this is particularly advantageous for scan queries for which the degree of intra-query parallelism as well as the scan processors themselves can dynamically be chosen. On the other hand, there is the danger of disk contention between subqueries, in particular for index scans. We present a detailed simulation study to analyze the effectiveness of parallel scan processing in Shared Disk database systems. In particular, we investigate the relationship between the degree of declustering and the degree of scan parallelism for relation scans, clustered index scans, and non-clustered index scans. Furthermore, we study the usefulness of disk caches and prefetching for limiting disk contention. Finally, we show the importance of dynamically choosing the degree of scan parallelism to control disk contention in multi-user mode

Qucosa - Publikationsserver der Universität Leipzig