28 research outputs found
Dynamic Physiological Partitioning on a Shared-nothing Database Cluster
Traditional DBMS servers are usually over-provisioned for most of their daily
workloads and, because they do not show good-enough energy proportionality,
waste a lot of energy while underutilized. A cluster of small (wimpy) servers,
whose size can be dynamically adjusted to the current workload, offers
better energy characteristics for these workloads. Yet, data migration,
necessary to balance utilization among the nodes, is a non-trivial and
time-consuming task that may consume the energy saved. For this reason, a
sophisticated and easy-to-adjust partitioning scheme that fosters dynamic
reorganization is needed. In this paper, we adapt physiological partitioning,
a technique originally created for SMP systems, to distribute data among
nodes in a way that allows data to be repartitioned easily without
interrupting transactions. We dynamically partition DB tables based on the nodes'
utilization and given energy constraints and compare our approach with physical
partitioning and logical partitioning methods. To quantify the possible energy
savings and their potential cost in query runtime, we evaluate our
implementation on an experimental cluster and compare the results w.r.t.
performance and energy consumption. Depending on the workload, we can
save substantial energy without sacrificing too much performance.
Final Report: Efficient Databases for MPC Microdata
The purpose of this grant was to develop the theory and practice of high-performance databases for massive streamed datasets. Over the last three years, we have developed fast indexing technology, that is, technology for rapidly ingesting data and storing that data so that it can be efficiently queried and analyzed. During this project we developed the technology so that high-bandwidth data streams can be indexed and queried efficiently. Our technology has been proven to work on data sets composed of tens of billions of rows when the data stream arrives at over 40,000 rows per second. We achieved these numbers even on a single disk driven by two cores. Our work comprised (1) new write-optimized data structures with better asymptotic complexity than traditional structures, (2) implementation, and (3) benchmarking. We furthermore developed a prototype of TokuFS, a middleware layer that can handle microdata I/O packaged up in an MPI-IO abstraction.
Multi-Version Search and Cache-Conscious Ranking Optimization
Organizations and companies archive many versions of digital data such as web pages, internal emails and so on. Such data is critical for internal investigation, regulatory compliance, and electronic discovery. It is estimated that the electronic discovery market that leverages archival data will reach $9.9 billion globally in 2017. It is not uncommon for businesses to retain archived collections for 10 to 15 years. How to archive such versioned data is worth studying, and it raises many challenges, including 1) traditional indexes occupy too much space for versioned data, 2) traditional search is too slow on versioned data, and 3) how to guarantee high accuracy when improving efficiency in a new architecture.
In this dissertation, we take advantage of the fast development of information retrieval and tackle the problem by proposing a new multi-version search architecture with a cache-conscious ranking optimization framework. Specifically, we first discuss our new versioned search architecture. Then, we present a cache-conscious online ranking algorithm to improve the online part. Finally, we describe a framework that selects the best blocking methods and parameters for our algorithm to achieve the best performance.
Firstly, we present our new multi-version search architecture. We propose an approach that uses cluster-based retrieval to quickly narrow the search scope guided by version representatives in Phase 1 and develops a hybrid index structure with adaptive runtime data traversal to speed up Phase 2 search. The hybrid scheme exploits the advantages of the forward index and the inverted index based on term characteristics to minimize the time spent extracting positional and other feature information during runtime search. We compare several indexing and data traversal options with different time and space tradeoffs and describe evaluation results to demonstrate their effectiveness. The experimental results show that the proposed scheme can be up to about 4x as fast as previous work on solid state drives while retaining good relevance.
Secondly, we present our 2D blocking algorithm to optimize the online ranking part of the system. Multi-tree ensemble models have been proven to be effective for document ranking. Using a large number of trees can improve accuracy, but it takes time to calculate the ranking scores of matched documents. We investigate data traversal methods for fast score calculation with a large ensemble and propose a 2D blocking scheme for better cache utilization with a simpler code structure compared to previous work. Experiments with several benchmarks show significant acceleration in score calculation without loss of ranking accuracy.
Lastly, we describe a framework that quickly selects the best blocking methods and parameters for our 2D blocking algorithm with the help of a full cache analysis. The 2D blocking method is very helpful for improving online search efficiency. However, different traversal methods and blocking parameter settings can exhibit different cache and cost behavior depending on data and architectural characteristics, and an exhaustive search for performance comparison and optimum selection is very time-consuming. We provide an approximate analytic comparison of cache blocking methods with respect to their data access performance and propose a fast guided sampling scheme to select a traversal method and blocking parameters for effective use of the memory hierarchy. Evaluation studies with three datasets show that, within a reasonable amount of time, the proposed scheme can identify a highly competitive solution that significantly accelerates score calculation.
In summary, we have proposed a new multi-version search architecture with cache-conscious ranking optimization for the online search part and a framework that helps quickly select the best blocking methods and parameters with full cache analysis for the 2D blocking method. With this new versioned search system, we can meet the challenges of scalability, efficiency and accuracy in multi-version search, and we believe this work will be useful to future researchers in this direction.
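The 2D blocking idea in the abstract above (tiling both the matched documents and the trees of the ensemble so that a small working set is reused before moving on) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-split `make_stump` trees, the function names, and the block sizes are hypothetical, and the loop order is one plausible reading of "2D blocking", not the dissertation's actual code. Python will not show the cache effect itself; the sketch only shows the traversal order and that it leaves scores unchanged.

```python
from typing import Callable, List, Sequence

# A "tree" is modelled as any function from a feature vector to a partial
# score. This is a hypothetical stand-in for a real regression tree.
Tree = Callable[[Sequence[float]], float]


def make_stump(feature: int, threshold: float, left: float, right: float) -> Tree:
    """Build a one-split tree: `left` if x[feature] <= threshold, else `right`."""
    def stump(x: Sequence[float]) -> float:
        return left if x[feature] <= threshold else right
    return stump


def score_naive(docs: Sequence[Sequence[float]], trees: Sequence[Tree]) -> List[float]:
    """Baseline: for each document, walk the whole ensemble."""
    return [sum(tree(x) for tree in trees) for x in docs]


def score_2d_blocked(
    docs: Sequence[Sequence[float]],
    trees: Sequence[Tree],
    doc_block: int = 2,
    tree_block: int = 2,
) -> List[float]:
    """Score documents tile by tile: a block of documents against a block of trees.

    Within a tile, each tree is applied to every document of the block before
    the next tree is touched, so both the tree block and the document block
    are reused while they are (ideally) still cache-resident.
    """
    scores = [0.0] * len(docs)
    for d0 in range(0, len(docs), doc_block):
        d1 = min(d0 + doc_block, len(docs))
        for t0 in range(0, len(trees), tree_block):
            for tree in trees[t0:t0 + tree_block]:
                for d in range(d0, d1):
                    scores[d] += tree(docs[d])
    return scores
```

Because every document still accumulates the trees in the same order, the blocked traversal returns exactly the same scores as the baseline; only the memory access pattern changes, which is where block-size tuning would matter in a compiled implementation.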
Efficient bulk-loading methods for temporal and multidimensional index structures
Nearly all scientific disciplines benefit from the latest methods for analyzing and processing large volumes of data. These methods require efficient processing of spatial and temporal data, since time and position are important attributes of many data sets. Efficient query processing is made possible in particular by index structures. This thesis focuses on two index structures: the multiversion B-tree (MVBT) and the R-tree. The former is used to manage temporal data, the latter to index multidimensional rectangle data.
Continuously and rapidly growing data volumes pose a major challenge for computer science. Building and updating indexes with conventional methods (record by record) is no longer efficient. To enable timely and cost-efficient data processing, methods for fast loading of index structures are urgently needed.
In the first part of the thesis, we address the question of whether there is a loading method for the MVBT that has the same I/O complexity as external sorting. Until now, this question has remained unanswered. In this work, we developed a new construction method and showed that it has the same time complexity as external sorting. To this end, we employed two algorithmic techniques: weight balancing and buffer trees. Our experiments show that the result is not merely of theoretical significance.
In the second part of the thesis, we address the question of whether and how statistical information about spatial queries can be exploited to improve the query performance of R-trees. Our new method uses information such as the aspect ratio and side lengths of a representative query rectangle to build a good R-tree with respect to a commonly used cost model. If this information is not available, we optimize R-trees with respect to the sum of the volumes of the minimum bounding rectangles of the leaf nodes. Since building optimal R-trees with respect to this cost measure is NP-hard, we first reduce the problem to a one-dimensional partitioning problem by sorting the data along optimized space-filling curves. We then solve this problem using dynamic programming. The I/O complexity of the method equals that of external sorting, because the I/O running time of the method is dominated by the running time of sorting.
In the last part of the thesis, we applied the developed partitioning methods to the construction of spatial histograms, since these, like R-trees, produce a disjoint partitioning of space. Results of extensive experiments show that the new partitioning techniques yield both R-trees with better query performance and spatial histograms with better estimation quality than competing methods.
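The reduction described in the second part of this abstract (sort along a space-filling curve, then cut the sorted sequence into leaves by dynamic programming so that the summed bounding-box volume is minimal) might look roughly like the sketch below. It rests on several assumptions that are not taken from the thesis: a plain Z-order (Morton) curve stands in for the "optimized" curves, the data are 2D integer points rather than rectangles, the cost is bounding-box area, and the names `z_order_key` and `partition_leaves` as well as the fill bounds are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def z_order_key(p: Point, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton code) for non-negative integers."""
    x, y = p
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def bbox_area(pts: Sequence[Point]) -> float:
    """Area of the minimum bounding rectangle of a non-empty point set."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return float((max(xs) - min(xs)) * (max(ys) - min(ys)))


def partition_leaves(
    points: Sequence[Point], min_fill: int, max_fill: int
) -> Tuple[List[List[Point]], float]:
    """Cut the curve-sorted points into contiguous leaves of legal size.

    best[i] is the minimal total bounding-box area for the first i points;
    cut[i] remembers where the last leaf of that optimal solution starts.
    """
    pts = sorted(points, key=z_order_key)
    n = len(pts)
    inf = float("inf")
    best = [inf] * (n + 1)
    cut = [-1] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for size in range(min_fill, max_fill + 1):
            j = i - size
            if j < 0:
                break
            cost = best[j] + bbox_area(pts[j:i])
            if cost < best[i]:
                best[i], cut[i] = cost, j
    if best[n] == inf:
        raise ValueError("no partition satisfies the fill bounds")
    # Walk the cut positions backwards to recover the leaves.
    leaves: List[List[Point]] = []
    i = n
    while i > 0:
        j = cut[i]
        leaves.append(pts[j:i])
        i = j
    leaves.reverse()
    return leaves, best[n]
```

The dynamic program is optimal only among partitions that are contiguous in the sorted order, which is the point of the reduction: the NP-hard multidimensional problem is replaced by a one-dimensional one whose quality depends on how well the curve preserves locality.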
Letter from the Special Issue Editor
Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies
Honeycomb: ordered key-value store acceleration on an FPGA-based SmartNIC
In-memory ordered key-value stores are an important building block in modern
distributed applications. We present Honeycomb, a hybrid software-hardware
system for accelerating read-dominated workloads on ordered key-value stores
that provides linearizability for all operations including scans. Honeycomb
stores a B-Tree in host memory, and executes SCAN and GET on an FPGA-based
SmartNIC, and PUT, UPDATE and DELETE on the CPU. This approach enables large
stores and simplifies the FPGA implementation but raises the challenge of data
access and synchronization across the slow PCIe bus. We describe how Honeycomb
overcomes this challenge with careful data structure design, caching, request
parallelism with out-of-order request execution, wait-free read operations, and
batching synchronization between the CPU and the FPGA. For read-heavy YCSB
workloads, Honeycomb improves the throughput of a state-of-the-art ordered
key-value store by at least 1.8x. For scan-heavy workloads inspired by cloud
storage, Honeycomb improves throughput by more than 2x. The cost-performance,
which is more important for large-scale deployments, is improved by at least
1.5x on these workloads.
High Performance Computing using Infiniband-based clusters
The abstract is in the attachment.
High-Performance In-Memory OLTP via Coroutine-to-Transaction
Data stalls are a major overhead in main-memory database engines due to the use of pointer-rich data structures. Lightweight coroutines ease the implementation of software prefetching to hide data stalls by overlapping computation and asynchronous data prefetching. Prior solutions, however, mainly focused on (1) individual components and operations and (2) intra-transaction batching that requires interface changes, breaking backward compatibility. It was not clear how they apply to a full database engine and how much end-to-end benefit they bring under various workloads. This thesis presents CoroBase, a main-memory database engine that tackles these challenges with a new coroutine-to-transaction paradigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes but retaining the benefits of prefetching. We show that on a 48-core server, CoroBase can perform close to 2× better for read-intensive workloads and remain competitive for workloads that inherently do not benefit from software prefetching.
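As a rough illustration of modelling transactions as coroutines, the sketch below uses Python generators as a stand-in for the engine's coroutines. It is not CoroBase code: `read_txn`, `run_batch`, the dictionary store, and the trace strings are all hypothetical, and no real prefetch is issued. Each `yield` marks the point where a transaction would issue a prefetch and suspend, letting the scheduler switch to another transaction in the batch while the data is (notionally) in flight.

```python
from collections import deque
from typing import Deque, Dict, Generator, List, Optional, Sequence, Tuple

Store = Dict[str, int]
Txn = Generator[None, None, int]


def read_txn(store: Store, keys: Sequence[str], trace: List[str], name: str) -> Txn:
    """A read-only transaction written as a coroutine.

    It suspends once per key, at the point where a real engine would issue a
    software prefetch, and returns the sum of the values it read.
    """
    total = 0
    for key in keys:
        trace.append(f"{name}:prefetch:{key}")
        yield  # suspend while the (simulated) prefetch is outstanding
        total += store[key]
        trace.append(f"{name}:read:{key}")
    return total


def run_batch(txns: Sequence[Txn]) -> List[Optional[int]]:
    """Round-robin scheduler: interleave a batch of transactions on one thread."""
    results: List[Optional[int]] = [None] * len(txns)
    ready: Deque[Tuple[int, Txn]] = deque(enumerate(txns))
    while ready:
        idx, txn = ready.popleft()
        try:
            next(txn)                 # run until the next suspension point
            ready.append((idx, txn))  # still running: requeue behind the others
        except StopIteration as done:
            results[idx] = done.value  # the transaction's return value
    return results
```

The application-facing transaction body stays sequential; the batching happens across transactions in the scheduler, which is the property the abstract attributes to inter-transaction batching.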
Database and System Design for Emerging Storage Technologies
Emerging storage technologies offer an alternative to disk that is durable and allows faster data access. Flash memory, made popular by mobile devices, provides block access with low-latency random reads. New nonvolatile memories (NVRAM) are expected in upcoming years, presenting DRAM-like performance alongside persistent storage. Whereas both technologies accelerate data accesses due to increased raw speed, used merely as disk replacements they may fail to achieve their full potential. Flash's asymmetric read/write access (i.e., reads execute faster than writes) opens new opportunities to optimize Flash-specific access. Similarly, NVRAM's low-latency persistent accesses allow new designs for high-performance failure-resistant applications.
This dissertation addresses software and hardware system design for such storage technologies. First, I investigate analytics query optimization for Flash, expecting Flash's fast random access to require new query planning. While intuition suggests scan and join selection should shift between disk and Flash, I find that query plans chosen assuming disk are already near-optimal for Flash. Second, I examine new opportunities for durable, recoverable transaction processing with NVRAM. Existing disk-based recovery mechanisms impose large software overheads, yet updating data in-place requires frequent device synchronization that limits throughput. I introduce a new design, NVRAM Group Commit, to amortize synchronization delays over many transactions, increasing throughput at some cost to transaction latency. Finally, I propose a new framework for persistent programming and memory systems to enable high performance recoverable data structures with NVRAM, extending memory consistency with persistent semantics to introduce memory persistency.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/107114/1/spelley_1.pd
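The amortization argument behind group commit can be made concrete with a small sketch. This is a generic illustration of batching commits behind a single synchronization point, not the dissertation's NVRAM Group Commit design: the `GroupCommitLog` class, its `batch_size` parameter, and the `barriers` counter (a stand-in for a device synchronization) are all hypothetical.

```python
from typing import Callable, List


class GroupCommitLog:
    """Buffer committing transactions and synchronize once per batch."""

    def __init__(self, batch_size: int, on_ack: Callable[[int], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.on_ack = on_ack           # called when a transaction is durable
        self.pending: List[int] = []   # committed but not yet synchronized
        self.barriers = 0              # number of simulated synchronizations

    def commit(self, txn_id: int) -> None:
        """Queue a transaction; it is acknowledged only after the next flush."""
        self.pending.append(txn_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Pay for one synchronization, then acknowledge the whole batch."""
        if not self.pending:
            return
        self.barriers += 1  # stand-in for one device synchronization
        for txn_id in self.pending:
            self.on_ack(txn_id)
        self.pending.clear()
```

With a batch size of 4, ten commits cost three synchronizations instead of ten, but the first transaction in each batch is acknowledged only after the last one arrives, which mirrors the throughput-for-latency trade the abstract describes.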