177 research outputs found
On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems
A new emerging class of parallel database management systems (DBMS) is
designed to take advantage of the partitionable workloads of on-line
transaction processing (OLTP) applications. Transactions in these systems are
optimized to execute to completion on a single node in a shared-nothing cluster
without needing to coordinate with other nodes or use expensive concurrency
control measures. But some OLTP applications cannot be partitioned such that
all of their transactions execute within a single partition in this manner.
These distributed transactions access data not stored within their local
partitions and subsequently require more heavy-weight concurrency control
protocols. Further difficulties arise when the transaction's execution
properties, such as the number of partitions it may need to access or whether
it will abort, are not known beforehand. The DBMS could mitigate these
performance issues if it is provided with additional information about
transactions. Thus, in this paper we present a Markov model-based approach for
automatically selecting which optimizations a DBMS could use, namely (1) more
efficient concurrency control schemes, (2) intelligent scheduling, (3) reduced
undo logging, and (4) speculative execution. To evaluate our techniques, we
implemented our models and integrated them into a parallel, main-memory OLTP
DBMS to show that we can improve the performance of applications with diverse
workloads.
Comment: VLDB201
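The selection idea in this abstract can be sketched in a few lines of illustrative Python. This is a deliberately simplified frequency estimate standing in for the paper's Markov models, and every class name, optimization label, and threshold below is invented for illustration:

```python
from collections import defaultdict

# Simplified sketch: learn from past executions how likely a transaction type
# is to stay on one partition or to abort, and enable cheaper optimizations
# when the evidence is strong. The real system uses Markov models over a
# transaction's execution states; this frequency count is a stand-in.
class TxnModel:
    def __init__(self):
        self.runs = defaultdict(list)  # txn_type -> [(n_partitions, aborted)]

    def record(self, txn_type, n_partitions, aborted):
        self.runs[txn_type].append((n_partitions, aborted))

    def choose_optimizations(self, txn_type, threshold=0.9):
        history = self.runs[txn_type]
        if not history:
            return set()  # no evidence: execute conservatively
        single = sum(1 for n, _ in history if n == 1) / len(history)
        aborts = sum(1 for _, a in history if a) / len(history)
        opts = set()
        if single >= threshold:     # likely single-partition transaction:
            opts.add("lightweight-concurrency-control")  # optimization (1)
        if aborts <= 1 - threshold:  # unlikely to abort:
            opts.add("reduced-undo-logging")             # optimization (3)
        return opts

model = TxnModel()
for _ in range(95):
    model.record("NewOrder", 1, False)  # mostly single-partition commits
for _ in range(5):
    model.record("NewOrder", 2, False)  # occasionally touches 2 partitions
print(sorted(model.choose_optimizations("NewOrder")))
# ['lightweight-concurrency-control', 'reduced-undo-logging']
```

With 95% single-partition history and no observed aborts, the model enables both cheaper optimizations; an unseen transaction type gets none.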
Realizing the Technical Advantages of Star Transformation
Data warehousing and business intelligence go hand in hand; each gives the other purpose for development, maintenance, and improvement. Both have evolved over a few decades, building upon their initial development. Management initiatives further drive the need for, and complexity of, business intelligence, while in turn expanding the end-user community so that business change, results, and strategy are affected at the business-unit level. The literature, including a recent business intelligence user survey, demonstrates that query performance is the most significant issue encountered. Oracle's 10g Release 2 data warehouse is examined, with improvements to query optimization via best practice through Star Transformation. Star Transformation is a star-schema query rewrite followed by a join back through a hash join, which provides extensive query performance improvement. Most data warehouses exist normalized or in third normal form (3NF), while star schemas in a denormalized warehouse are not the norm. Changes in the database environment must be implemented, along with agreement from business leadership and alignment of business objectives with a Star Transformation project. Often, the sheer amount of change, shifting priorities, and a lack of understanding about the benefits of query optimization can stifle a project. Critical to gaining support and financial backing are an official plan and documentation demonstrating return on investment. Query optimization is highly complex. Both the technical and business entities should prioritize goals and consider the benefits of improved query response time, realizing the technical advantages of Star Transformation.
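The two-phase rewrite the abstract describes can be illustrated with a toy example (plain Python standing in for Oracle's internals; the star schema, tables, and predicates below are made up):

```python
# Phase 1 of a star transformation restricts the fact table using only the
# join keys that satisfy each dimension's filter; phase 2 "joins back" the
# surviving fact rows to the dimensions (via hash join -- here, dict lookups)
# to fetch the remaining columns the query selects.

fact_sales = [
    {"cust_id": 1, "prod_id": 10, "amount": 5.0},
    {"cust_id": 2, "prod_id": 11, "amount": 7.5},
    {"cust_id": 1, "prod_id": 11, "amount": 2.0},
]
dim_customer = {1: {"region": "EMEA"}, 2: {"region": "APAC"}}
dim_product = {10: {"category": "toys"}, 11: {"category": "books"}}

# Phase 1: evaluate dimension predicates to key sets, then filter the fact table.
cust_keys = {k for k, v in dim_customer.items() if v["region"] == "EMEA"}
prod_keys = {k for k, v in dim_product.items() if v["category"] == "books"}
candidates = [r for r in fact_sales
              if r["cust_id"] in cust_keys and r["prod_id"] in prod_keys]

# Phase 2: join back to the dimensions (the dicts act as hash tables)
# to pick up the dimension columns the query actually selects.
result = [{**r, **dim_customer[r["cust_id"]], **dim_product[r["prod_id"]]}
          for r in candidates]
print(result)  # the single EMEA/books sale, with its dimension attributes
```

The payoff is that the (large) fact table is scanned with highly selective key sets before any dimension columns are materialized.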
The End of a Myth: Distributed Transactions Can Scale
The common wisdom is that distributed transactions do not scale. But what if
distributed transactions could be made scalable using the next generation of
networks and a redesign of distributed databases? Developers would no longer
need to worry about co-partitioning schemes to achieve decent performance.
Application development would become easier, as data placement
would no longer determine how scalable an application is. Hardware provisioning
would be simplified as the system administrator can expect a linear scale-out
when adding more machines, rather than some complex, highly
application-specific sub-linear function.
In this paper, we present the design of our novel scalable database system
NAM-DB and show that distributed transactions with the very common Snapshot
Isolation guarantee can indeed scale using the next generation of RDMA-enabled
network technology without any inherent bottlenecks. Our experiments with the
TPC-C benchmark show that our system scales linearly to over 6.5 million
new-order (14.5 million total) distributed transactions per second on 56
machines.
Comment: 12 page
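The Snapshot Isolation guarantee the abstract relies on can be sketched generically. This is a minimal single-process model of SI visibility and first-committer-wins validation, not NAM-DB's RDMA-based design:

```python
# Minimal Snapshot Isolation model: a transaction reads the newest version
# committed at or before its read timestamp, and at commit it aborts if any
# item it wrote was committed by another transaction in the meantime.
class SIStore:
    def __init__(self):
        self.versions = {}  # key -> [(commit_ts, value)], ascending by ts
        self.clock = 0      # logical commit-timestamp counter

    def begin(self):
        return {"read_ts": self.clock, "writes": {}}

    def read(self, txn, key):
        visible = [v for ts, v in self.versions.get(key, [])
                   if ts <= txn["read_ts"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort on a write-write conflict that
        # committed after this transaction's snapshot was taken.
        for key in txn["writes"]:
            versions = self.versions.get(key, [])
            if versions and versions[-1][0] > txn["read_ts"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True

store = SIStore()
t1, t2 = store.begin(), store.begin()
store.write(t1, "x", 1)
store.write(t2, "x", 2)
ok1, ok2 = store.commit(t1), store.commit(t2)
print(ok1, ok2)  # True False: t2 loses the write-write conflict on "x"
```

The paper's contribution is showing that this validation and timestamp management can be done scalably over RDMA; the rule itself is the standard one sketched here.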
Workload-Aware Performance Tuning for Autonomous DBMSs
Optimal configuration is vital for a DataBase Management System (DBMS) to achieve high performance. There is no one-size-fits-all configuration that works for different workloads, since each workload has varying patterns with different resource requirements. There is a relationship between configuration, workload, and system performance. If a configuration cannot adapt to the dynamic changes of a workload, there can be a significant degradation in the overall performance of the DBMS unless a sophisticated administrator continuously re-configures it. In this tutorial, we focus on autonomous workload-aware performance tuning, which is expected to automatically and continuously tune the configuration as the workload changes. We survey three research directions: 1) workload classification, 2) workload forecasting, and 3) workload-based tuning. While the first two topics address the issue of obtaining accurate workload information, the third tackles the problem of how to properly use that information to optimize performance. We also identify research challenges and open problems, and give real-world examples of leveraging workload information for database tuning in commercial products (e.g., Amazon Redshift). We will demonstrate workload-aware performance tuning in Amazon Redshift in the presentation.
Peer reviewed
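As a toy illustration of the workload-classification direction surveyed here, a workload window can be summarized as a feature vector and matched to the nearest known class. The features and class names below are invented for illustration:

```python
# Nearest-centroid workload classification: summarize a time window of queries
# as a feature vector (here, the fraction of each query kind) and assign it to
# the closest known workload class by squared Euclidean distance.
def classify(window, classes):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(classes, key=lambda name: dist(window, classes[name]))

# Feature vector: fractions of (point lookups, range scans, analytic joins).
known_classes = {
    "oltp":      (0.80, 0.15, 0.05),
    "reporting": (0.10, 0.30, 0.60),
}
observed_window = (0.75, 0.20, 0.05)
winner = classify(observed_window, known_classes)
print(winner)  # "oltp"
```

Once a window is labeled, a tuner can apply the configuration known to work well for that class, which is the handoff to the workload-based tuning direction.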
OLTP on Hardware Islands
Modern hardware is abundantly parallel and increasingly heterogeneous. The
numerous processing cores have non-uniform access latencies to the main memory
and to the processor caches, which causes variability in the communication
costs. Unfortunately, database systems mostly assume that all processing cores
are the same and that microarchitecture differences are not significant enough
to appear in critical database execution paths. As we demonstrate in this
paper, however, hardware heterogeneity does appear in the critical path and
conventional database architectures achieve suboptimal and, even worse,
unpredictable performance. We perform a detailed performance analysis of OLTP
deployments in servers with multiple cores per CPU (multicore) and multiple
CPUs per server (multisocket). We compare different database deployment
strategies where we vary the number and size of independent database instances
running on a single server, from a single shared-everything instance to
fine-grained shared-nothing configurations. We quantify the impact of
non-uniform hardware on various deployments by (a) examining how efficiently
each deployment uses the available hardware resources and (b) measuring the
impact of distributed transactions and skewed requests on different workloads.
Finally, we argue in favor of shared-nothing deployments that are topology- and
workload-aware and take advantage of fast on-chip communication between islands
of cores on the same socket.
Comment: VLDB201
The Data Lakehouse: Data Warehousing and More
Relational Database Management Systems designed for Online Analytical
Processing (RDBMS-OLAP) have been foundational to democratizing data and
enabling analytical use cases such as business intelligence and reporting for
many years. However, RDBMS-OLAP systems present some well-known challenges.
They are primarily optimized for relational workloads, lead to a
proliferation of data copies that can become unmanageable, and, because the
data is stored in proprietary formats, can result in vendor lock-in,
restricting access to engines, tools, and capabilities beyond what the vendor
offers. As
the demand for data-driven decision making surges, the need for a more robust
data architecture to address these challenges becomes ever more critical. Cloud
data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but
they present their own set of challenges. More recently, organizations have
often followed a two-tier architectural approach to take advantage of both
these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems.
However, this approach brings additional challenges, complexities, and
overhead. This paper discusses how a data lakehouse, a new architectural
approach, achieves the combined benefits of an RDBMS-OLAP system and a cloud
data lake, while also providing additional advantages. We take today's data
warehousing and break it down into implementation independent components,
capabilities, and practices. We then take these aspects and show how a
lakehouse architecture satisfies them. Then, we go a step further and discuss
what additional capabilities and benefits a lakehouse architecture provides
over an RDBMS-OLAP system.
Self-adaptation via concurrent multi-action evaluation for unknown context
Context-aware computing has been attracting growing attention in recent years. Generally, there are several ways for a context-aware system to select a course of action for a particular change of context. One way is for the system developers to encompass all possible context changes in the domain knowledge. Other methods include system inference and adaptive learning, whereby the system executes one action, evaluates the outcome, and self-adapts/self-learns based on it. However, in situations where a system encounters unknown contexts, the iterative approach becomes infeasible as the size of the action space increases. Providing efficient solutions to this problem has been the main goal of this research project.
Based on the developed abstract model, the designed methodology replaces single-action implementation and evaluation with multiple actions implemented and evaluated concurrently. This parallel evaluation significantly shortens the time taken to select the action best suited to an unknown context, compared to the iterative approach.
The designed and implemented framework efficiently carries out concurrent multi-action evaluation when an unknown context is encountered and finds the best course of action. Two concrete implementations of the framework were carried out demonstrating the usability and adaptability of the framework across multiple domains.
The first implementation was in the domain of database performance tuning. The concrete implementation of the framework demonstrated the ability of concurrent multi-action evaluation technique to performance tune a database when performance is regressed for an unknown reason.
The second implementation demonstrated the ability of the framework to correctly determine the threshold price to be used in a name-your-own-price channel when an unknown context is encountered.
In conclusion, the research introduces a new paradigm of self-adaptation for context-aware applications. Within the existing body of work, concurrent multi-action evaluation is classified under the abstract concept of experiment-based self-adaptation techniques.
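The core technique, evaluating several candidate actions concurrently and keeping the one with the best measured outcome, can be sketched as follows (the actions and their scores are placeholders, not from the thesis):

```python
from concurrent.futures import ThreadPoolExecutor

# Concurrent multi-action evaluation: instead of trying one action at a time
# and iterating, evaluate all candidates in parallel and keep the best.
def evaluate(action):
    # In a real system this would apply the action in a sandboxed replica of
    # the environment and measure the outcome (e.g. query latency).
    # Here the scores are simulated placeholders.
    scores = {"add_index": 0.9, "resize_buffer": 0.6, "rewrite_query": 0.7}
    return action, scores[action]

candidate_actions = ["add_index", "resize_buffer", "rewrite_query"]
with ThreadPoolExecutor(max_workers=len(candidate_actions)) as pool:
    results = list(pool.map(evaluate, candidate_actions))  # runs concurrently

best_action, best_score = max(results, key=lambda r: r[1])
print(best_action)  # "add_index"
```

With k parallel evaluators, the wall-clock cost of exploring k candidate actions approaches that of evaluating one, which is the speed-up the thesis claims over the iterative one-action-at-a-time approach.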
High Throughput Push Based Storage Manager
The storage manager, as a key component of the database system, is
responsible for organizing, reading, and delivering data to the execution
engine for processing. According to the data serving mechanism, existing
storage managers are either pull-based, incurring high latency, or push-based,
leading to a high number of I/O requests when the CPU is busy. To address
these shortcomings, this thesis proposes a push-based prefetching strategy in
a column-wise storage manager. The proposed strategy implements an efficient
cache layer to store shared data among queries to reduce the number of I/O
requests. The capacity of the cache is maintained by an access-time-aware
eviction mechanism. Our strategy enables the storage manager to coordinate
multiple queries by merging their requests and dynamically generate an optimal
read order that maximizes the overall I/O throughput. We evaluated our storage
manager both over a disk-based redundant array of independent disks (RAID) and
an NVM Express (NVMe) solid-state drive (SSD). With the high read performance
of the SSD, we successfully minimized the total read time and number of I/O
accesses.
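The request-merging idea in the abstract can be sketched as follows (the block identifiers and the ascending-order read policy below are illustrative assumptions, not the thesis's actual implementation):

```python
# Request merging in a push-based storage manager: collect outstanding
# column-block requests from several queries, deduplicate shared blocks,
# read each block once in a single ordered pass, and push the data to every
# query that subscribed to it.
pending = {
    "Q1": {(3, "price"), (7, "price")},
    "Q2": {(3, "price"), (5, "qty")},
}

# Merge: each distinct block is read once, in ascending block order,
# approximating a sequential scan for maximum I/O throughput.
merged = sorted(set().union(*pending.values()))
print(merged)  # [(3, 'price'), (5, 'qty'), (7, 'price')]

for block in merged:                      # one ordered pass over storage
    data = f"bytes-of-{block}"            # stand-in for the actual block read
    for query, wants in pending.items():  # push to all subscribed queries
        if block in wants:
            print(query, "receives", block)
```

Block (3, "price") is read once but delivered to both Q1 and Q2, which is how the shared cache layer reduces the total number of I/O requests.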