Scaling In-Memory databases on multicores
Current computer systems have evolved from featuring a single processing unit and limited RAM, on the order of kilobytes or a few megabytes, to featuring several multicore processors, offering tens of concurrent execution contexts, and main memory ranging from tens to hundreds of gigabytes. This makes it possible to keep all the data of many applications in main memory, which has led to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead.
In this dissertation, we present a scalability study of two general-purpose IMDBs on multicore systems. The results show that current general-purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions for overcoming the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics.
First, we present a solution that requires no modification to either the database system or the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme in which update transactions execute on the master while read-only transactions execute on the slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer a performance loss compared to the standalone engine.
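As a rough illustration of this architecture, the sketch below shows how a MacroDB-style front end might route transactions: update transactions go to the master engine, while read-only transactions are spread across the replicas. The class and method names are hypothetical, not the actual MacroDB API.

```java
import java.sql.Connection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of master-slave routing in a replicated in-memory
// database: updates run on the master copy, read-only transactions are
// load-balanced over the replica copies.
public class ReplicatedRouter {
    private final Connection master;
    private final List<Connection> replicas;
    private final AtomicInteger next = new AtomicInteger();

    public ReplicatedRouter(Connection master, List<Connection> replicas) {
        this.master = master;
        this.replicas = replicas;
    }

    /** Update transactions must execute on the master engine. */
    public Connection forUpdate() {
        return master;
    }

    /** Read-only transactions are distributed round-robin over the replicas. */
    public Connection forReadOnly() {
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```

In such a scheme the master must still propagate committed updates to the replicas before read-only transactions can observe them, which is why update-heavy workloads gain little from the replication.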
Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows such mechanisms to be removed from the storage sub-component. This modification improves performance under all workloads compared to the standalone engine, although scalability remains limited to read-only workloads.
Next, we address the scalability limitations for update-intensive workloads and propose reducing the locking granularity from the table level to the attribute level. This further improves performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability remains limited to read-intensive and read-only workloads.
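The sketch below illustrates the idea of attribute-level locking under these assumptions: a lock table keyed by (table, attribute) pairs, so that updates touching disjoint attributes of the same table no longer conflict. The names are illustrative and not taken from the modified engine.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative attribute-level lock table: instead of one lock per table,
// each (table, attribute) pair gets its own read/write lock.
public class AttributeLockManager {
    private final Map<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String table, String attribute) {
        return locks.computeIfAbsent(table + "." + attribute,
                k -> new ReentrantReadWriteLock());
    }

    public void readLock(String table, String attribute) {
        lockFor(table, attribute).readLock().lock();
    }

    public void writeLock(String table, String attribute) {
        lockFor(table, attribute).writeLock().lock();
    }

    public void releaseRead(String table, String attribute) {
        lockFor(table, attribute).readLock().unlock();
    }

    public void releaseWrite(String table, String attribute) {
        lockFor(table, attribute).writeLock().unlock();
    }
}
```

The finer granularity lets two update transactions that write different columns of the same table proceed in parallel, at the price of acquiring more locks per statement, which explains the slight overhead observed for read-only workloads.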
Finally, we investigate the impact applications have on the performance of database systems, by studying how the order of operations inside transactions influences database performance. We then propose a Read-before-Write (RbW) interaction pattern, under which transactions perform all read operations before executing any write operations. The RbW pattern allows TPC-C to achieve scalable performance on our modified engine for all workloads, scaling on multicores almost up to the total number of cores, while enforcing strong isolation.
Implementing Performance Competitive Logical Recovery
New hardware platforms, e.g. cloud, multi-core, etc., have led to a
reconsideration of database system architecture. Our Deuteronomy project
separates transactional functionality from data management functionality,
enabling a flexible response to exploiting new platforms. This separation
requires, however, that recovery is described logically. In this paper, we
extend current recovery methods to work in this logical setting. While this is
straightforward in principle, performance is an issue. We show how ARIES-style recovery optimizations can work for logical recovery where page information is not captured on the log. In side-by-side performance experiments using a common log, we compare logical recovery with a state-of-the-art ARIES-style recovery implementation and show that logical redo performance can be competitive.
A component-based collaboration infrastructure
Groupware applications allow geographically distributed users to collaborate
on shared tasks. However, it is widely recognized that groupware applications are
expensive to build due to coordination services and group dynamics, neither of which
is present in single-user applications. Previous collaboration transparency systems
reuse existing single-user applications as a whole for collaborative work, often at
the price of inflexible coordination. Previous collaboration awareness systems, on
the other hand, provide reusable coordination services and multi-user widgets, but
often with two weaknesses: (1) the multi-user widgets provided are special-purpose
and limited in number, while no guidelines are provided for developing multi-user
interface components in general; and (2) they often fail to reach the desired level of flexibility in coordination by tightly binding shared data and coordination services.
In this dissertation, we propose a component-based approach to developing groupware applications that addresses the above two problems. To address the first problem, we propose a shared component model for modeling the data and graphical user interface (GUI) components of groupware applications. As a result, the myriad of existing single-user components can be repurposed as shared GUI or data components. An adaptation tool is developed to assist the adaptation process.
To address the second problem, we propose a coordination service framework that systematically models the interaction among users, data, and coordination protocols. Due to the clean separation of data and control, and the capability to dynamically "glue" them together, the framework provides reusable services such as data distribution, persistence, and adaptable consistency control. The association between data and coordination services can be dynamically changed at runtime.
An Evolvable and eXtensible Environment for Collaboration (EXEC) is built to evaluate the proposed approach. In our experiments, we demonstrate two benefits of our approach: (1) a group of common groupware features adapted from existing single-user components are plugged in to extend the functionality of the environment itself; and (2) coordination services can be dynamically attached to and detached from these shared components at different granularities to support evolving collaboration needs.
Developing Collaborative XML Editing Systems
In many areas the eXtensible Markup Language (XML) is becoming the standard exchange and data format. More and more applications not only support XML as an exchange format but also use it as their data model or default file format for graphic, text and database (such as spreadsheet) applications. Computer Supported Cooperative Work is an interdisciplinary field of research dealing with group work, cooperation and their supporting information and communication technologies. One part of it is Real-Time Collaborative Editing, which investigates the design of systems that allow several people to work simultaneously in real-time on the same document without the risk of inconsistencies.
Existing collaborative editing research applications specialize in one or, at best, a small number of document types, for example graphic, text or spreadsheet documents. This research investigates the development of a software framework which allows collaborative editing of any XML document type in real-time. This presents a more versatile solution to the problems of real-time collaborative editing.
This research contributes a new software framework model which will assist software engineers in the development of new collaborative XML editing applications. The devised framework is flexible in the sense that it is easily adaptable to different workflow requirements covering concurrency control, awareness mechanisms and optional locking of document parts. Additionally, this thesis contributes a new framework integration strategy that enables the enhancement of existing single-user editing applications with real-time collaborative editing features without changing their source code.
Transactional Client-Server Cache Consistency: Alternatives and Performance
Client-server database systems based on a page server model can
exploit client memory resources by caching copies of pages across
transaction boundaries. Caching reduces the need to obtain data from
servers or other sites on the network. In order to ensure that such
caching does not result in the violation of transaction semantics, a cache
consistency maintenance algorithm is required. Many such algorithms have
been proposed in the literature and, as all provide the same
functionality, performance is a primary concern in choosing among them. In
this paper we provide a taxonomy that describes the design space for
transactional cache consistency maintenance algorithms and show how
proposed algorithms relate to one another. We then investigate the
performance of six of these algorithms, and use these results to examine
the tradeoffs inherent in the design choices identified in the taxonomy.
The insight gained in this manner is then used to reflect upon the
characteristics of other algorithms that have been proposed. The results
show that the interactions among dimensions of the design space can impact
performance in many ways, and that classifications of algorithms as simply
"Pessimistic" or "Optimistic" do not accurately characterize the
similarities and differences among the many possible cache consistency
algorithms.
(Also cross-referenced as UMIACS-TR-95-84)
Ensuring Serializable Executions with Snapshot Isolation DBMS
Snapshot Isolation (SI) is a multiversion concurrency control mechanism that has been implemented by open source and commercial database systems such as PostgreSQL and Oracle. The main feature of SI is that a read operation does not block a write operation and vice versa, which allows a higher degree of concurrency than traditional two-phase locking. SI prevents many anomalies that appear under other isolation levels, but it can still result in non-serializable executions, in which database integrity constraints can be violated. Several techniques have been proposed to ensure serializable execution with engines running SI; these techniques are based on modifying the applications by introducing conflicting SQL statements. However, with each of these techniques the DBA has to make a difficult choice among the possible transactions to modify. This thesis helps DBAs choose between these different techniques and choices by showing how the choices affect system performance. It also proposes a novel technique called 'External Lock Manager' (ELM), which introduces conflicts in a separate lock-manager object so that every execution will be serializable. We build a prototype system for ELM and run experiments to demonstrate the robustness of the new technique compared to the previous techniques. The experiments show that modifying the application code for some transactions has a high impact on performance for some choices, which makes it very hard for DBAs to choose wisely. However, ELM has peak performance similar to SI, no matter which transactions are chosen for modification. Thus, we say that ELM is a robust technique for ensuring serializable execution.
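A minimal sketch of the ELM idea, under the assumption that conflicts are introduced by acquiring named locks in a lock-manager object that lives outside the SI engine: the transactions chosen for modification acquire the same lock before starting and release it after commit, so they can never run concurrently. The API shown is hypothetical, not the thesis prototype's interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative External Lock Manager: a lock per conflict key, held for the
// duration of the SI transaction that was chosen for modification.
public class ExternalLockManager {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void acquire(String conflictKey) {
        locks.computeIfAbsent(conflictKey, k -> new ReentrantLock()).lock();
    }

    public void release(String conflictKey) {
        locks.get(conflictKey).unlock();
    }
}
```

In use, the two transactions that would otherwise form a dangerous structure both call acquire on the same conflict key before beginning their SI transaction and release it after commit, forcing them to serialize while leaving all other transactions untouched.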