1,103 research outputs found
Recommended from our members
Quaestor: Query web caching for database-as-a-service providers
Today, web performance is primarily governed by round-trip latencies between end devices and cloud services. To improve performance, services need to minimize the delay of accessing data. In this paper, we propose a novel approach to low latency that relies on existing content delivery and web caching infrastructure. The main idea is to enable application-independent caching of query results and records with tunable consistency guarantees, in particular bounded staleness. Q
uaestor
(Query Store) employs two key concepts to incorporate both expiration-based and invalidation-based web caches: (1) an Expiring Bloom Filter data structure to indicate potentially stale data, and (2) statistically derived cache expiration times to maximize cache hit rates. Through a distributed query invalidation pipeline, changes to cached query results are detected in real-time. The proposed caching algorithms offer a new means for data-centric cloud services to trade latency against staleness bounds, e.g. in a database-as-a-service. Q
uaestor
is the core technology of the backend-as-a-service platform Baqend, a cloud service for low-latency websites. We provide empirical evidence for Q
uaestor
's scalability and performance through both simulation and experiments. The results indicate that for read-heavy workloads, up to tenfold speed-ups can be achieved through Q
uaestor
's caching.
</jats:p
Cache craftiness for fast multicore key-value storage
We present Masstree, a fast key-value database designed for SMP machines. Masstree keeps all data in memory. Its main data structure is a trie-like concatenation of B+-trees, each of which handles a fixed-length slice of a variable-length key. This structure effectively handles arbitrary-length possiblybinary keys, including keys with long shared prefixes. [superscript +]-tree fanout was chosen to minimize total DRAM delay when descending the tree and prefetching each tree node. Lookups use optimistic concurrency control, a read-copy-update-like technique, and do not write shared data structures; updates lock only affected nodes. Logging and checkpointing provide consistency and durability. Though some of these ideas appear elsewhere, Masstree is the first to combine them. We discuss design variants and their consequences.
On a 16-core machine, with logging enabled and queries arriving over a network, Masstree executes more than six million simple queries per second. This performance is comparable to that of memcached, a non-persistent hash table server, and higher (often much higher) than that of VoltDB, MongoDB, and Redis.National Science Foundation (U.S.). (Award 0834415)National Science Foundation (U.S.). (Award 0915164)Quanta Computer (Firm
Resource Sharing for Multi-Tenant Nosql Data Store in Cloud
Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants In this thesis I focus on and propose solutions to two cases: independent data-local file system, and shared data-parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing performance degradation for one tenant by another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads are introduced due to the unawareness of the NoSQL store of PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL stores, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, giving capability that embedded KVSs are not designed for. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads
Scaling In-Memory databases on multicores
Current computer systems have evolved from featuring only a single processing unit and limited RAM, in the order of kilobytes or few megabytes, to include several multicore processors, o↵ering in the order of several tens of concurrent execution contexts, and have main memory in the order of several tens to hundreds of gigabytes. This allows to keep all data of many applications in the main memory, leading to the development of inmemory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring in less I/O overhead.
In this dissertation, we present a scalability study of two general purpose IMDBs on
multicore systems. The results show that current general purpose IMDBs do not scale
on multicores, due to contention among threads running concurrent transactions. In
this work, we explore di↵erent direction to overcome the scalability issues of IMDBs in
multicores, while enforcing strong isolation semantics.
First, we present a solution that requires no modification to either database systems
or to the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on slaves. This reduces contention, allowing MacroDB to o↵er scalable performance under read-only workloads, while updateintensive workloads su↵er from performance loss, when compared to the standalone engine.
Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification o↵ers performance improvement under all workloads, when compared to the standalone engine, while scalability is limited to read-only workloads.
Next we addressed the scalability limitations for update-intensive workloads, and
propose the reduction of locking granularity from the table level to the attribute level.
This further improved performance for intensive and moderate update workloads, at a
slight cost for read-only workloads. Scalability is limited to intensive-read and read-only
workloads.
Finally, we investigate the impact applications have on the performance of database
systems, by studying how operation order inside transactions influences the database performance. We then propose a Read before Write (RbW) interaction pattern, under
which transaction perform all read operations before executing write operations. The
RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation
Management of object-oriented action-based distributed programs
Phd ThesisThis thesis addresses the problem of managing the runtime behaviour of distributed
programs. The thesis of this work is that management is fundamentally
an information processing activity and that the object model, as applied to actionbased
distributed systems and database systems, is an appropriate representation
of the management information. In this approach, the basic concepts of classes,
objects, relationships, and atomic transition systems are used to form object
models of distributed programs. Distributed programs are collections of objects
whose methods are structured using atomic actions, i.e., atomic transactions.
Object models are formed of two submodels, each representing a fundamental
aspect of a distributed program. The structural submodel represents a static
perspective of the distributed program, and the control submodel represents a
dynamic perspective of it. Structural models represent the program's objects,
classes and their relationships. Control models represent the program's object
states, events, guards and actions-a transition system. Resolution of queries on
the distributed program's object model enable the management system to control
certain activities of distributed programs.
At a different level of abstraction, the distributed program can be seen as a
reactive system where two subprograms interact: an application program and a
management program; they interact only through sensors and actuators. Sensors
are methods used to probe an object's state and actuators are methods used
to change an object's state. The management program is capable to prod the
application program into action by activating sensors and actuators available at
the interface of the application program. Actions are determined by management
policies that are encoded in the management program. This way of structuring
the management system encourages a clear modularization of application and
management distributed programs, allowing better separation of concerns. Managemental
concerns can be dealt with by the management program, functional
concerns can be assigned to the application program.
The object-oriented action-based computational model adopted by the management
system provides a natural framework for the implementation of faulttolerant
distributed programs. Object orientation provides modularity and extensibility
through object encapsulation. Atomic actions guarantee the consistency of
the objects of the distributed program despite concurrency and failures. Replication
of the distributed program provides increased fault-tolerance by guaranteeing
the consistent progress of the computation, even though some of the replicated
objects can fail.
A prototype management system based on the management theory proposed
above has been implemented atop Arjuna; an object-oriented programming system
which provides a set of tools for constructing fault-tolerant distributed programs. The management system is composed of two subsystems: Stabilis, a
management system for structural information, and Vigil, a management system
for control information. Example applications have been implemented to illustrate
the use of the management system and gather experimental evidence to give
support to the thesis.CNPq (Consellho Nacional de Desenvolvimento Cientifico e Tecnol6gico, Brazil):
BROADCAST (Basic Research On Advanced Distributed Computing: from Algorithms to SysTems)
Data management in cloud environments: NoSQL and NewSQL data stores
: Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volume of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective in the field, (2) providing guidance to practitioners and researchers to choose the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared focusing on data models, querying, scaling, and security related capabilities. Features driving the ability to scale read requests and write requests, or scaling data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages
- …