Deferred lightweight indexing for log-structured key-value stores
The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has pushed the proliferation of log-structured key-value stores, represented by Google's BigTable [1], Apache HBase [2] and Cassandra [3]. While providing key-based data access with a Put/Get interface, these key-value stores do not support value-based access methods, which significantly limits their applicability in modern web and database applications. In this paper, we present DELI, a DEferred Lightweight Indexing scheme on log-structured key-value stores. To index intensively updated big data in real time, DELI aims at making index maintenance as lightweight as possible. The key idea is to apply an append-only design for online index maintenance and to collect index garbage at carefully chosen times. DELI optimizes the performance of index garbage collection by tightly coupling its execution with a native routine process called compaction. DELI's system design is fault-tolerant and generic (applicable to most key-value stores); we implemented a prototype of DELI based on HBase without internal code modification. Our experiments show that DELI offers a significant performance advantage for write-intensive index maintenance.
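The core idea (append-only index writes, with index garbage collected during compaction) can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not DELI's HBase implementation: the class and method names are hypothetical, deletes are not modeled, and filtering stale entries at query time before garbage collection runs is an assumption of this sketch rather than a detail given in the abstract.

```python
from collections import defaultdict


class DeferredIndexStore:
    """Hypothetical toy model: versioned base table plus an append-only value index."""

    def __init__(self):
        self.base = defaultdict(list)  # row key -> list of (ts, value), newest last
        self.index = []                # append-only list of (value, row key, ts)
        self.clock = 0

    def put(self, key, value):
        self.clock += 1
        self.base[key].append((self.clock, value))
        # Append-only index maintenance: the old value is never read, so any
        # previous index entry for this key is left behind as garbage.
        self.index.append((value, key, self.clock))

    def latest_ts(self, key):
        return self.base[key][-1][0]

    def get_by_value(self, value):
        # Assumption: until garbage is collected, stale entries are filtered
        # by checking that the entry points at the newest base version.
        return sorted(k for v, k, ts in self.index
                      if v == value and ts == self.latest_ts(k))

    def compact(self):
        # Compaction keeps only the newest base version of each row; index
        # garbage collection piggybacks on the same pass and drops entries
        # that point at versions compaction just discarded.
        live = {k: versions[-1][0] for k, versions in self.base.items()}
        for k in self.base:
            self.base[k] = self.base[k][-1:]
        self.index = [(v, k, ts) for v, k, ts in self.index if live[k] == ts]
```

For example, after `put("u1", "red")`, `put("u2", "red")` and `put("u1", "blue")`, the index holds three entries, one of them stale; a query for `"red"` still returns only `u2`, and `compact()` shrinks the index to two entries. The point of deferring the cleanup is that the write path never issues a read.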
Design of Scalable Database Management Systems (DBMS)
Scalable database management systems (DBMS), both for update-intensive application workloads and for decision support systems for descriptive and deep analytics, are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from traditional enterprise infrastructures to next-generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focused on large-scale data management in traditional enterprise settings, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment. This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet-scale applications. Our background study encompasses both classes of systems: (I) those supporting update-heavy applications and (II) those for ad-hoc analytics and decision support. We then provide an in-depth analysis of systems that support update-intensive web applications and survey the state of the art in this domain. We crystallize the design choices made by some successful large-scale database management systems, analyze the application demands and access patterns, and enumerate the desiderata for a cloud-bound DBMS.
Easy Freshness with Pequod Cache Joins
Pequod is a distributed application-level key-value cache that supports declaratively defined, incrementally maintained, dynamic, partially materialized views. These views, which we call cache joins, can simplify application development by shifting the burden of view maintenance onto the cache. Cache joins define relationships among key ranges; using cache joins, Pequod calculates views on demand, incrementally updates them as required, and in many cases improves performance by reducing client communication. To build Pequod, we had to design a view abstraction for volatile, relationless key-value caches and make it work across servers in a distributed system. Pequod performs as well as other in-memory key-value caches and, like those caches, outperforms databases with view support.
Engineering and Applied Science
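To make "relationships among key ranges" concrete, here is a single-process sketch of a timeline-style cache join: a user's timeline range is computed on demand from that user's subscription range joined with each followed poster's post range, and is then updated incrementally as new posts arrive. This is an illustration under assumptions, not Pequod's API or its declarative join syntax: the key scheme (`s|user|poster`, `p|poster|time`, `t|user|time|poster`), the class, and its methods are all hypothetical, times are zero-padded strings so they sort lexicographically, and subscriptions added after a timeline is materialized are not back-filled.

```python
import bisect


class ToyCache:
    """Hypothetical single-node cache with one hard-coded timeline join."""

    def __init__(self):
        self.keys = []                   # sorted keys, enabling range scans
        self.data = {}
        self.materialized_users = set()  # timelines computed so far (partial materialization)

    def _put_raw(self, key, value):
        if key not in self.data:
            bisect.insort(self.keys, key)
        self.data[key] = value

    def scan(self, prefix):
        # Range scan over all keys that start with `prefix`.
        i = bisect.bisect_left(self.keys, prefix)
        out = []
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            out.append((self.keys[i], self.data[self.keys[i]]))
            i += 1
        return out

    def put(self, key, value):
        self._put_raw(key, value)
        parts = key.split("|")
        if parts[0] == "p":
            poster, time = parts[1], parts[2]
            # Incremental maintenance: push the new post only into timelines
            # that have already been materialized and subscribe to this poster.
            for user in self.materialized_users:
                if f"s|{user}|{poster}" in self.data:
                    self._put_raw(f"t|{user}|{time}|{poster}", value)

    def timeline(self, user):
        if user not in self.materialized_users:
            # First request: compute the join on demand from the source ranges.
            for skey, _ in self.scan(f"s|{user}|"):
                poster = skey.split("|")[2]
                for pkey, text in self.scan(f"p|{poster}|"):
                    time = pkey.split("|")[2]
                    self._put_raw(f"t|{user}|{time}|{poster}", text)
            self.materialized_users.add(user)
        return [v for _, v in self.scan(f"t|{user}|")]
```

With `ann` subscribed to `bob`, `timeline("ann")` first computes the join from `bob`'s posts; a later `put("p|bob|0003", "again")` is appended to the already materialized timeline without recomputation, while posts by users `ann` does not follow never enter her range.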
Write-Optimized Indexing for Log-Structured Key-Value Stores
The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has pushed the proliferation of log-structured key-value stores, represented by Google’s BigTable, HBase and Cassandra; these systems optimize write performance by adopting a log-structured merge design. While providing key-based access methods based on a Put/Get interface, these key-value stores do not support value-based access methods, which significantly limits their applicability in many web and Internet applications, such as real-time search for all tweets or blogs containing “government shutdown”. In this paper, we present HINDEX, a write-optimized indexing scheme on log-structured key-value stores. To index intensively updated big data in real time, index maintenance is made lightweight by a design tailored to the unique characteristics of the underlying log-structured key-value stores. Concretely, HINDEX performs append-only index updates, which avoid reading historic data versions, an expensive operation in a log-structured store. To fix potentially obsolete index entries, HINDEX proposes an offline index repair process that is tightly coupled with routine compactions. HINDEX’s system design is generic to the Put/Get interface; we implemented a prototype of HINDEX based on HBase without internal code modification. Our experiments show that HINDEX offers a significant performance advantage for write-intensive index maintenance.
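The cost that append-only index updates avoid can be shown by counting reads of historic versions. The sketch below contrasts an eager update, which must read the old value to delete its index entry, with an append-only update that skips that read and leaves obsolete entries for an offline repair pass. All names here are hypothetical, and the repair function is a simplification: it reads rows directly, modeling that a compaction pass already scans the data, rather than reproducing HINDEX's actual repair procedure.

```python
class CountingStore:
    """Hypothetical base table that counts Get calls (reads of old versions)."""

    def __init__(self):
        self.rows = {}
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


def eager_update(store, index, key, value):
    # Read-before-write: fetch the old value so its index entry can be removed.
    old = store.get(key)
    if old is not None:
        index.discard((old, key))
    index.add((value, key))
    store.put(key, value)


def append_only_update(store, index, key, value):
    # No read of the old version; a stale entry may remain in the index.
    index.add((value, key))
    store.put(key, value)


def repair(store, index):
    # Offline repair (simplified): keep only entries matching the current row.
    # Rows are read directly, modeling a compaction pass that scans them anyway.
    return {(v, k) for v, k in index if store.rows.get(k) == v}
```

On the workload `("t1", "shutdown")`, `("t2", "budget")`, `("t1", "budget")`, the eager path issues three Gets while the append-only path issues none; the append-only index briefly holds the obsolete entry `("shutdown", "t1")`, and after `repair` both indexes agree.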
Transaction Chains: Achieving Serializability with Low Latency in Geo-distributed Storage Systems
Currently, users of geo-distributed storage systems face a hard choice between having serializable transactions with high latency, or limited or no transactions with low latency. We show that it is possible to obtain both serializable transactions and low latency under two conditions. First, transactions are known ahead of time, permitting an a priori static analysis of conflicts. Second, transactions are structured as transaction chains consisting of a sequence of hops, each hop modifying data at one server. To demonstrate this idea, we built Lynx, a geo-distributed storage system that offers transaction chains, secondary indexes, materialized join views, and geo-replication. Lynx uses static analysis to determine whether each hop can execute separately while preserving serializability; if so, a client needs to wait only for the first hop to complete, which occurs quickly. To evaluate Lynx, we built three applications: an auction service, a Twitter-like microblogging site and a social networking site. These applications successfully use chains to achieve low-latency operation and good throughput.
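The latency benefit comes from the client returning after the first hop while the remaining hops run in the background. The sketch below models only that execution pattern, using an auction bid as a two-hop chain (record the bid on the item's server, then in the bidder's history on another server). It is not Lynx code: the classes and the `place_bid` helper are hypothetical, the static conflict analysis is assumed to have already approved the chain, and failures, retries and geo-replication are not modeled.

```python
from collections import deque


class Server:
    """Hypothetical storage server holding a local key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ChainExecutor:
    """Toy model of transaction chains: each hop modifies data at one server."""

    def __init__(self):
        self.pending = deque()  # (server, hop) pairs awaiting background execution

    def submit(self, hops):
        # hops: ordered list of (server, function) pairs.
        first_server, first_hop = hops[0]
        result = first_hop(first_server.data)  # the client waits only for this hop
        # Assumption: static analysis has shown the remaining hops may run
        # separately without violating serializability.
        self.pending.extend(hops[1:])
        return result

    def drain(self):
        # Background execution of the remaining hops, in chain order.
        while self.pending:
            server, hop = self.pending.popleft()
            hop(server.data)


def place_bid(item_srv, user_srv, item, user, amount):
    """Hypothetical two-hop chain for the auction example."""

    def record_bid(data):
        data.setdefault(item, []).append((user, amount))
        return "accepted"

    def record_history(data):
        data.setdefault(user, []).append((item, amount))

    return [(item_srv, record_bid), (user_srv, record_history)]
```

After `submit(place_bid(items, users, "lamp", "ann", 10))` returns `"accepted"`, the bid is visible on the item server but not yet in the user's history; it appears there once `drain()` runs the second hop.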
Materialized Views in Distributed Key-Value Stores
Distributed key-value stores have become the solution of choice for warehousing large volumes of data. However, their architecture is not suitable for real-time analytics. To achieve the required velocity, materialized views can be used to provide summarized data for fast access. The main challenge, then, is the incremental, consistent maintenance of views at large scale. Thus, we introduce our View Maintenance System (VMS) to maintain views defined by SQL queries in a data-intensive, real-time scenario.
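Incremental maintenance means applying each base-table change to the view as a delta instead of re-running the query. The single-node sketch below maintains a `SELECT grp, SUM(val) ... GROUP BY grp` view: an update retracts the old row's contribution and adds the new one. It is an assumption-laden illustration, not VMS: the class is hypothetical, the distributed and consistency aspects the abstract emphasizes are not modeled, and `recompute` exists only to check the incremental result.

```python
from collections import defaultdict


class SumView:
    """Incrementally maintains: SELECT grp, SUM(val) FROM base GROUP BY grp."""

    def __init__(self):
        self.base = {}                  # row key -> (grp, val)
        self.view = defaultdict(int)    # grp -> running sum
        self.counts = defaultdict(int)  # grp -> number of contributing rows

    def _apply(self, grp, val, sign):
        self.view[grp] += sign * val
        self.counts[grp] += sign
        if self.counts[grp] == 0:
            # No rows left in this group: drop it, as a recomputed GROUP BY would.
            del self.view[grp]
            del self.counts[grp]

    def put(self, key, grp, val):
        old = self.base.get(key)
        if old is not None:
            self._apply(*old, -1)       # retract the old row's contribution
        self.base[key] = (grp, val)
        self._apply(grp, val, +1)       # add the new row's contribution

    def delete(self, key):
        old = self.base.pop(key, None)
        if old is not None:
            self._apply(*old, -1)

    def recompute(self):
        # Full recomputation from the base table, used only for checking.
        out = defaultdict(int)
        for grp, val in self.base.values():
            out[grp] += val
        return dict(out)
```

Tracking a per-group row count alongside the sum lets the view drop a group exactly when its last row disappears, which a sum alone cannot detect (a group's sum may legitimately be zero).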