    LogBase: A Scalable Log-structured Database System in the Cloud

    Numerous applications, such as financial transactions (e.g., stock trading), are write-heavy in nature, and the shift from reads to writes in web applications has been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, separating the log from the application data incurs write overheads in write-heavy environments and hence adversely affects both write throughput and recovery time. In this paper, we introduce LogBase, a scalable log-structured database system that adopts log-only storage to remove the write bottleneck and support fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of the elastic scaling property of cloud environments. It provides in-memory multiversion indexes for efficient access to the data maintained in the log, and it supports transactions that bundle read and write operations spanning multiple records. We implemented the proposed system and compared it with HBase and with a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase provides sustained write throughput, efficient data access out of the cache, and effective system recovery. Comment: VLDB201
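To make the log-only idea concrete, here is a minimal Python sketch, assuming a single append-only log (an `io.BytesIO` standing in for a log file) as the only copy of the data, plus an in-memory multiversion index that maps each key to a sorted list of `(version, offset)` pairs. The class name `LogOnlyStore`, the JSON record format, and the logical clock used as a version number are illustrative assumptions, not LogBase's actual design or API.

```python
import bisect
import io
import json


class LogOnlyStore:
    """Hypothetical sketch (not LogBase's API): all writes go to one append-only
    log; an in-memory multiversion index maps key -> sorted [(version, offset)]."""

    def __init__(self, log=None):
        self.log = log if log is not None else io.BytesIO()
        self.index = {}  # key -> sorted list of (version, byte offset in log)
        self.clock = 0   # assumed logical clock used as the version number

    def put(self, key, value):
        # A write is a single sequential append; there is no separate data file.
        self.clock += 1
        record = json.dumps({"k": key, "v": value, "ts": self.clock}).encode() + b"\n"
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(record)
        bisect.insort(self.index.setdefault(key, []), (self.clock, offset))
        return self.clock

    def get(self, key, as_of=None):
        versions = self.index.get(key, [])
        if as_of is not None:
            # Keep only versions written at or before `as_of` (snapshot read).
            pos = bisect.bisect_right(versions, (as_of, float("inf")))
            versions = versions[:pos]
        if not versions:
            return None
        _, offset = versions[-1]
        self.log.seek(offset)  # one seek into the log per lookup
        return json.loads(self.log.readline())["v"]

    @classmethod
    def recover(cls, log):
        # Recovery: since the log is the data, one scan rebuilds the index.
        store = cls(log)
        log.seek(0)
        offset = 0
        for line in iter(log.readline, b""):
            rec = json.loads(line)
            bisect.insort(store.index.setdefault(rec["k"], []), (rec["ts"], offset))
            store.clock = max(store.clock, rec["ts"])
            offset += len(line)
        return store
```

For example, after `s.put("AAPL", 100)` and `s.put("AAPL", 101)`, `s.get("AAPL")` reads the latest version while `s.get("AAPL", as_of=1)` reads the older one, and `LogOnlyStore.recover(s.log)` rebuilds an equivalent index from the log alone. Keeping the index in memory is what makes log-only storage readable without scanning; the cost in this sketch is a full log scan at recovery.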

    Predicting Intermediate Storage Performance for Workflow Applications

    Configuring a storage system to better serve an application is a challenging task, complicated by a multidimensional, discrete configuration space and the high cost of exploring it (e.g., by running the application under different storage configurations). To enable selecting the best configuration in a reasonable time, we design an end-to-end performance prediction mechanism that estimates the turnaround time of an application using a storage system under a given configuration. The approach focuses on a generic object-based storage system design; it supports exploring the impact of optimizations targeting workflow applications (e.g., various data placement schemes) in addition to more traditional configuration knobs (e.g., stripe size or replication level), and it models system operation at the data-chunk and control-message level. This paper presents our experience to date with designing and using this prediction mechanism. We evaluate it using micro-benchmarks, synthetic benchmarks that mimic real workflow applications, and a real application. A preliminary evaluation shows that we are on track to meet our objectives: the mechanism scales to model a workflow application running on an entire cluster while offering a speedup of over 200x (normalized by resource) compared to running the actual application, and, in the limited number of scenarios we study, it achieves a prediction accuracy that enables identifying the best storage system configuration.

    Scalable Reliable SD Erlang Design

    This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with up to 100,000 cores. We cover a number of aspects, specifically anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Two other components guided the design of SD Erlang: design principles and typical Erlang applications. The design principles summarise the types of modifications we aim to make so that Erlang can scale. Erlang exemplars help us identify the main Erlang scalability issues and hypothetically validate the SD Erlang design.

    DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores

    We present DKVF, a framework that enables one to quickly prototype and evaluate new protocols for key-value stores and compare them with existing protocols on selected benchmarks. Because of the limitations imposed by the CAP theorem, new protocols must be developed that achieve the desired trade-off between consistency and availability for the application at hand. Hence, both academic and industrial communities focus on developing new protocols that identify a different (and, hopefully, in one or more aspects better) point on this trade-off curve. While these protocols are often based on a simple intuition, evaluating them to ensure that they indeed provide increased availability, consistency, or performance is a tedious task. DKVF enables one to quickly prototype a new protocol and identify how it performs compared to existing protocols on pre-specified benchmarks. The framework relies on YCSB (Yahoo! Cloud Serving Benchmark) for benchmarking. We demonstrate DKVF by implementing four existing protocols with it: eventual consistency, COPS, GentleRain, and CausalSpartan. We compare the performance of these protocols under different load conditions and find that it is similar to that of our from-scratch implementations of the same protocols; the comparison among the protocols is also consistent with what has been reported in the literature. Moreover, implementing these protocols was much more natural, as we only needed to translate the pseudocode into Java (and add the necessary error handling), so it took just 1-2 days per protocol. Finally, the framework is extensible: individual components (e.g., the storage component) can be replaced.
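The idea of prototyping a protocol by filling in a few handlers can be sketched as follows. This is a minimal illustration in Python rather than the Java that DKVF actually uses, and the `Protocol` interface, its three handler names, and the `EventualLWW` class are hypothetical, not DKVF's API. The eventual-consistency protocol shown here resolves conflicts with last-writer-wins on `(logical clock, node id)` stamps, which is an assumption chosen for brevity; replication messages are returned to the caller so that a test harness can deliver them in any order.

```python
from abc import ABC, abstractmethod


class Protocol(ABC):
    """Hypothetical plug-in interface (not DKVF's actual API)."""

    @abstractmethod
    def on_put(self, key, value):
        """Handle a client write; return a replication message for peers."""

    @abstractmethod
    def on_get(self, key):
        """Handle a client read."""

    @abstractmethod
    def on_replicate(self, key, value, stamp):
        """Apply a replication message received from a peer."""


class EventualLWW(Protocol):
    """Assumed eventual-consistency protocol with last-writer-wins resolution."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0
        self.store = {}  # key -> (value, (clock, node_id))

    def on_put(self, key, value):
        self.clock += 1
        stamp = (self.clock, self.node_id)  # node id breaks ties between equal clocks
        self.store[key] = (value, stamp)
        return (key, value, stamp)  # message to be sent to peers asynchronously

    def on_get(self, key):
        entry = self.store.get(key)
        return entry[0] if entry else None

    def on_replicate(self, key, value, stamp):
        # Advance the local clock so later local writes are stamped after this one.
        self.clock = max(self.clock, stamp[0])
        current = self.store.get(key)
        if current is None or stamp > current[1]:
            self.store[key] = (value, stamp)
```

If replicas `a` and `b` concurrently write `x = 1` and `x = 2`, delivering each other's messages (in either order) leaves both returning the same value, which is the convergence property eventual consistency promises. A causal protocol such as COPS would need extra metadata (dependencies) in the message, which is where a framework's common plumbing saves effort.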