
    Sharing Buffer Pool Memory in Multi-Tenant Relational Database-as-a-Service

    Relational database-as-a-service (DaaS) providers need to rely on multi-tenancy and resource sharing among tenants, since statically reserving resources for a tenant is not cost effective. A major consequence of resource sharing is that the performance of one tenant can be adversely affected by the resource demands of other colocated tenants. One resource that is essential for good performance of a tenant's workload is buffer pool memory. In this paper, we study the problem of how to effectively share buffer pool memory in multi-tenant relational DaaS. We first develop an SLA framework that defines and enforces accountability of the service provider to the tenant even when buffer pool memory is not statically reserved on behalf of the tenant. Next, we present a novel buffer pool page replacement algorithm (MT-LRU) that builds upon theoretical concepts from weighted online caching and is designed for multi-tenant scenarios involving SLAs and overbooking. MT-LRU generalizes the LRU-K algorithm, which is commonly used in relational database systems. We have prototyped our techniques inside a commercial DaaS engine, and extensive experiments demonstrate the effectiveness of our solution.
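    MT-LRU is described as generalizing LRU-K. As a point of reference, the following is a minimal Python sketch of plain single-tenant LRU-K (not MT-LRU itself, whose weighting scheme the abstract does not spell out). It assumes a logical clock that advances once per page reference, uses the usual rule that a page with fewer than K recorded references has an infinite backward K-distance, and omits LRU-K refinements such as the correlated-reference period. The class and method names are illustrative.

```python
import math


class LRUK:
    """Minimal single-tenant LRU-K buffer pool (illustrative, not MT-LRU).

    Evicts the resident page with the largest backward K-distance, i.e. the
    page whose K-th most recent reference is oldest. Pages referenced fewer
    than K times have an infinite K-distance; ties fall back to plain LRU.
    """

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.clock = 0          # logical time, advanced once per reference
        self.history = {}       # page -> its last k reference times, oldest first
        self.resident = set()   # pages currently in the buffer pool

    def _k_distance(self, page):
        refs = self.history[page]
        if len(refs) < self.k:
            return math.inf
        return self.clock - refs[-self.k]

    def _recency_age(self, page):
        return self.clock - self.history[page][-1]

    def access(self, page):
        """Reference `page`; return the evicted page id, or None."""
        self.clock += 1
        victim = None
        if page not in self.resident and len(self.resident) >= self.capacity:
            victim = max(self.resident,
                         key=lambda p: (self._k_distance(p), self._recency_age(p)))
            self.resident.remove(victim)
            # History of the evicted page is kept, so a quick re-reference
            # still counts toward its K-distance.
        refs = self.history.setdefault(page, [])
        refs.append(self.clock)
        del refs[:-self.k]      # keep only the k most recent reference times
        self.resident.add(page)
        return victim
```

    With capacity 2 and K = 2, the reference string a, a, b, c evicts b when c arrives, whereas plain LRU would evict a. The page referenced twice is protected from a one-off scan, which is the property that makes LRU-K popular in database buffer managers.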

    DBaaS Multitenancy, Auto-tuning and SLA Maintenance in Cloud Environments: a Brief Survey

    Cloud computing is a paradigm that offers many advantages to both customers and service providers, such as low upfront investment, pay-per-use pricing and ease of use, delivering and enabling scalable services using Internet technologies. Among the many types of services available today, Database as a Service (DBaaS) is the one in which a database, in all its aspects, is provided in the cloud. Examples of aspects related to DBaaS use are data storage, resource management and SLA maintenance. In this context, an important concern is resource management and performance, which can be handled in many different ways and for several reasons, such as saving money and time and meeting the requirements agreed between client and provider, which are defined in the Service Level Agreement (SLA). An SLA usually aims to protect the customer from not receiving the contracted service and to ensure that the provider achieves the intended profit. This paper presents a classification based on three main parameters that aim to manage resources so as to enhance performance in DBaaS and to guarantee that the SLA is respected, to the benefit of both user and provider. The proposal is based on a survey of existing research efforts.

    Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning

    In this extended abstract, we propose a new technique for query scheduling with the explicit goal of reducing disk reads and thus implicitly improving query performance. We introduce \system, a learned scheduler that exploits overlapping data reads among incoming queries and learns a scheduling strategy that improves cache hits. \system relies on deep reinforcement learning to produce workload-specific scheduling strategies that focus on long-term performance benefits while adapting to previously unseen data access patterns. We present results from a proof-of-concept prototype, demonstrating that learned schedulers can offer significant performance improvements over hand-crafted scheduling heuristics. Ultimately, we make the case that this is a promising research direction at the intersection of machine learning and databases.
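    The abstract does not describe the learned policy in detail. To make the optimization target concrete, here is a hypothetical hand-crafted heuristic of our own (not the paper's learned scheduler and not necessarily one of its baselines): greedily run next whichever waiting query reads the largest fraction of pages already in the buffer pool. It assumes each query's page set is known in advance and, as a loud simplification, that the buffer pool never evicts.

```python
def greedy_overlap_schedule(queue, cached_pages):
    """Order queries so each next query reuses as many cached pages as possible.

    queue: dict mapping query id -> set of page ids the query will read
           (assumed known in advance, e.g. from the query plan).
    cached_pages: set of page ids resident in the buffer pool at the start.
    Simplifying assumption: the buffer pool never evicts, so every page a
    scheduled query reads stays cached for the queries after it.
    """
    pending = dict(queue)
    cached = set(cached_pages)
    order = []
    while pending:
        # Score each waiting query by the fraction of its pages already cached;
        # max() keeps the earliest-submitted query on ties.
        qid = max(pending,
                  key=lambda q: len(pending[q] & cached) / max(len(pending[q]), 1))
        cached |= pending.pop(qid)
        order.append(qid)
    return order
```

    A greedy rule like this only looks one step ahead; the appeal of a reinforcement-learning scheduler, as the abstract frames it, is optimizing for long-term cache benefit rather than the immediate overlap.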

    SLA-Based Performance Tuning Techniques for Cloud Databases

    Today, cloud databases are widely used in many applications. The pay-per-use model of cloud databases enables on-demand access to reliable and configurable services (CPU, storage, networks, and software) that can be quickly provisioned and released with minimal management effort and cost for different categories of users (also called tenants). Users do not need to set up the infrastructure or buy the software. Users without a technical background can easily manage a cloud database through the console provided by the service provider, and they pay the cloud service provider only for the services they use, under a service level agreement (SLA) that specifies the performance requirements and the pricing associated with the leased services. However, because cloud resources are shared, different tenants' workloads compete for computing resources. This affects tenants' performance, especially during peak workload periods. It is therefore important for cloud database service providers to develop techniques that tune the database to restore compliance when a tenant's SLA is violated. In this dissertation, two algorithms are presented to improve cloud database performance in a multi-tenant environment. The first is a memory buffer management algorithm called SLA-LRU, and the second is a vertical database partitioning algorithm called AutoClustC. SLA-LRU takes the SLA and each buffer page's frequency, recency, and value into account when performing buffer page replacement. The value of a buffer page represents the cost of removing that page and can be computed using the corresponding tenant's SLA penalty function. When a buffer page replacement takes place, SLA-LRU considers only the buffer pages belonging to the tenants with the smallest increase in SLA penalty cost.
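    The abstract does not give SLA-LRU's exact formula for combining SLA, frequency, recency and page value. The sketch below is a simplified illustration under stated assumptions: the per-tenant SLA penalty increment for losing one more page is supplied as a precomputed input (in SLA-LRU it would come from the tenant's SLA penalty function), candidate pages are restricted to the tenant(s) with the smallest increment, and frequency, then recency, break ties. The names `BufferPage` and `sla_aware_victim` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BufferPage:
    page_id: str
    tenant: str
    freq: int       # references to the page while resident
    last_ref: int   # logical time of the most recent reference


def sla_aware_victim(pages, penalty_increment):
    """Pick a replacement victim in the spirit of SLA-LRU (illustrative sketch).

    pages: resident BufferPage objects.
    penalty_increment: tenant -> estimated SLA penalty increase if that tenant
        loses one more buffer page (hypothetical precomputed input).
    """
    least = min(penalty_increment[p.tenant] for p in pages)
    # Only pages of the tenant(s) with the smallest penalty increment qualify.
    candidates = [p for p in pages if penalty_increment[p.tenant] == least]
    # Among those, evict the least frequently used page, then the least recent.
    return min(candidates, key=lambda p: (p.freq, p.last_ref))
```

    For example, if tenant t1 would incur a penalty increment of 5.0 and tenant t2 only 1.0, the victim is chosen among t2's pages even when t1 owns the oldest, least-used page, which is exactly the case where plain LRU or LFU would hurt the tenant with the costlier SLA.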
    AutoClustC estimates the tuning cost of resource provisioning and of database partitioning, then selects the more cost-saving tuning method. If database partitioning is selected, the algorithm uses data mining to identify the database partitions that are frequently accessed together and re-partitions the database accordingly. It then distributes the resulting partitions to the standby physical machines (PMs) with the lowest overload score, computed from both the PMs' communication cost and their overload status. Comprehensive experiments were conducted to study the performance of SLA-LRU and AutoClustC using the TPC-H benchmark on both a public cloud (Amazon RDS) and a private cloud. The results show that, compared with existing memory buffer management algorithms, SLA-LRU gives the best overall performance in terms of query response time and SLA penalty cost improvement ratio, and that AutoClustC identifies, with high accuracy, the more cost-saving of the two tuning methods (resource provisioning or database partitioning) and re-partitions the database dynamically to provide better query response time than the current partitioning configuration.
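    The abstract says AutoClustC mines which parts of the database are accessed together but does not name the mining technique. As a stand-in (our own swapped-in technique, not the dissertation's), the sketch counts how often each pair of attributes co-occurs in a query log and merges attributes with union-find whenever a pair's count reaches a threshold; each resulting group is a candidate vertical partition. The function name and threshold parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def affinity_partitions(query_log, threshold):
    """Group attributes that co-occur in at least `threshold` queries.

    query_log: list of sets of attribute names accessed by each query.
    Returns candidate vertical partitions as sorted lists of attributes.
    """
    co_access = Counter()
    attrs = set()
    for accessed in query_log:
        attrs |= accessed
        for a, b in combinations(sorted(accessed), 2):
            co_access[(a, b)] += 1

    parent = {a: a for a in attrs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in co_access.items():
        if count >= threshold:
            parent[find(a)] = find(b)   # merge frequently co-accessed attributes

    groups = {}
    for a in attrs:
        groups.setdefault(find(a), set()).add(a)
    return sorted(sorted(g) for g in groups.values())
```

    A real vertical partitioning scheme would also replicate the table's key into every fragment so rows can be reassembled; the sketch leaves that out to keep the grouping step visible.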

    Resource Sharing for Multi-Tenant Nosql Data Store in Cloud

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015

    Multi-tenant hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on, and propose solutions for, two cases: independent data on a local file system, and shared data on a parallel file system. In the independent-data, local-file-system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, so that one tenant degrades another tenant's performance. We investigate this interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. This approach relies on a performance model obtained offline and plans the reservation according to the resource demands of different workloads. Results show that the two approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared-data, parallel-file-system case, it has been shown that running a distributed NoSQL store over a PFS for data shared across tenants is not cost effective. Overheads are introduced because the NoSQL store is unaware of the PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL store, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability that embedded KVSs are not designed for.
    Results show that the proposed system outperforms Cassandra and Voldemort on several different workloads.
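    The thesis's scheduler is only characterized as approximating resource consumption and adapting to workload dynamics. A generic way to do that, swapped in here purely for illustration, is weighted fair sharing: charge each tenant the estimated cost of every request it runs and always dispatch from the backlogged tenant with the smallest weighted charge. The class name, weights and cost estimates are hypothetical, and the distributed aspect of the thesis's scheme is not modeled.

```python
from collections import deque


class FairTenantScheduler:
    """Weighted fair sharing across tenants using estimated request costs."""

    def __init__(self, weights):
        self.weights = dict(weights)                  # tenant -> share weight
        self.used = {t: 0.0 for t in self.weights}    # weighted cost charged so far
        self.queues = {t: deque() for t in self.weights}

    def submit(self, tenant, request, est_cost):
        if not self.queues[tenant]:
            active = [self.used[t] for t, q in self.queues.items() if q]
            if active:
                # A tenant that was idle cannot bank credit for the idle period.
                self.used[tenant] = max(self.used[tenant], min(active))
        self.queues[tenant].append((request, est_cost))

    def dispatch(self):
        """Return (tenant, request) for the backlogged tenant charged least, or None."""
        backlogged = [t for t, q in self.queues.items() if q]
        if not backlogged:
            return None
        tenant = min(backlogged, key=lambda t: self.used[t])
        request, est_cost = self.queues[tenant].popleft()
        self.used[tenant] += est_cost / self.weights[tenant]
        return tenant, request
```

    With equal weights, a tenant issuing one expensive request (cost 10) is followed by a neighbor's two cheap requests (cost 1 each) before its own second expensive request runs, so the heavy tenant cannot starve the light one.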

    Workload-Aware Database Monitoring and Consolidation

    In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can potentially be harnessed to consolidate multiple databases on fewer machines, reducing hardware and operational costs. Virtual machine (VM) technology is one popular way to approach this problem. However, as we demonstrate in this paper, VMs fail to adequately support database consolidation, because databases place a unique and challenging set of demands on hardware resources, which are not well suited to the assumptions made by VM-based consolidation. Instead, our system for database consolidation, named Kairos, uses novel techniques to measure the hardware requirements of database workloads, as well as models to predict the combined resource utilization of those workloads. We formalize the consolidation problem as a non-linear optimization program, aiming to minimize the number of servers and balance load, while achieving near-zero performance degradation. We compare Kairos against virtual machines, showing up to a factor of 12× higher throughput on a TPC-C-like benchmark. We also tested the effectiveness of our approach on real-world data collected from production servers at Wikia.com, Wikipedia, Second Life, and MIT CSAIL, showing absolute consolidation ratios ranging between 5.5:1 and 17:1.
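    Kairos itself solves a non-linear optimization program. For intuition about what "consolidation ratio" means, here is a much simpler greedy baseline, first-fit decreasing bin packing over two resource dimensions; it is a swapped-in technique, not Kairos. It loudly assumes that a server's load is the plain sum of its tenants' demands, whereas the abstract indicates Kairos uses models to predict combined utilization. The function name, units and headroom value are hypothetical.

```python
def first_fit_decreasing(workloads, capacity):
    """Greedy consolidation of databases onto as few servers as possible.

    workloads: dict name -> (cpu, mem) demand, as fractions of one server.
    capacity: (cpu, mem) limit per server, e.g. (0.9, 0.9) to keep headroom.
    Assumes a server's load is the plain sum of its tenants' demands.
    """
    servers = []  # each entry: [list of names, cpu used, mem used]
    # Place the most demanding workloads first (by their largest dimension).
    for name, (cpu, mem) in sorted(workloads.items(),
                                   key=lambda kv: max(kv[1]), reverse=True):
        for server in servers:
            if server[1] + cpu <= capacity[0] and server[2] + mem <= capacity[1]:
                server[0].append(name)
                server[1] += cpu
                server[2] += mem
                break
        else:  # no existing server fits: open a new one
            servers.append([[name], cpu, mem])
    return [names for names, _, _ in servers]
```

    Packing four databases onto two servers this way gives a consolidation ratio of 2:1; the paper's reported 5.5:1 to 17:1 ratios come from its own measurements and optimizer, not from a heuristic like this one.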

    Memory-aware sizing for in-memory databases

    In-memory database systems are among the technological drivers of big data processing. In this paper we apply analytical modeling to enable efficient sizing of in-memory databases. We present novel response time approximations under online analytical processing workloads to model thread-level fork-join and per-class memory occupation. We combine these approximations with a non-linear optimization program to minimize memory swapping in in-memory database clusters. We compare our approach with state-of-the-art response time approximations and trace-driven simulation using real data from an SAP HANA in-memory system and show that our optimization model is significantly more accurate than existing approaches at similar computational costs.
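    The paper's own approximations are not reproduced in the abstract. A classic building block for thread-level fork-join models is the expected completion time of k parallel tasks, which equals the expected maximum of their service times. Assuming i.i.d. exponential service times with rate mu and no queueing (our simplifying assumptions), that expectation is H_k / mu, where H_k is the k-th harmonic number. The sketch computes it and checks it by Monte Carlo; both function names are hypothetical.

```python
import random


def forkjoin_mean_response(k, mu):
    """E[max of k i.i.d. Exp(mu) task times] = H_k / mu (H_k = k-th harmonic number).

    Models a query forked into k parallel threads that start together on idle
    cores; the query finishes when its slowest thread does. Queueing is ignored.
    """
    return sum(1.0 / i for i in range(1, k + 1)) / mu


def simulate_forkjoin(k, mu, trials=50_000, seed=1):
    """Monte Carlo estimate of the same quantity, for a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(mu) for _ in range(k))
    return total / trials
```

    For k = 4 and mu = 1, the formula gives 25/12 ≈ 2.08 time units, roughly double the single-thread mean of 1, which is why fork-join synchronization must be modeled explicitly rather than treating parallel threads as independent jobs.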

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to optimize the solution to a specific real-world problem, big data systems are no exception. As far as storage is concerned, the primary facet of any big data system is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has different data characteristics, so the corresponding data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, it provides a feature analysis of 80 NoSQL solutions, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and with other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
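    To make the four data models concrete, the sketch below encodes one hypothetical fact ("customer 42 placed order 1001") in each model using plain Python structures. It only mimics the shape of each model; it is not the API of any particular NoSQL product, and every key and field name is invented for illustration.

```python
import json

# Key-value: values addressed only by key; their contents are typically opaque to the store.
key_value = {
    "customer:42": json.dumps({"name": "Ada", "city": "London"}),
    "order:1001": json.dumps({"customer": 42, "total": 99.5}),
}

# Document-oriented: a nested, self-describing record whose fields can be queried.
document = {
    "_id": 42, "name": "Ada", "city": "London",
    "orders": [{"order_id": 1001, "total": 99.5}],
}

# Wide-column: row key -> column family -> column -> value; rows may differ in columns.
wide_column = {
    "42": {"profile": {"name": "Ada", "city": "London"},
           "orders": {"1001:total": 99.5}},
}

# Graph: nodes with properties plus explicit, typed relationships.
graph = {
    "nodes": {"c42": {"label": "Customer", "name": "Ada"},
              "o1001": {"label": "Order", "total": 99.5}},
    "edges": [("c42", "PLACED", "o1001")],
}


def order_totals(graph, customer_node):
    """Follow PLACED edges from a customer node to its orders' totals."""
    return [graph["nodes"][dst]["total"]
            for src, rel, dst in graph["edges"]
            if src == customer_node and rel == "PLACED"]
```

    The same question ("what did customer 42 spend?") needs a key lookup plus client-side parsing in the key-value form, a field path in the document form, a column read in the wide-column form, and an edge traversal in the graph form. That difference in access pattern is what drives the use-case comparison above.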