81 research outputs found

    Resource Sharing for Multi-Tenant Nosql Data Store in Cloud

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015. Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on and propose solutions to two cases: independent data-local file system, and shared data-parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing one tenant to degrade the performance of another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics, and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads arise because the NoSQL store is unaware of the PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL store, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability that embedded KVSs are not designed for. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads.

    Cloud Multi-Tenancy: Issues and Developments

    Cloud Computing (CC) is a computational paradigm that provides pay-per-use services to customers from a pool of networked computing resources that are provided on demand. Customers therefore do not need to worry about infrastructure or storage. Cloud Service Providers (CSPs) make custom-built applications available to customers online. Organisations and enterprises can also build and deploy applications based on platforms provided by the CSP. Scalable storage and computing resources are also made available to consumers on the cloud at a cost. Cloud Computing takes virtualization a step further: through the use of virtual machines, it allows several customers to share the same physical machine. In addition, it is possible for numerous customers to share applications provided by a CSP; this sharing model is known as multi-tenancy. Although multi-tenancy has its drawbacks, it is highly desirable because of its cost efficiency. This paper presents a comprehensive study of the existing literature on relevant issues and developments relating to cloud multi-tenancy using reliable methods. The study examines recent trends in the area of cloud multi-tenancy and provides a guide for future research. The analysis was based on the following question relating to recent studies in multi-tenancy: what are the current trends and developments in cloud multi-tenancy? Existing publications in this area were analyzed, including journals, conferences, white papers, and publications in reputable magazines. The expected result of this review is the identification of trends in cloud multi-tenancy. This will be of benefit to prospective cloud users and to cloud providers.

    DBaaS Multitenancy, Auto-tuning and SLA Maintenance in Cloud Environments: a Brief Survey

    Cloud computing is a paradigm that presents many advantages to both customers and service providers, such as low upfront investment, pay-per-use, and ease of use, delivering and enabling scalable services using Internet technologies. Among the many types of services available today, Database as a Service (DBaaS) is the one where a database is provided in the cloud in all its aspects. Examples of aspects related to DBaaS utilization are data storage, resource management, and SLA maintenance. In this context, an important feature is resource management and performance, which can be handled in many different ways for several reasons, such as saving money and time and meeting the requirements agreed between client and provider that are defined in the Service Level Agreement (SLA). An SLA usually tries to protect the customer from not receiving the contracted service and to ensure that the provider reaches the intended profit. This paper presents a classification based on three main parameters that aim to manage resources in order to enhance DBaaS performance and guarantee that the SLA is respected, for the benefit of both user and provider. The proposal is based upon a survey of existing research efforts.

    Efficient data reconfiguration for today's cloud systems

    Performance of big data systems largely relies on efficient data reconfiguration techniques. Data reconfiguration operations deal with changing configuration parameters that affect data layout in a system. They can be user-initiated, such as changing the shard key or block size in NoSQL databases, or system-initiated, such as changing replication in a distributed interactive analytics engine. Current data reconfiguration schemes are heuristics at best and often do not scale well as data volume grows. As a result, system performance suffers. In this thesis, we show that data reconfiguration mechanisms can be done in the background by using new optimal or near-optimal algorithms and coupling them with performant system designs. We explore four different data reconfiguration operations affecting three popular types of systems: storage, real-time analytics, and batch analytics. In NoSQL databases (storage), we explore new strategies for changing table-level configuration and for compaction, as they improve read/write latencies. In distributed interactive analytics engines, a good replication algorithm can save costs by judiciously using memory that is sufficient to provide the highest throughput and low latency for queries. Finally, in batch processing systems, we explore prefetching and caching strategies that can improve the number of production jobs meeting their SLOs. All these operations happen in the background without affecting the fast path. Our contributions to each of the problems are two-fold: 1) we model the problem and design algorithms inspired by well-known theoretical abstractions, and 2) we design and build a system on top of popular open source systems used in companies today. Finally, using real-life workloads, we evaluate the efficacy of our solutions. Morphus and Parqua provide several 9s of availability while changing table-level configuration parameters in databases. By halving memory usage in a distributed interactive analytics engine, Getafix reduces the cost of deploying the system by 10 million dollars annually and improves query throughput. We are the first to model the problem of compaction and provide formal bounds on its runtime. Finally, NetCachier helps 30% more production jobs meet their SLOs compared to the existing state of the art.

    NoSQL Database Modeling and Management: A Systematic Literature Review

    The NoSQL databases that emerged this century were created to overcome the limitations of relational database systems in handling the different types of data that have appeared for information processing. In this paper, we present the results of a secondary study carried out to find and synthesize the research done to date on modeling processes, characteristics of the data types used, and management tools for NoSQL databases. Currently, four types are recognized and classified according to the data model they use: key-value, document-oriented, column-based, and graph-based. With this study, it was possible to identify that the most frequently used type of NoSQL database model is the document model, because it offers greater flexibility and versatility compared to the other three models. Although it offers more complex search methods, in terms of data, column and document schemas are the ones that usually describe their characteristics. It was also possible to observe a trend toward the use of the column-oriented and document-oriented models in the management tools and, although they all provide the basic functionality, the differences lie in the way the information is stored and the way it can be accessed.

    Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

    This chapter presents software architectures of big data processing platforms. It provides in-depth knowledge of the resource management techniques involved in deploying big data processing systems in cloud environments. It starts from the very basics and gradually introduces the core components of resource management, which we have divided into multiple layers. It covers state-of-the-art practices and research in SLA-based resource management, with a specific focus on job scheduling mechanisms. Comment: 27 pages, 9 figures

    Energy-efficient cloud computing application solutions and architectures

    Environmental issues are receiving unprecedented attention from businesses and governments around the world. As concern for greenhouse gases, climate change, and sustainability continues to grow, businesses are grappling with improving their environmental impact while remaining profitable. Many businesses have discovered that Green IT initiatives and strategies can reform the organization, comply with laws and regulations, enhance the public appearance of the organization, save energy costs, and improve their environmental impact. One of these Green IT initiatives is migrating or building business applications in the cloud. Cloud computing is a highly scalable and cost-effective infrastructure for running enterprise and web applications. As a result, the building of enterprise systems on cloud computing platforms is increasing significantly today. However, cloud computing does not inherently offer energy efficiency solutions for these businesses. In this thesis, a concept has been developed to support organizations in choosing a suitable energy-efficient cloud architecture while moving their applications to the cloud or building new cloud applications. Thus, the concept focuses on how to employ cloud computing technology as an energy-efficient solution from the application perspective. The main idea applied in the concept is identifying architectures for cloud applications based on the inherent properties of cloud computing, such as virtualization and elasticity, that give them green potential, and identifying correlations between these architectures and already identified business process patterns used in green business process design. Alongside these correlations, the application has been decomposed into basic technical and business attributes that can describe the application. The relations between these attributes and the cloud architectures have been defined. The relations between the different components (the application attributes, application architectures, and green patterns) can lead not only to the energy-efficient cloud architecture for the business application, but also to the architectures that can meet the organization's technical and business requirements. As a prototype, a recommender system has been implemented that supports the identification of suitable energy-efficient cloud application architectures in addition to the cloud migration decision.