28 research outputs found

    NoSQL Databases in Kubernetes

    Get PDF
    With the increasing popularity of deploying applications in containers, Kubernetes (K8s) has become one of the most accepted container orchestration systems. Kubernetes helps maintain containers smoothly and simplifies DevOps with powerful automations. It was originally developed as a tool to manage stateless microservices that run seamlessly in containers. The ephemeral nature of pods, the smallest deployable unit, in Kubernetes was well-aligned with stateless applications since destroying and recreating pods didn’t impact applications. There was a need to provision solutions around stateful workloads like databases so as to take advantage of K8s. This project explores this need, the challenges associated and the available solutions for running databases in Kubernetes. Most of the current research is focused towards SQL-like databases in K8s even though the DNA of NoSQL distributed databases is more aligned with K8s. With no research being done with NoSQL databases, this project outlines the process behind setting up two famous NoSQL databases in K8s: MongoDB and Cassandra. The project also shows a representative viewpoint of the performance comparison between them using the YCSB benchmark. The project lays a foundation around the setup of these databases using K8s Operators and their benchmarking. The goal of the project is to describe the advantages of having databases in K8s, provide developers a clear path for setup and provide insights on basic benchmark performance

    LenticularFS: Scalable filesystem for the cloud

    Get PDF
    The Hadoop platform is the most common solution to handle the explosion of big-data that both companies and research institutions are facing. In order to store such data, the Hadoop platform provides HDFS, a scalable distributed filesystem, which runs on commodity hardware and enables linear scalability by adding new storage nodes. While storage capacity of the system can be increased by adding new storage nodes, the component that handles metadata for the filesystem, the namenode, is a single point of failure and cannot easily replaced or linearly scaled. The Hops projects provides an alternative implementation of the namenode, which increases performance and scalability by storing metadata on an external distributed NewSQL database called MySQL Cluster. With the new architecture, the system is much more scalable and can transparently manage the failover of namenodes, which are now stateless components. HopsFS is, however, still limited to running within a single datacenter, which can cause severe outages in case the entire datacenter becomes unavailable. Cloud native storage systems, such as Amazon’s Simple Storage Service (S3), solve this problem by replicating data across different, geographically distant datacenters, so that the failure of any given zone does not cause data unavailability. The objective of this thesis is to enable HopsFS to work across geographical regions while, as far as possible, maintaining the semantics of a POSIX-style hierarchical filesystem. We leverage asynchronous replication functionality provided by MySQL Cluster to obtain replication of metadata across geographical regions and we present a detailed analysis on how to maintain the consistency properties of HDFS in such an environment. Furthermore, we analyze the issue of split brain scenarios and propose a way for namenodes to detect this condition and continue operating correctly. Finally, we discuss the changes to the codebase which are required to implement the proposed plan

    Extensão de propriedades SQL a SGBD NoSQL através de call level interfaces

    Get PDF
    Mestrado em Engenharia de Computadores e TelemáticaOs arquitetos de software usam ferramentas, tais como Call Level Interfaces (CLI), para guardar, atualizar e retirar dados de Sistemas de Gestão de Bases de Dados (SGBD). Estas ferramentas estão desenhadas para efetuarem a junção entre os paradigmas de Base de Dados Relacional e da Programação Orientada a Objetos e fornecem funcionalidades padrão para interagir com SGBD. No entanto, a emergência do paradigma NoSQL, e particularmente de novos fornecedores de SGBD NoSQL, leva a situações onde algumas das funcionalidades padrão fornecidas por CLI não são suportadas. Isto deve-se normalmente à distância entre o modelo SQL e NoSQL, ou devido a restrições de design. Assim, quando um arquiteto de sistema precisa de evoluir, nomeadamente de um SGBD relacional para um SGBD NoSQL, tem de ultrapassar as dificuldades que emergem por existirem funcionalidades não suportadas pelo SGBD NoSQL. Não só isso, mas as CLI costumam ignorar políticas de controlo de acesso estabelecidas e, portanto, programadores de aplicações têm de dominar as ditas políticas de maneira a desenvolverem software em concordância com elas. Escolher o SGBD NoSQL errado pode levar a problemas de grandes dimensões quando as aplicações pedem funcionalidades não suportadas ou a que não têm acesso. Esta tese foca-se em implementar funcionalidades que não são comummente suportadas por SGBD NoSQL, tais como Stored Procedures, Transações, Save Points e interações com estruturas de memória local, através de uma framework baseada numa CLI padrão. O modelo de implementação de funcionalidades é definido por módulos da nossa framework, e permite a criação de sistemas distribuídos e tolerantes a falhas, que simulam as funcionalidades anteriormente referidas e abstraem as funcionalidades da base de dados subjacente de clientes. Também temos como objetivo integrar a nossa framework com trabalho anterior, a S-DRACA, uma arquitetura dinâmica e segura de controlo de acesso para aplicações relacionais, onde as permissões são definidas como sequências de expressões create, read, update e delete. Com esta integração, conseguimos fornecer Role-Based Access Control e outras funcionalidades de segurança a qualquer tipo de SGBD. Desenvolvemos várias formas de utilizar cada componente (localmente ou distribuído) e a framework está construída de forma modular, o que permite aos vários componentes serem utilizados individualmente ou em grupo, assim como permite o acrescento de funcionalidades ou SGBD adicionais por administradores de sistema que queiram adaptar a framework às suas necessidades particulares.To store, update and retrieve data from database management systems (DBMS), software architects use tools, like call level interfaces (CLI), which provide standard functionality to interact with DBMS. These tools are designed to bring together the relational database and object-oriented programming paradigms, but the emergence of the NoSQL paradigm, and particularly new NoSQL DBMS providers, leads to situations where some of the standard functionality provided by CLI are not supported, very often due to their distance from the relational model or due to design constraints. As such, when a system architect needs to evolve, namely from a relational DBMS to a NoSQL DBMS, he must overcome the difficulties conveyed by the features not provided by the NoSQL DBMS. Not only that, but CLI usually forsake applied access control policies. As such, application developers must master the established policies as a means to develop software that is conformant with them. Choosing the wrong NoSQL DBMS risks major issues with applications requesting non-supported features and with unauthorized accesses. This thesis focuses on deploying features that are not so commonly supported by NoSQL DBMS, such as Stored Procedures, Transactions, Save Points and interactions with local memory structures, through a framework based in a standard CLI. The feature implementation model is defined by modules of our framework, and allows for distributed and fault-tolerant systems to be deployed, which simulate the previously mentioned features and abstract the underlying database features from clients. It is also our goal to integrate our framework with previous work, S-DRACA, a dynamic secure access control architecture for relational applications, where permissions are defined as a sequence of create, read, update and delete expressions. With the integration, we can provide dynamic Role-Based Access Control and other security features to any kind of DBMS. We developed several ways of using each component (locally or distributed) and the framework is built in a modular fashion, which allows several components to be used individually or together, as well as extra features or DBMS to be added by system administrators that wish to adapt the framework to their particular needs

    Decentralized SDN Control Plane for a Distributed Cloud-Edge Infrastructure: A Survey

    Get PDF
    International audienceToday’s emerging needs (Internet of Things applications, Network Function Virtualization services, Mobile Edge computing, etc.) are challenging the classic approach of deploying a few large data centers to provide cloud services. A massively distributed Cloud-Edge architecture could better fit these new trends’ requirements and constraints by deploying on-demand infrastructure services in Point-of-Presences within backbone networks. In this context, a key feature is establishing connectivity among several resource managers in charge of operating, each one a subset of the infrastructure. After explaining the networking management challenges related to distributed Cloud-Edge infrastructures, this article surveys and analyzes the characteristics and limitations of existing technologies in the Software Defined Network field that could be used to provide the intersite connectivity feature. We also introduce Kubernetes, the new de facto container orchestrator platform, and analyze its use in the proposed context. This survey is concluded by providing a discussion about some research directions in the field of SDN applied to distributed Cloud-Edge infrastructures’ management

    Development of a cluster of LXC containers

    Get PDF
    The consensus algorithms were designed to reach agreements in distributed systems trying to maintain a certain tolerance to failures. An application derived from these algorithms is ETCD, a type of key-value storage that provides a safe and reliable way to save data shared in a network of servers, allowing the preser- vation of the information in spite of the possible fall of the involved nodes . My thesis shows the design and operation of these algorithms, and addresses the question of how to man- age the ETCD database and which features involves, as well as recommendations of configuration settings to achieve the best performance of the system. Finally, an academic practice is presented, aimed to network students that would want to have a deeper knowledge about consensus algorithms applications in a distributed system

    Development of a cluster of LXC containers

    Get PDF
    The consensus algorithms were designed to reach agreements in distributed systems trying to maintain a certain tolerance to failures. An application derived from these algorithms is ETCD, a type of key-value storage that provides a safe and reliable way to save data shared in a network of servers, allowing the preser- vation of the information in spite of the possible fall of the involved nodes . My thesis shows the design and operation of these algorithms, and addresses the question of how to man- age the ETCD database and which features involves, as well as recommendations of configuration settings to achieve the best performance of the system. Finally, an academic practice is presented, aimed to network students that would want to have a deeper knowledge about consensus algorithms applications in a distributed system

    Survey of NoSQL Database Engines for Big Data

    Get PDF
    Cloud computing is a paradigm shift that provides computing over Internet. With growing outreach of Internet in the lives of people, everyday large volume of data is generated from different sources such as cellphones, electronic gadgets, e-commerce transactions, social media, and sensors. Eventually, the size of generated data is so large that it is also referred as Big Data. Companies harvesting business opportunities in digital world need to invest their budget and time to scale their IT infrastructure for the expansion of their businesses. The traditional relational databases have limitations in scaling for large Internet scale distributed systems. To store rapidly expanding high volume Big Data efficiently, NoSQL data stores have been developed as an alternative solution to the relational databases. The purpose of this thesis is to provide a holistic overview of different NoSQL data stores. We cover different fundamental principles supporting the NoSQL data store development. Many NoSQL data stores have specific and exclusive features and properties. They also differ in their architecture, data model, storage system, and fault tolerance abilities. This thesis describes different aspects of few NoSQL data stores in detail. The thesis also covers the experiments to evaluate and compare the performance of different NoSQL data stores on a distributed cluster. In the scope of this thesis, HBase, Cassandra, MongoDB, and Riak are four NoSQL data stores selected for the benchmarking experiments
    corecore