
    Benchmarking MongoDB multi-document transactions in a sharded cluster

    Relational databases like Oracle, MySQL, and Microsoft SQL Server offer transaction processing as an integral part of their design. These databases have been a primary choice among developers for business-critical workloads that need the highest form of consistency. On the other hand, the distributed nature of NoSQL databases makes them suitable for scenarios needing scalability, faster data access, and flexible schema design. Recent developments in the NoSQL database community show that NoSQL databases have started to incorporate transactions in their drivers to let users work on business-critical scenarios without compromising the power of distributed NoSQL features [1]. MongoDB is a leading document store that has supported single-document atomicity since its first version. Sharding is the key technique supporting horizontal scalability in MongoDB. The latest version, MongoDB 4.2, enables multi-document transactions to run on sharded clusters, seeking both scalability and ACID guarantees across multiple documents. Transaction processing is a novel feature in MongoDB, and benchmarking the performance of MongoDB multi-document transactions in sharded clusters can encourage developers to use ACID transactions for business-critical workloads. We have adapted the pytpcc framework to conduct a series of benchmarking experiments aimed at finding the impact of tunable consistency, database size, and design choices on multi-document transactions in MongoDB sharded clusters. We have used TPC's OLTP workload (TPC-C) under a variety of experimental settings to measure business throughput. To the best of our knowledge, this is the first attempt at benchmarking MongoDB multi-document transactions in a sharded cluster.
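    The kind of operation being benchmarked can be made concrete with a short example. Below is a minimal sketch of a multi-document transaction against a sharded cluster through pymongo (the driver family pytpcc builds on); the connection string, collections, and field names are illustrative and not taken from the benchmark itself.

    # Minimal sketch of a MongoDB multi-document transaction via pymongo.
    # The URI, database, collections, and fields are illustrative only.
    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://mongos-router:27017")  # connect via mongos
    db = client.tpcc

    def transfer_stock(session, item_id, from_wh, to_wh, qty):
        # Every operation given the session commits or aborts atomically.
        db.stock.update_one({"item": item_id, "warehouse": from_wh},
                            {"$inc": {"quantity": -qty}}, session=session)
        db.stock.update_one({"item": item_id, "warehouse": to_wh},
                            {"$inc": {"quantity": qty}}, session=session)

    with client.start_session() as session:
        # Read/write concerns are exactly the "tunable consistency" knobs
        # that such experiments vary.
        session.with_transaction(
            lambda s: transfer_stock(s, item_id=42, from_wh=1, to_wh=2, qty=5),
            read_concern=ReadConcern("snapshot"),
            write_concern=WriteConcern("majority"),
        )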

    A decentralized framework for cross administrative domain data sharing

    Federation of messaging and storage platforms located in remote datacenters is an essential functionality for sharing data among geographically distributed platforms. When systems are administered by the same owner, data replication reduces data access latency by bringing data closer to applications, and it enables fault tolerance for disaster recovery of an entire location. When storage platforms are administered by different owners, data replication across different administrative domains is essential for enterprise application data integration. Contents and services managed by different software platforms need to be integrated to provide richer contents and services. Clients may need to share subsets of data in order to enable collaborative analysis and service integration. Platforms usually include proprietary federation functionalities and specific APIs to let external software and platforms access their internal data. These techniques may not be applicable to all environments and networks due to security and technological restrictions. Moreover, the federation of dispersed nodes under a decentralized administration scheme is still a research issue. This thesis is a contribution along this research direction, as it introduces and describes a framework, called "WideGroups", directed towards the creation and management of an automatic federation and integration of widely dispersed platform nodes. It is based on groups to exchange messages among distributed applications located in different remote datacenters. Groups are created and managed using client-side programmatic configuration, without touching servers. WideGroups enables the extension of software platform services to nodes belonging to different administrative domains in a wide-area network environment. It lets nodes form ad-hoc overlay networks on the fly depending on message destinations located in distinct administrative domains. It supports multiple dynamic overlay networks based on message groups, dynamic discovery of nodes, and automatic setup of overlay networks among nodes with no server-side configuration. I designed and implemented platform connectors to integrate the framework as the federation module of Message Oriented Middleware and Key Value Store platforms, which are among the most widespread paradigms supporting data sharing in distributed systems.
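    The abstract does not show the WideGroups API, so the following is a purely hypothetical, in-memory sketch of the core idea: groups are created by client-side calls alone, and a message's group determines the ad-hoc overlay it travels over. Every name here (Node, REGISTRY, join_group, publish) is invented for illustration.

    # Hypothetical in-memory sketch of the WideGroups idea: client-side group
    # creation, no server-side configuration. REGISTRY stands in for the
    # framework's dynamic node discovery; all names are invented.
    REGISTRY = {}  # group name -> set of member nodes

    class Node:
        def __init__(self, name, domain):
            self.name, self.domain = name, domain  # administrative domain
            self.inbox = []

        def join_group(self, group):
            # Joining is a purely client-side call; no server is reconfigured.
            REGISTRY.setdefault(group, set()).add(self)

        def publish(self, group, message):
            # The overlay for this message is exactly the group's membership,
            # regardless of which administrative domain each member sits in.
            for member in REGISTRY.get(group, ()):
                if member is not self:
                    member.inbox.append((group, self.name, message))

    a = Node("mom-1", domain="datacenter-A")  # Message Oriented Middleware node
    b = Node("kvs-1", domain="datacenter-B")  # Key Value Store node
    for n in (a, b):
        n.join_group("orders")
    a.publish("orders", "order #17 created")
    print(b.inbox)  # [('orders', 'mom-1', 'order #17 created')]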

    Secure storage systems for untrusted cloud environments

    The cloud has become established for applications that need to be scalable and highly available. However, moving data to data centers owned and operated by a third party, i.e., the cloud provider, raises security concerns, because a cloud provider could easily access and manipulate the data or the program flow, preventing the cloud from being used for certain applications, such as medical or financial ones. Hardware vendors are addressing these concerns by developing Trusted Execution Environments (TEEs) that make the CPU state and parts of memory inaccessible from the host software. While TEEs protect the current execution state, they do not provide security guarantees for data which neither fits nor resides in the protected memory area, such as network and persistent storage. In this work, we address TEEs' limitations in three different ways: first, we extend the trust of TEEs to persistent storage; second, we extend the trust to multiple nodes in a network; and third, we propose a compiler-based solution for accessing heterogeneous memory regions. More specifically:
    • SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER implements a key-value interface. Its design is based on LSM data structures, but extends them to provide confidentiality, integrity, and freshness for the stored data. Thus, SPEICHER can prove to the client that the data has not been tampered with by an attacker.
    • AVOCADO is a distributed in-memory key-value store (KVS) that extends the trust that TEEs provide across the network to multiple nodes, allowing KVSs to scale beyond the boundaries of a single node. On each node, AVOCADO carefully divides data between trusted memory and untrusted host memory to maximize the amount of data that can be stored on each node. AVOCADO leverages the fact that network attacks can be modeled as crash faults to trust other nodes with a hardened ABD replication protocol.
    • TOAST is based on the observation that modern high-performance systems often use several different heterogeneous memory regions that are not easily distinguishable by the programmer. The number of regions is increased by the fact that TEEs divide memory into trusted and untrusted regions. TOAST is a compiler-based approach that unifies access to different heterogeneous memory regions and provides programmability and portability. TOAST uses a load/store interface to abstract most library interfaces for the different memory regions.
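    SPEICHER's guarantees can be illustrated with a toy authenticated store: a MAC over (key, value, version) gives integrity, and a version counter kept in trusted memory gives freshness, since replaying an old but correctly MACed value is detected. This is a didactic sketch under our own naming, not SPEICHER's actual LSM-based design; real confidentiality would additionally encrypt the value before it leaves the TEE.

    # Toy sketch of integrity + freshness for an untrusted store, in the
    # spirit of SPEICHER (not its actual design). Standard library only.
    import hmac, hashlib, os

    KEY = os.urandom(32)      # would live inside the TEE in the real system
    trusted_counters = {}     # key -> version, kept in trusted memory
    untrusted_store = {}      # key -> (value, version, tag), untrusted storage

    def tag(key, value, version):
        msg = b"|".join([key, value, str(version).encode()])
        return hmac.new(KEY, msg, hashlib.sha256).digest()

    def put(key, value):
        version = trusted_counters.get(key, 0) + 1
        trusted_counters[key] = version       # freshness anchor
        untrusted_store[key] = (value, version, tag(key, value, version))

    def get(key):
        value, version, t = untrusted_store[key]   # attacker-controlled tuple
        # Integrity: the MAC must verify. Freshness: the version must be
        # current, so replaying an old (value, version, tag) triple fails.
        if version != trusted_counters.get(key) or not hmac.compare_digest(
                t, tag(key, value, version)):
            raise ValueError("tampered or stale value")
        return value

    put(b"balance", b"100")
    put(b"balance", b"90")
    print(get(b"balance"))  # b'90'; replaying the version-1 triple would raise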

    Control Plane in Software Defined Networks and Stateful Data Planes

    The abstract is in the attachment.

    Invalidation-based protocols for replicated datastores

    Distributed in-memory datastores underpin cloud applications that run within a datacenter and demand high performance, strong consistency, and availability. A key feature of datastores is data replication. The data are replicated across servers because a single server often cannot handle the request load. Replication is also necessary to guarantee that a server or link failure does not render a portion of the dataset inaccessible. A replication protocol is responsible for ensuring strong consistency between the replicas of a datastore, even when faults occur, by determining the actions necessary to access and manipulate the data. Consequently, the replication protocol also drives the datastore's performance. Existing strongly consistent replication protocols deliver fault tolerance but fall short in terms of performance. Meanwhile, the opposite occurs in the world of multiprocessors, where data are replicated across the private caches of different cores. The multiprocessor regime uses invalidations to afford strongly consistent replication with high performance but neglects fault tolerance. Although handling failures in the datacenter is critical for data availability, we observe that fault-free operation is the common case and far outweighs operation under faults. In other words, the common operating environment inside a datacenter closely resembles that of a multiprocessor. Based on this insight, we draw inspiration from the multiprocessor for high-performance, strongly consistent replication in the datacenter. The primary contribution of this thesis is in adapting invalidating protocols to the nuances of replicated datastores, which include skewed data accesses, fault tolerance, and distributed transactions.
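    The multiprocessor-style mechanism being adapted can be shown in miniature: a writer first invalidates every replica and only then installs the new value, so a reader can never return a stale copy. The toy, single-process sketch below uses our own naming; the thesis's protocols additionally handle faults, access skew, and distributed transactions.

    # Toy sketch of invalidation-based replication: writes invalidate all
    # copies before the new value becomes readable, mirroring how cache
    # coherence keeps multiprocessor replicas strongly consistent.
    VALID, INVALID = "valid", "invalid"

    class Replica:
        def __init__(self):
            self.value, self.state = None, INVALID

        def read(self, owner):
            if self.state == INVALID:   # miss: fetch fresh value from owner
                self.value, self.state = owner.value, VALID
            return self.value

    class Owner:
        """Serializes writes for one key (the 'directory' in coherence terms)."""
        def __init__(self, replicas):
            self.value, self.replicas = None, replicas

        def write(self, value):
            for r in self.replicas:     # phase 1: invalidate all copies
                r.state = INVALID
            self.value = value          # phase 2: install the new value

    replicas = [Replica(), Replica()]
    owner = Owner(replicas)
    owner.write("v1")
    print([r.read(owner) for r in replicas])  # ['v1', 'v1'] -- no stale reads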

    Exploring heterogeneity in weakly consistent decentralized data replication

    Decentralized systems are scalable by design but also difficult to coordinate due to their weak coupling. Replicating data in these geo-distributed systems is therefore a challenge inherent to their structure. The two contributions of this thesis exploit the heterogeneity of user requirements to enable personalizable quality of service for data replication in decentralized systems. Our first contribution, Gossip Primary-Secondary, extends the consistency criterion Update consistency Primary-Secondary to offer differentiated guarantees in terms of consistency and message delivery latency for large-scale data replication. Our second contribution, Dietcoin, enriches Bitcoin with diet nodes that can (i) verify the correctness of entire subchains of blocks while avoiding the exorbitant cost of bootstrap verification and (ii) personalize their own security and resource-consumption guarantees.
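    The diet-node idea can be made concrete with a toy hash chain: a node that already trusts some checkpoint hash only has to verify the links of the subchain after it, rather than replaying the whole chain from genesis. The block structure below is simplified for illustration and is not Dietcoin's actual format.

    # Minimal sketch of subchain verification in the spirit of Dietcoin's
    # diet nodes: check hash links from a trusted checkpoint onward.
    import hashlib

    def block_hash(prev_hash, payload):
        return hashlib.sha256(prev_hash + payload).digest()

    def make_chain(payloads, genesis=b"\x00" * 32):
        chain, prev = [], genesis
        for p in payloads:
            h = block_hash(prev, p)
            chain.append({"prev": prev, "payload": p, "hash": h})
            prev = h
        return chain

    def verify_subchain(blocks, trusted_hash):
        # A diet node needs only `trusted_hash` (a checkpoint it already
        # accepts) plus the subchain after it: cost is O(len(blocks)),
        # not O(whole chain), avoiding bootstrap verification.
        prev = trusted_hash
        for b in blocks:
            if b["prev"] != prev or block_hash(b["prev"], b["payload"]) != b["hash"]:
                return False
            prev = b["hash"]
        return True

    chain = make_chain([b"tx-batch-%d" % i for i in range(10)])
    checkpoint = chain[6]["hash"]                  # trusted by other means
    print(verify_subchain(chain[7:], checkpoint))  # True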

    Smart-contract Blockchain with Secure Hardware

    In recent years, blockchains have grown in popularity, and the main reason for this growth is the set of properties that they provide, such as user privacy and a public record of transactions. This popularity is evidenced by the number of cryptocurrencies currently available and by the current market value of the Bitcoin currency. Since its introduction, blockchain has evolved, and another concept closely linked with it is the smart contract, which allows for more complex operations over the blockchain than simple transactions. Nevertheless, blockchain technologies have significant problems that prevent them from being adopted as a mainstream solution, or at least as an alternative to centralized solutions such as banking systems. The main one is inefficiency, due to the need for a consensus algorithm that provides a total order of transactions. Traditional systems solve this easily by having a single central entity that orders transactions, which cannot be done in decentralized systems. Thus, blockchains' efficiency and scalability suffer from the need for time-costly consensus algorithms, which means that they cannot currently compete with centralized systems that provide a much greater amount of transaction-processing power. However, with the emergence of novel processor architectures, secure hardware, and trusted computing technologies (e.g., Intel SGX and ARM TrustZone), it became possible to investigate new ways of addressing the inefficiency issues of blockchain systems by designing better and improved blockchains. With all this in mind, this dissertation aims to build an efficient blockchain system that leverages trusted technologies, namely Intel SGX. A previous thesis serves as a starting point, since it already implements a secure wallet system that allows authenticated transactions between users through Intel SGX. This wallet system will be extended to provide traceability of its transactions through a blockchain. The blockchain will use Intel SGX to provide an efficient causal consistency mechanism for ordering transactions. After this, the following step will be to support the execution of smart contracts in addition to regular transactions.
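    The dissertation's key relaxation is ordering transactions by causal consistency rather than total order. The sketch below shows the standard vector-clock test that such an ordering layer relies on: it decides whether one transaction causally precedes another, or whether the two are concurrent and need no relative order. The names are ours; how the actual system derives this ordering with SGX is not described in the abstract.

    # Sketch of the causal-ordering primitive: vector clocks decide whether
    # one transaction causally precedes another, or the two are concurrent
    # (and therefore need no agreed relative order, unlike total order).
    def happens_before(vc_a, vc_b):
        """True iff transaction A causally precedes transaction B."""
        return all(vc_a[n] <= vc_b[n] for n in vc_a) and vc_a != vc_b

    def merge(vc_a, vc_b):
        """Clock a node adopts after observing both histories."""
        return {n: max(vc_a[n], vc_b[n]) for n in vc_a}

    t1 = {"alice": 1, "bob": 0}   # Alice's first transaction
    t2 = {"alice": 1, "bob": 1}   # Bob's transaction, issued after seeing t1
    t3 = {"alice": 2, "bob": 0}   # Alice again, concurrent with t2

    print(happens_before(t1, t2))  # True: t2 must be ordered after t1
    print(happens_before(t2, t3))  # False
    print(happens_before(t3, t2))  # False: concurrent, order can be relaxed
    print(merge(t2, t3))           # {'alice': 2, 'bob': 1}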