Benchmarking MongoDB multi-document transactions in a sharded cluster
Relational databases like Oracle, MySQL, and Microsoft SQL Server offer transaction processing as an integral part of their design. These databases have been a primary choice among developers for business-critical workloads that need the strongest consistency guarantees. On the other hand, the distributed nature of NoSQL databases makes them suitable for scenarios needing scalability, faster data access, and flexible schema design. Recent developments in the NoSQL community show that NoSQL databases have started to incorporate transactions in their drivers to let users handle business-critical scenarios without compromising the power of distributed NoSQL features [1].
MongoDB is a leading document store that has supported single-document atomicity since its first version. Sharding is the key technique supporting horizontal scalability in MongoDB. The latest version, MongoDB 4.2, enables multi-document transactions to run on sharded clusters, offering both scalability and ACID guarantees across multiple documents. Transaction processing is a novel feature in MongoDB, and benchmarking the performance of MongoDB multi-document transactions in sharded clusters can encourage developers to use ACID transactions for business-critical workloads.
We have adapted the pytpcc framework to conduct a series of benchmarking experiments aimed at finding the impact of tunable consistency, database size, and design choices on multi-document transactions in MongoDB sharded clusters. We have used the TPC-C OLTP workload under a variety of experimental settings to measure business throughput. To the best of our knowledge, this is the first attempt at benchmarking MongoDB multi-document transactions in a sharded cluster.
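The transaction logic exercised by such a benchmark can be sketched as follows. This is a minimal illustration, not the pytpcc code: the callback body mirrors what would be passed to pymongo's session.with_transaction(), but it runs here against a tiny in-memory stand-in so the sketch is self-contained. Collection and field names are hypothetical.

```python
# Sketch of a TPC-C "New-Order"-style multi-document transaction callback.
# With pymongo, a callback like new_order_txn would be passed to
# session.with_transaction(); here a minimal in-memory stand-in lets the
# logic run without a MongoDB server. Names are illustrative.

class FakeCollection:
    """Tiny stand-in exposing a pymongo-like subset of the collection API."""
    def __init__(self, docs):
        self.docs = docs

    def find_one(self, query):
        return next((d for d in self.docs
                     if all(d.get(k) == v for k, v in query.items())), None)

    def update_one(self, query, update):
        doc = self.find_one(query)
        if doc is not None:
            for field, delta in update.get("$inc", {}).items():
                doc[field] = doc.get(field, 0) + delta

    def insert_one(self, doc):
        self.docs.append(doc)

def new_order_txn(stock, orders, w_id, i_id, qty):
    """Decrement stock and record the order. Under pymongo this body would
    run inside a transaction and be retried on transient errors."""
    item = stock.find_one({"w_id": w_id, "i_id": i_id})
    if item is None or item["s_quantity"] < qty:
        raise ValueError("insufficient stock")
    stock.update_one({"w_id": w_id, "i_id": i_id}, {"$inc": {"s_quantity": -qty}})
    orders.insert_one({"w_id": w_id, "i_id": i_id, "qty": qty})

stock = FakeCollection([{"w_id": 1, "i_id": 42, "s_quantity": 10}])
orders = FakeCollection([])
new_order_txn(stock, orders, 1, 42, 3)
print(stock.find_one({"w_id": 1, "i_id": 42})["s_quantity"])  # prints 7
```

The key point the benchmark probes is that both writes above either commit together or abort together when run under a sharded-cluster transaction.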
A decentralized framework for cross administrative domain data sharing
Federation of messaging and storage platforms located in remote datacenters is an essential functionality for sharing data among geographically distributed platforms. When systems are administered by the same owner, data replication reduces data access latency by bringing data closer to applications and enables fault tolerance in the face of disaster recovery of an entire location. When storage platforms are administered by different owners, data replication across administrative domains is essential for enterprise application data integration. Contents and services managed by different software platforms need to be integrated to provide richer contents and services, and clients may need to share subsets of data in order to enable collaborative analysis and service integration. Platforms usually include proprietary federation functionalities and specific APIs to let external software and platforms access their internal data, but these techniques may not be applicable to all environments and networks due to security and technological restrictions. Moreover, the federation of dispersed nodes under a decentralized administration scheme is still a research issue. This thesis is a contribution along this research direction, as it introduces and describes a framework, called “WideGroups”, directed towards the creation and management of an automatic federation and integration of widely dispersed platform nodes. It is based on groups to exchange messages among distributed applications located in different remote datacenters. Groups are created and managed using client-side programmatic configuration, without touching servers. WideGroups enables the extension of software platform services to nodes belonging to different administrative domains in a wide-area network environment. It lets different nodes form ad-hoc overlay networks on the fly, depending on message destinations located in distinct administrative domains.
It supports multiple dynamic overlay networks based on message groups, dynamic discovery of nodes, and automatic setup of overlay networks among nodes with no server-side configuration. I designed and implemented platform connectors to integrate the framework as the federation module of Message Oriented Middleware and Key-Value Store platforms, which are among the most widespread paradigms supporting data sharing in distributed systems.
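The group abstraction described above, with client-side group creation and per-group delivery across administrative domains, might be sketched as follows. Class and method names are invented for the example and are not the WideGroups API.

```python
# Minimal sketch of group-based federation in the spirit of WideGroups:
# nodes join named groups via client-side calls (no server configuration),
# and publishing to a group routes the message only to its current members,
# whatever administrative domain they sit in. Names are illustrative.

class Node:
    def __init__(self, node_id, domain):
        self.node_id = node_id
        self.domain = domain     # administrative domain the node belongs to
        self.inbox = []

    def deliver(self, msg):
        self.inbox.append(msg)

class GroupOverlay:
    """One logical overlay per message group, assembled on the fly."""
    def __init__(self):
        self.groups = {}         # group name -> set of member Nodes

    def join(self, group, node):
        # Purely client-side: no server is reconfigured to add a member.
        self.groups.setdefault(group, set()).add(node)

    def publish(self, group, msg):
        # The overlay spans whatever domains the members happen to be in.
        for node in self.groups.get(group, ()):
            node.deliver(msg)

overlay = GroupOverlay()
a = Node("a", domain="dc-eu")
b = Node("b", domain="dc-us")
overlay.join("sensor-data", a)
overlay.join("sensor-data", b)
overlay.publish("sensor-data", {"reading": 21.5})
print(len(a.inbox), len(b.inbox))  # prints 1 1
```

A real federation layer would add discovery and transport between datacenters; the sketch only shows the group-to-members routing decision.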
Secure storage systems for untrusted cloud environments
The cloud has become established for applications that need to be scalable and highly available. However, moving data to data centers owned and operated by a third party, i.e., the cloud provider, raises security concerns: a cloud provider could easily access and manipulate the data or program flow, preventing the cloud from being used for certain applications, such as medical or financial ones.
Hardware vendors are addressing these concerns by developing Trusted Execution
Environments (TEEs) that make the CPU state and parts of memory inaccessible from
the host software. While TEEs protect the current execution state, they do not provide security guarantees for data that does not fit or reside in the protected memory area, such as network traffic and persistent storage.
In this work, we aim to address TEEs' limitations in three different ways: first, we extend the trust of TEEs to persistent storage; second, we extend the trust to multiple nodes in a network; and third, we propose a compiler-based solution for accessing heterogeneous memory regions. More specifically,
• SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER
implements a key-value interface. Its design is based on LSM data structures, but
extends them to provide confidentiality, integrity, and freshness for the stored
data. Thus, SPEICHER can prove to the client that the data has not been tampered
with by an attacker.
• AVOCADO is a distributed in-memory key-value store (KVS) that extends the
trust that TEEs provide across the network to multiple nodes, allowing KVSs to
scale beyond the boundaries of a single node. On each node, AVOCADO carefully
divides data between trusted memory and untrusted host memory, to maximize
the amount of data that can be stored on each node. AVOCADO leverages the fact that network attacks can be modeled as crash faults, allowing it to trust other nodes via a hardened ABD replication protocol.
• TOAST is based on the observation that modern high-performance systems
often use several different heterogeneous memory regions that are not easily
distinguishable by the programmer. The number of regions is increased by the
fact that TEEs divide memory into trusted and untrusted regions. TOAST is a
compiler-based approach to unify access to different heterogeneous memory
regions and provides programmability and portability. TOAST uses a
load/store interface to abstract most library interfaces for different memory
regions.
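The trusted/untrusted split that SPEICHER and AVOCADO rely on can be illustrated with a small sketch: values live in untrusted memory, while a per-key digest and version counter are kept in (simulated) trusted memory, so that tampering and rollback are detected on read. This is a simplified illustration of the integrity and freshness checks, not the actual design of either system; all names are invented.

```python
# Sketch of TEE-backed integrity and freshness for a key-value store:
# untrusted memory holds the data (attacker-controlled), while a small
# trusted region keeps a per-key hash and version counter inside the TEE.
import hashlib

class SecureKV:
    def __init__(self):
        self.untrusted = {}   # key -> (value, version); may be tampered with
        self.trusted = {}     # key -> (digest, version); protected by the TEE

    @staticmethod
    def _digest(value, version):
        return hashlib.sha256(f"{version}:{value}".encode()).hexdigest()

    def put(self, key, value):
        version = self.trusted.get(key, (None, 0))[1] + 1
        self.untrusted[key] = (value, version)
        self.trusted[key] = (self._digest(value, version), version)

    def get(self, key):
        value, version = self.untrusted[key]
        digest, expected_version = self.trusted[key]
        # Freshness: the version must match the trusted counter.
        # Integrity: the digest must match the trusted hash.
        if version != expected_version or self._digest(value, version) != digest:
            raise RuntimeError("tampered or stale value")
        return value

kv = SecureKV()
kv.put("k", "v1")
kv.put("k", "v2")
kv.untrusted["k"] = ("v1", 1)   # attacker rolls back to an old value
try:
    kv.get("k")
except RuntimeError as e:
    print(e)                    # prints: tampered or stale value
```

The rollback attack fails because the trusted version counter has moved on; this is the freshness property the abstract attributes to SPEICHER.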
Control Plane in Software Defined Networks and Stateful Data Planes
The abstract is in the attachment.
Invalidation-based protocols for replicated datastores
Distributed in-memory datastores underpin cloud applications that run within a datacenter and demand high performance, strong consistency, and availability. A key feature of datastores is data replication. The data are replicated across servers because a single server often cannot handle the request load. Replication is also necessary to guarantee that a server or link failure does not render a portion of the dataset inaccessible. A replication protocol is responsible for ensuring strong consistency between the replicas of a datastore, even when faults occur, by determining the actions necessary to access and manipulate the data. Consequently, a replication protocol also drives the datastore's performance.
Existing strongly consistent replication protocols deliver fault tolerance but fall short in terms of performance. Meanwhile, the opposite occurs in the world of multiprocessors, where data are replicated across the private caches of different cores. The multiprocessor regime uses invalidations to afford strongly consistent replication with high performance but neglects fault tolerance.
Although handling failures in the datacenter is critical for data availability, we observe that fault-free operation is the common case and far outweighs operation during faults. In other words, the common operating environment inside a datacenter closely resembles that of a multiprocessor. Based on this insight, we draw inspiration from the multiprocessor for high-performance, strongly consistent replication in the datacenter. The primary contribution of this thesis is adapting invalidation-based protocols to the nuances of replicated datastores, which include skewed data accesses, fault tolerance, and distributed transactions.
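The invalidation idea borrowed from multiprocessors can be shown with a toy sketch, covering only the fault-free case: before a write completes, every replica's copy is invalidated, so a reader either sees the new value or is forced to re-fetch rather than returning stale data. All names are illustrative, and real protocols must also handle the faults this sketch omits.

```python
# Toy sketch of invalidation-based replication (fault-free case only):
# phase 1 invalidates every copy of the key, phase 2 installs the new
# value and re-validates, so no reader can observe a stale value.

class Replica:
    def __init__(self):
        self.store = {}   # key -> value
        self.valid = {}   # key -> bool (invalidation bit, as in a cache line)

    def read(self, key):
        if not self.valid.get(key, False):
            raise LookupError("invalid copy; re-fetch from the writer")
        return self.store[key]

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Phase 1: invalidate every copy (acks assumed instantaneous here).
        for r in self.replicas:
            r.valid[key] = False
        # Phase 2: install the new value and re-validate all copies.
        for r in self.replicas:
            r.store[key] = value
            r.valid[key] = True

r1, r2 = Replica(), Replica()
coord = Coordinator([r1, r2])
coord.write("x", 1)
print(r1.read("x"), r2.read("x"))  # prints 1 1
```

The thesis's contribution lies precisely in what the sketch leaves out: surviving failures mid-invalidation, skewed access patterns, and distributed transactions.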
Exploring heterogeneity in weakly consistent decentralized data replication
Decentralized systems are scalable by design but also difficult to coordinate due to their weak coupling. Replicating data in these geo-distributed systems is therefore a challenge inherent to their structure. The two contributions of this thesis exploit the heterogeneity of user requirements and enable a personalizable quality of service for data replication in decentralized systems. Our first contribution, Gossip Primary-Secondary, extends the consistency criterion Update consistency Primary-Secondary to offer differentiated guarantees in terms of consistency and message delivery latency for large-scale data replication. Our second contribution, Dietcoin, enriches Bitcoin with diet nodes that can (i) verify the correctness of entire subchains of blocks while avoiding the exorbitant cost of bootstrap verification and (ii) personalize their own security and resource-consumption guarantees.
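The diet-node subchain check can be illustrated with a small sketch: starting from a trusted checkpoint hash, a node verifies that a recent subchain of blocks links correctly, without replaying the whole chain from genesis. The block format and hashing below are simplified placeholders, not Bitcoin's actual structures.

```python
# Sketch of subchain verification from a trusted checkpoint, in the spirit
# of Dietcoin's diet nodes. Blocks are (prev_hash, payload) pairs; real
# blocks would also carry proof-of-work and transaction Merkle roots.
import hashlib

def block_hash(prev_hash, payload):
    return hashlib.sha256(f"{prev_hash}|{payload}".encode()).hexdigest()

def verify_subchain(checkpoint_hash, blocks):
    """Check that each block links to the previous one, starting from a
    checkpoint the node already trusts (e.g. obtained out of band)."""
    expected = checkpoint_hash
    for prev_hash, payload in blocks:
        if prev_hash != expected:
            return False
        expected = block_hash(prev_hash, payload)
    return True

genesis = block_hash("0" * 64, "genesis")
b1 = block_hash(genesis, "tx-batch-1")
subchain = [(genesis, "tx-batch-1"), (b1, "tx-batch-2")]
print(verify_subchain(genesis, subchain))              # prints True
print(verify_subchain(genesis, [(b1, "tx-batch-2")]))  # prints False
```

The cost of the check grows with the subchain length chosen by the node, which is the tunable security/resource trade-off the abstract refers to.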
Smart-contract Blockchain with Secure Hardware
In recent years, blockchains have grown in popularity, and the main reason for this growth is the set of properties they provide, such as user privacy and a public record of transactions. This popularity is evident in the number of cryptocurrencies currently available and in the current market value of the Bitcoin currency. Since its introduction, blockchain has evolved, and a closely linked concept is the smart-contract, which allows more complex operations over the blockchain than simple transactions.
Nevertheless, blockchain technologies have significant problems that prevent them from being adopted as a mainstream solution, or at least as an alternative to centralized solutions such as banking systems. The main one is inefficiency, which is due to the need for a consensus algorithm that provides a total order of transactions. Traditional systems solve this easily by having a single central entity that orders transactions, which cannot be done in decentralized systems. Thus, blockchains' efficiency and scalability suffer from the need for time-costly consensus algorithms, which means they cannot currently compete with centralized systems that provide a much greater amount of transactional processing power.
However, with the emergence of novel processor architectures, secure hardware, and trusted computing technologies (e.g., Intel SGX and ARM TrustZone), it became possible to investigate new ways of addressing the inefficiency of blockchain systems by designing better and improved blockchains.
With all this in mind, this dissertation aims to build an efficient blockchain system that leverages trusted technologies, namely Intel SGX. A previous thesis serves as a starting point, since it already implements a secure wallet system that allows authenticated transactions between users through Intel SGX. This wallet system will be extended to provide traceability of its transactions through a blockchain. The blockchain will use Intel SGX to provide an efficient causal consistency mechanism for ordering transactions. After this, the following step will be to support the execution
of smart-contracts, besides regular transactions.
- …