
    Improving the scalability of geo-replication with reservations

    Geo-replicated systems improve performance and fault tolerance by replicating data on sites in different physical locations. These systems often eschew strong consistency because of its cost in performance and scalability, and instead choose eventual consistency. Although eventual consistency improves performance, especially at large scale, it may violate system invariants. In this work, we exploit reservation techniques to strengthen eventual consistency by adding safety guarantees. We define a consistency model called RPB that keeps the advantages of eventual consistency while providing stronger guarantees, including causality and safety properties.
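
    The reservation idea lends itself to a small sketch. The following is a minimal illustration with hypothetical names, assuming a counter that must stay non-negative; it is not the RPB protocol itself. Each replica holds a pre-allocated share of rights to perform invariant-sensitive operations and executes locally only while it still has rights, so the invariant holds regardless of what other replicas do concurrently.

```python
# Minimal sketch of a reservation-based safety check (illustrative, not RPB).
# Each replica is granted a share of "decrement rights" for a counter that must stay >= 0.

class ReplicaReservation:
    def __init__(self, replica_id: str, granted_rights: int):
        self.replica_id = replica_id
        self.rights = granted_rights  # local share of the global allowance

    def try_decrement(self, amount: int) -> bool:
        """Execute locally only if this replica still holds enough rights."""
        if amount <= self.rights:
            self.rights -= amount   # consume rights; safe without coordination
            return True
        return False                # would risk violating the invariant; must coordinate or fail

# Usage: two replicas sharing an allowance of 10 decrements in total.
r1 = ReplicaReservation("eu-west", granted_rights=6)
r2 = ReplicaReservation("us-east", granted_rights=4)
assert r1.try_decrement(5)      # allowed: within the local share
assert not r2.try_decrement(5)  # refused: would exceed us-east's share
```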

    Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants

    Geo-replicated databases often operate under the principle of eventual consistency to offer high availability with low latency on a simple key/value store abstraction. Recently, some have adopted commutative data types to provide seamless reconciliation for special-purpose data types, such as counters. Despite this, the inability to enforce numeric invariants across all replicas remains a key shortcoming of relying on the limited guarantees of eventually consistent storage. We present a new replicated data type, called bounded counter, which adds support for numeric invariants to eventually consistent geo-replicated databases. We describe how this can be implemented on top of existing cloud stores without modifying them, using Riak as an example. Our approach adapts ideas from escrow transactions to devise a solution that is decentralized, fault-tolerant and fast. Our evaluation shows much lower latency and better scalability than the traditional approach of using strong consistency to enforce numeric invariants, thus alleviating the tension between consistency and availability.
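
    A rough sketch of how such a bounded counter can work is shown below; the field names and structure are ours, simplified from the escrow idea, while the actual design runs on top of Riak. Each replica earns rights through its own increments or through transfers from other replicas, decrements consume only locally held rights, and merging takes entry-wise maxima of grow-only maps, so the lower bound of zero is never violated.

```python
# Sketch of a bounded counter CRDT enforcing "value >= 0" (simplified, illustrative).

from collections import defaultdict

class BoundedCounter:
    def __init__(self, replica_id: str):
        self.id = replica_id
        self.inc = defaultdict(int)    # increments performed per replica (grow-only)
        self.dec = defaultdict(int)    # decrements performed per replica (grow-only)
        self.xfer = defaultdict(int)   # rights transferred, keyed by (src, dst) (grow-only)

    def value(self) -> int:
        return sum(self.inc.values()) - sum(self.dec.values())

    def local_rights(self) -> int:
        received = sum(v for (src, dst), v in self.xfer.items() if dst == self.id)
        given = sum(v for (src, dst), v in self.xfer.items() if src == self.id)
        return self.inc[self.id] + received - given - self.dec[self.id]

    def increment(self, n: int) -> None:
        self.inc[self.id] += n                 # always safe; also creates n new rights here

    def decrement(self, n: int) -> bool:
        if n > self.local_rights():
            return False                       # not enough escrowed rights held locally
        self.dec[self.id] += n
        return True

    def transfer(self, n: int, to: str) -> bool:
        if n > self.local_rights():
            return False
        self.xfer[(self.id, to)] += n          # hand spare rights to another replica
        return True

    def merge(self, other: "BoundedCounter") -> None:
        for mine, theirs in ((self.inc, other.inc), (self.dec, other.dec), (self.xfer, other.xfer)):
            for k, v in theirs.items():
                mine[k] = max(mine[k], v)      # entry-wise max of grow-only maps

# Usage: dc-a creates 10 rights, hands 4 to dc-b, which can then decrement locally.
a, b = BoundedCounter("dc-a"), BoundedCounter("dc-b")
a.increment(10)
a.transfer(4, to="dc-b")
b.merge(a)
assert b.decrement(4) and not b.decrement(1)
```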

    Invariant preservation in geo-replicated data stores

    The Internet has enabled people from all around the globe to communicate with each other in a matter of milliseconds. This possibility has a great impact on the way we work, behave and communicate, while its full extent is yet to be known. As we become more dependent on Internet services, it becomes more important to ensure that these systems operate correctly, with low latency and high availability, for millions of clients scattered all around the globe. To provide service to a large number of clients, and low access latency for clients in different geographical locations, Internet services typically rely on geo-replicated storage systems. Replication comes with costs that may affect service quality. To propagate updates between replicas, systems either give up consistency in favor of better availability and latency (weak consistency), or maintain consistency at the cost of becoming unavailable during network partitions (strong consistency). In practice, many production systems rely on weakly consistent storage systems to enhance user experience, overlooking that applications can become incorrect under weaker consistency assumptions. In this thesis, we study how to exploit application semantics to build correct applications without affecting the availability and latency of operations. We propose a new consistency model that breaks with the traditional assumption that application consistency depends on coordinating the execution of operations across replicas. We show that it is possible to execute most operations with low latency and in a highly available way, while preserving application correctness. Our approach consists in specifying the fundamental properties that define the correctness of an application, i.e. the application invariants, and in identifying and preventing concurrent executions that could make the state of the database inconsistent, i.e. that may violate some invariant. We explore different, complementary approaches to implement this model. The Indigo approach prevents conflicting operations from executing concurrently, by restricting the operations that each replica can execute at each moment to maintain application correctness. The IPA approach does not preclude the execution of any operation, ensuring high availability. To maintain application correctness, operations are modified to prevent invariant violations during replica reconciliation, or, if modifying operations leads to unsatisfactory semantics, any invariant violation is corrected before a client can read an inconsistent state, by executing compensations. Evaluation shows that our approaches can ensure both low latency and high availability for most operations in common Internet application workloads, with small execution overhead compared to unmodified weak consistency systems, while enforcing application invariants as strong consistency systems do.
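
    A minimal sketch of the two styles follows; the invariant, names, and policies are invented for illustration and are not Indigo's or IPA's actual mechanisms. An Indigo-style path acquires a right before executing an enrolment, while an IPA-style path executes everything and repairs violations during reconciliation, before clients can read the inconsistent state.

```python
# Illustrative sketch of the two styles (hypothetical invariant:
# a tournament holds at most MAX_PLAYERS enrolments).

MAX_PLAYERS = 3

# Indigo-style: prevent violating executions by requiring a right (slot) up front.
class SlotReservation:
    def __init__(self, capacity: int):
        self.remaining = capacity          # rights would be split among replicas in a real system

    def try_acquire(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False                       # replica must wait or ask others for rights

# IPA-style: execute everything, then compensate if reconciliation finds a violation.
def compensate(enrollments: list[str], capacity: int) -> list[str]:
    """Repair the state before clients can read it: drop the surplus enrolments."""
    if len(enrollments) <= capacity:
        return enrollments
    return enrollments[:capacity]          # one possible compensation policy

# Indigo path: an enrolment only runs if a slot was reserved.
slots = SlotReservation(MAX_PLAYERS)
accepted = [p for p in ["ana", "bob", "carol", "dave"] if slots.try_acquire()]
assert accepted == ["ana", "bob", "carol"]

# IPA path: two replicas accepted players concurrently; merge optimistically, then repair.
merged = ["ana", "bob", "carol", "dave"]
assert compensate(merged, MAX_PLAYERS) == ["ana", "bob", "carol"]
```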

    High performance data processing

    Master's dissertation in Informatics Engineering. As applications reach a wider audience than ever before, they must process larger and larger numbers of requests. In addition, they must often serve users all over the globe, where network latencies have a significant negative impact on monolithic deployments. Therefore, distribution is a well sought-after solution to improve the performance of both the application and database layers. However, distributing data is not an easy task if we want to ensure strong consistency guarantees. This leads many database systems to rely on expensive synchronization protocols such as two-phase commit, distributed consensus, or distributed locking, while other systems rely on weak consistency, unfeasible for some use cases.
This thesis presents the design, implementation and evaluation of two solutions aimed at reducing the impact of ensuring strong consistency guarantees in database systems, especially geo-distributed ones. The first is the Primary Semi-Primary, a fully replicated distributed database architecture that allows different replicas to evolve independently, so that clients do not have to wait for preceding non-conflicting updates. Although replicas can process both reads and writes, improving scalability, the system still ensures strong consistency guarantees by relaying transaction certification to a central node. Its design is independent of the underlying data model, but its implementation can take advantage of the native concurrency control offered by some systems, as exemplified by an implementation using PostgreSQL and its Snapshot Isolation. The results show several advantages in both throughput and response time, compared to alternative architectures, in both local and geo-distributed environments. The second solution is Multi-Record Values, a technique that dynamically partitions numeric values into multiple records, allowing concurrent writes to execute with a low probability of conflict, reducing the abort rate and/or locking contention. Lower-limit guarantees, required by objects such as balances or stocks, are ensured by this strategy, unlike many other similar alternatives. Its design is also data-model agnostic, and its advantages can be found in SQL and NoSQL systems alike, both centralized and distributed, as presented in the evaluation section.
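
    A minimal sketch of the Multi-Record Values idea follows; the record layout, names, and retry policy are assumptions, not the thesis' actual schema. The numeric value is split across several records, each write touches one record chosen at random so that concurrent writers rarely collide, and the lower bound holds because every record is checked individually.

```python
# Sketch of splitting a numeric value (e.g. a stock level) across several records
# so concurrent decrements rarely touch the same record. Illustrative only.

import random

class MultiRecordValue:
    def __init__(self, initial: int, num_records: int = 4, lower_bound: int = 0):
        self.lower_bound = lower_bound
        base, extra = divmod(initial, num_records)
        self.records = [base + (1 if i < extra else 0) for i in range(num_records)]

    def total(self) -> int:
        return sum(self.records)

    def add(self, amount: int) -> None:
        self.records[random.randrange(len(self.records))] += amount

    def subtract(self, amount: int) -> bool:
        """Try records in random order; only a record with enough balance may pay,
        so the total never drops below the lower bound."""
        for i in random.sample(range(len(self.records)), len(self.records)):
            if self.records[i] - amount >= self.lower_bound:
                self.records[i] -= amount
                return True
        return False   # no single record can cover it; caller may retry or rebalance records

# Usage: a stock of 100 units split over 4 records; concurrent buyers usually
# update different records, so they do not contend on a single row.
stock = MultiRecordValue(100, num_records=4)
assert stock.subtract(10)
assert stock.total() == 90
```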

    A survey and classification of software-defined storage systems

    The exponential growth of digital information is imposing increasing scale and efficiency demands on modern storage infrastructures. As infrastructure complexity increases, so does the difficulty in ensuring quality of service, maintainability, and resource fairness, raising unprecedented performance, scalability, and programmability challenges. Software-Defined Storage (SDS) addresses these challenges by cleanly disentangling control and data flows, easing management, and improving control functionality of conventional storage systems. Despite its momentum in the research community, many aspects of the paradigm are still unclear, undefined, and unexplored, leading to misunderstandings that hamper the research and development of novel SDS technologies. In this article, we present an in-depth study of SDS systems, providing a thorough description and categorization of each plane of functionality. Further, we propose a taxonomy and classification of existing SDS solutions according to different criteria. Finally, we provide key insights about the paradigm and discuss potential future research directions for the field. This work was financed by the Portuguese funding agency FCT (Fundação para a Ciência e a Tecnologia) through national funds, the PhD grant SFRH/BD/146059/2019, the project ThreatAdapt (FCT-FNR/0002/2018), the LASIGE Research Unit (UIDB/00408/2020), and co-funded by the FEDER, where applicable.

    An Efficient Holistic Data Distribution and Storage Solution for Online Social Networks

    In the past few years, Online Social Networks (OSNs) have dramatically spread over the world. Facebook [4], one of the largest worldwide OSNs, has 1.35 billion users, 82.2% of whom are outside the US [36]. The browsing and posting interactions (text content) between OSN users lead to user data reads (visits) and writes (updates) in OSN datacenters, and Facebook now serves a billion reads and tens of millions of writes per second [37]. Besides that, Facebook has become one of the top Internet traffic sources [36] by sharing a tremendous number of large multimedia files, including photos and videos. The servers in datacenters have limited resources (e.g., bandwidth) to supply latency-efficient service for multimedia file sharing among the rapidly growing user population worldwide. Most online applications operate under soft real-time constraints (e.g., ≤ 300 ms latency) for good user experience, and income decreases as service latency grows. Thus, service latency is a very important Quality of Service (QoS) requirement for an OSN as a web service, since it is relevant to the OSN's revenue and user experience. Also, to increase revenue, OSN service providers need to constrain capital investment, operation costs, and resource (bandwidth) usage costs. Therefore, it is critical for the OSN to supply a guaranteed QoS for both text and multimedia content to users while minimizing its costs. To achieve this goal, in this dissertation, we address three problems: i) data distribution among datacenters: how to allocate data (text content) among data servers with low service latency and minimized inter-datacenter network load; ii) efficient multimedia file sharing: how to enable the servers in datacenters to efficiently share multimedia files among users; iii) cost-minimized data allocation among cloud storages: how to save the infrastructure (datacenter) capital investment and operation costs by leveraging commercial cloud storage services. Data distribution among datacenters. To serve text content, the new OSN model, which deploys datacenters globally, helps reduce service latency to worldwide distributed users and relieves the load on existing datacenters. However, it causes a higher inter-datacenter communication load. In the OSN, each datacenter has a full copy of all data, and the master datacenter updates all other datacenters, generating tremendous load in this new model. Distributed data storage, which only stores a user's data in his/her geographically closest datacenters, mitigates the problem, but frequent interactions between distant users lead to frequent inter-datacenter communication and hence long service latencies. Therefore, OSNs need a data allocation algorithm among datacenters with minimized network load and low service latency. Efficient multimedia file sharing. To serve multimedia file sharing with a rapidly growing user population, the file distribution method should be scalable and cost efficient, e.g., minimizing bandwidth usage of the centralized servers. P2P networks have been widely used for file sharing among large numbers of users [58, 131], and meet both the scalability and cost-efficiency requirements. However, without fully utilizing the altruism and trust among friends in OSNs, current P2P-assisted file sharing systems depend on strangers or anonymous users to distribute files, which degrades their performance due to users' selfish and malicious behaviors.
Therefore, OSNs need a cost-efficient and trustworthy P2P-assisted file sharing system to serve multimedia content distribution. Cost-minimized data allocation among cloud storages. The new trend of OSNs requires building worldwide datacenters, which introduces a large amount of capital investment and maintenance costs. To save the capital expenditure of building and maintaining hardware infrastructures, OSNs can leverage the storage services of multiple Cloud Service Providers (CSPs) with existing worldwide distributed datacenters [30, 125, 126]. These datacenters provide different Get/Put latencies and unit prices for resource utilization and reservation. Thus, when selecting different CSPs' datacenters, an OSN, as a cloud customer of a globally distributed application, faces two challenges: i) how to allocate data to worldwide datacenters to satisfy application SLA (service level agreement) requirements, including both data retrieval latency and availability, and ii) how to allocate data and reserve resources in datacenters belonging to different CSPs to minimize the payment cost. Therefore, OSNs need a data allocation system that distributes data among CSPs' datacenters with cost minimization and an SLA guarantee. In all, an OSN needs an efficient holistic data distribution and storage solution that minimizes its network load and cost while supplying guaranteed QoS for both text and multimedia content. In this dissertation, we propose methods to solve each of the aforementioned challenges in OSNs. Firstly, we verify the benefits of the new trend of OSNs and present typical OSN properties that lay the basis of our design. We then propose the Selective Data replication mechanism in Distributed Datacenters (SD3) to allocate user data among geographically distributed datacenters. In SD3, a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a user's different types of data (e.g., status updates, friend posts) for replication, making sure that a replica always reduces inter-datacenter communication. Secondly, we analyze a BitTorrent file sharing trace, which proves the necessity of proximity- and interest-aware clustering. Based on the trace study and OSN properties, to address the second problem, we propose a SoCial Network integrated P2P file sharing system for enhanced Efficiency and Trustworthiness (SOCNET) to fully and cooperatively leverage the common-interest, geographically-close and trust properties of OSN friends. SOCNET uses a hierarchical distributed hash table (DHT) to cluster common-interest nodes, then further clusters geographically close nodes into subclusters, and connects the nodes in a subcluster with social links. Thus, when queries travel along trustable social links, they also gain a higher probability of being successfully resolved by proximity-close nodes, simultaneously enhancing efficiency and trustworthiness. Thirdly, to handle the third problem, we model the cost minimization problem under the SLA constraints using integer programming. Based on the system model, we propose an Economical and SLA-guaranteed cloud Storage Service (ES3), which finds a data allocation and resource reservation schedule with cost minimization and an SLA guarantee.
ES3 incorporates (1) a data allocation and reservation algorithm, which allocates each data item to a datacenter and determines the reservation amount on datacenters by leveraging all the pricing policies; (2) a genetic-algorithm-based data allocation adjustment approach, which keeps data Get/Put rates stable in each datacenter to maximize the reservation benefit; and (3) a dynamic request redirection algorithm, which redirects a data request from an over-utilized datacenter to an under-utilized datacenter with sufficient reserved resources when the request rate varies greatly, to further reduce the payment. Finally, we conducted trace-driven experiments on a distributed testbed, PlanetLab, and on real commercial cloud storage (Amazon S3, Windows Azure Storage and Google Cloud Storage) to demonstrate the efficiency and effectiveness of our proposed systems in comparison with other systems. The results show that our systems outperform others in network savings and data distribution efficiency.
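
    The SD3 replication decision described above can be pictured as a simple benefit test; the sketch below is illustrative and the parameter names are ours, not the dissertation's exact formulation. A remote datacenter replicates a piece of user data only when the inter-datacenter traffic saved by serving visits locally outweighs the traffic added by propagating updates to the new replica.

```python
# Illustrative sketch of an SD3-style replication decision: replicate a user's
# data (or one type of it, e.g. status updates) in a remote datacenter only if
# doing so reduces inter-datacenter communication.

def should_replicate(visit_rate: float, update_rate: float,
                     read_msg_size: float, update_msg_size: float) -> bool:
    """visit_rate / update_rate: remote reads and writes per unit time for this data.
    Without a local replica every visit crosses datacenters; with one, every
    update must be propagated to the replica instead."""
    saved_traffic = visit_rate * read_msg_size      # remote reads that become local
    added_traffic = update_rate * update_msg_size   # update propagation to the replica
    return saved_traffic > added_traffic

# A frequently visited, rarely updated status feed is worth replicating;
# a write-heavy one is not.
assert should_replicate(visit_rate=50, update_rate=2, read_msg_size=1.0, update_msg_size=1.0)
assert not should_replicate(visit_rate=1, update_rate=20, read_msg_size=1.0, update_msg_size=1.0)
```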

    Antidote SQL: SQL for Weakly Consistent Databases

    Distributed storage systems, such as NoSQL databases, employ weakly consistent semantics to provide good scalability and availability for web services and applications. NoSQL offers scalability for applications, but it lacks functionality that programmers could use to reason about the correctness of their applications. On the other hand, SQL databases provide strongly consistent semantics that hurt availability. The goal of this work is to provide efficient support for SQL features on top of the AntidoteDB NoSQL database, a weakly consistent data store. To that end, we extend AQL, a SQL interface for the AntidoteDB database, with newly designed solutions for supporting common SQL features. We focus on improving support for the SQL invariant known as referential integrity, by providing solutions with relaxed and strict consistency models, the latter at the cost of requiring coordination among replicas; on allowing partitioning to be applied to tables; and on developing an indexing system that manages primary and secondary indexes in the database and improves query processing, inspired by SQL syntax. We evaluate our solution using the Basho Bench benchmarking tool and compare the performance of both systems, AQL and AntidoteDB, regarding the cost introduced by the referential integrity mechanism, the benefits of using secondary indexes in query processing, and the performance of partitioning the database. Our results show that our referential integrity solution performs well when delete operations are not issued to the database, although it is noticeably slower with deletes on cascade, and that implementing secondary indexes on the NoSQL data store improves query performance. We also show that partitioning does not harm the performance of the system.
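
    One way to picture a relaxed referential-integrity check and a delete-on-cascade repair is the sketch below; the data structures and policy are our own illustration, not AQL's actual implementation. A child row is inserted only if its parent is visible at the local replica, and deleting a parent removes any children that would otherwise dangle.

```python
# Hedged illustration of referential integrity in a weakly consistent store.

parents = {"album:1": {"name": "Holidays"}}            # referenced table
children = {}                                           # referencing table: photo -> album FK

def insert_child(child_key: str, fk: str) -> bool:
    """Relaxed check: only require that the parent is visible at this replica.
    A concurrent remote delete of the parent can still slip through."""
    if fk not in parents:
        return False
    children[child_key] = {"album": fk}
    return True

def delete_parent_cascade(parent_key: str) -> None:
    """Delete-on-cascade: remove the parent and repair any children that now dangle;
    under weak consistency this repair would run again after replicas merge."""
    parents.pop(parent_key, None)
    for key in [k for k, row in children.items() if row["album"] == parent_key]:
        del children[key]

insert_child("photo:7", fk="album:1")
delete_parent_cascade("album:1")
assert "photo:7" not in children      # no dangling reference remains locally
```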

    Dimensioning and workload distribution algorithms for lambda Grids

    Grids consist of a collection of computing and storage elements that may be geographically dispersed, but whose combined capacity one wishes to exploit. To that end, these elements must be connected by a network. Since many scientific applications make use of a Grid, and these applications typically process large amounts of data, it is necessary to provide a network that can transport such large data streams reliably. Optical transport networks are extremely well suited to this. Grids that use such a network are called lambda Grids. This thesis describes a framework in which the design and dimensioning of optical networks for lambda Grids can be expressed. It also discusses how workload can be distributed over a Grid once it has been dimensioned. A large part of the results was obtained through simulation, using a Grid simulation package of our own that focuses precisely on network and Grid elements. The design of this simulator, and the accompanying implementation choices, are therefore discussed in detail in this work.