84 research outputs found

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Full text link
    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments in which to archive data without enough redundancy. Most redundancy schemes are not completely effective at providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements, and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical. Remarkably, they excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios. Comment: 12 pages, 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014. Published in the 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
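    The core mechanism can be illustrated with a toy, single-strand entanglement: each data block is XORed into a running parity chain, so a lost block can be rebuilt from its two neighbouring parities. The sketch below is a minimal simplification (the paper's alpha parameter, as we understand it, generalizes this to alpha parallel strands, multiplying the recovery paths); the function names are illustrative, not the authors' implementation.

```python
# A toy single-strand entanglement (illustrative only; alpha entanglement
# codes generalize this to alpha parallel strands, multiplying the number
# of recovery paths). Each data block is XORed into a running parity
# chain, so a lost block is rebuilt from its neighbouring parities.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def entangle(blocks):
    """Produce one parity per data block: p[i] = d[i] XOR p[i-1]."""
    parities, prev = [], bytes(len(blocks[0]))   # all-zero seed parity
    for d in blocks:
        prev = xor(d, prev)
        parities.append(prev)
    return parities

def recover(i, parities):
    """Rebuild lost data block d[i] from its two neighbouring parities."""
    left = parities[i - 1] if i > 0 else bytes(len(parities[0]))
    return xor(left, parities[i])

data = [b"AAAA", b"BBBB", b"CCCC"]
p = entangle(data)
assert recover(1, p) == b"BBBB"   # d[1] rebuilt without reading d[1]
```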

    Dynamic data consistency maintenance in peer-to-peer caching system

    Get PDF
    Master's thesis (Master of Science)

    PiCasso: enabling information-centric multi-tenancy at the edge of community mesh networks

    Get PDF
    © 2019 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. Edge computing is radically shaping the way Internet services are run by making computation available close to the users, thus mitigating the latency and performance challenges faced in today's Internet infrastructure. Emerging markets and rural and remote communities are even further from the cloud, which makes edge computing especially valuable there. Many solutions have recently been proposed to facilitate efficient service delivery in edge data centers. However, we argue that those solutions cannot fully support operations in Community Mesh Networks (CMNs), where the network connection may be less reliable and exhibit variable performance. In this paper, we propose to leverage lightweight virtualisation, Information-Centric Networking (ICN), and service deployment algorithms to overcome these limitations. The proposal is implemented in the PiCasso system, which utilises the in-network caching and name-based routing of ICN, combined with our HANET (HArdware and NETwork Resources) service deployment heuristic, to optimise the forwarding path of service delivery in a network zone. We analyse data collected from the Guifi.net Sants network zone to develop a smart heuristic for service deployment in that zone. Through a real deployment in Guifi.net, we show that HANET improves response time by up to 53% and 28.7% for stateless and stateful services respectively. Compared to traditional host-centric communication, PiCasso achieves a 43% traffic reduction for service delivery in our real deployment. The overall effect of our ICN platform is that most content and service delivery requests can be satisfied very close to the client device, often just one hop away, decoupling QoS from intra-network traffic and origin server load. Peer reviewed. Postprint (author's final draft).
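    As a rough illustration of the ICN behaviour PiCasso builds on, the sketch below shows name-based forwarding with on-path caching: an interest for a named object is answered by the first node on the forwarding path that holds a copy, and the reply populates the caches on the way back, so later requests are satisfied close to the client. The topology and names are invented for illustration and are not PiCasso's API.

```python
# Illustrative sketch of ICN-style on-path caching (invented names and
# topology, not PiCasso's API): an interest for a named object is answered
# by the first node on the forwarding path holding a copy, and the reply
# populates the caches on the way back to the client.

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}                    # content name -> data

def fetch(path, content_name, origin_data):
    """Walk the forwarding path; return (data, hops to first copy)."""
    for hops, node in enumerate(path, start=1):
        if content_name in node.cache:
            data = node.cache[content_name]
            break
    else:
        hops, data = len(path) + 1, origin_data   # fell through to origin
    for node in path[:hops]:               # cache along the reverse path
        node.cache[content_name] = data
    return data, hops

path = [Node("client-ap"), Node("zone-router"), Node("gateway")]
print(fetch(path, "/guifi/video/1", b"payload"))   # miss: served by origin
print(fetch(path, "/guifi/video/1", b"payload"))   # hit: one hop away
```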

    Towards Highly Available and Self-Healing Grid Services

    Get PDF
    The volatility of nodes in large-scale distributed systems endangers the availability of grid services and makes them difficult to use. In such a context, structured peer-to-peer overlays can be used to provide scalable and fault-tolerant communication mechanisms. To ensure the availability of services, active replication can be used on top of the overlays. In this paper, we present Semias, a framework based on active replication on top of a structured overlay that provides high availability and self-healing for stateful grid services. The self-healing mechanisms of Semias ensure the high availability of the replicated services while minimizing the number of reconfigurations. We have used Semias to make Vigne grid middleware services highly available. Experiments run on Grid'5000 and PlanetLab show the performance and self-healing properties of the framework.
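    The sketch below illustrates the active-replication idea Semias builds on: every replica deterministically executes every request in the same order, so replica states stay identical and any replica can answer a client. It assumes an ordered request stream (in practice supplied by a group communication layer); the classes are illustrative, not Semias code.

```python
# Toy sketch of active replication (not Semias code): every replica applies
# every request in the same order, so the replicas' states stay identical
# and any replica's reply can serve the client. An ordered request stream
# (e.g. from an atomic broadcast layer) is assumed.

class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, op):
        kind, key, *val = op
        if kind == "put":
            self.state[key] = val[0]
        return self.state.get(key)

replicas = [Replica() for _ in range(3)]

def invoke(op):
    results = [r.apply(op) for r in replicas]   # all replicas execute
    assert len(set(results)) == 1               # deterministic: all agree
    return results[0]                           # any reply serves the client

invoke(("put", "job-42", "running"))
print(invoke(("get", "job-42")))                # -> running
```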

    Linear Scalability of Distributed Applications

    Get PDF
    The explosion of social applications such as Facebook, LinkedIn and Twitter, of electronic commerce with companies like Amazon.com and Ebay.com, and of Internet search has created the need for new technologies and appropriate systems to effectively manage a considerable amount of data and users. These applications must run continuously every day of the year and must be capable of surviving sudden and abrupt load increases as well as all kinds of software, hardware, human and organizational failures. Increasing (or decreasing) the allocated resources of a distributed application in an elastic and scalable manner, while satisfying requirements on availability and performance in a cost-effective way, is essential for commercial viability, but it poses great challenges in today's infrastructures. Indeed, Cloud Computing can provide resources on demand: it is now easy to start dozens of servers in parallel (computational resources) or to store a huge amount of data (storage resources), even for a very limited period, paying only for the resources consumed. However, these complex infrastructures, consisting of heterogeneous and low-cost resources, are failure-prone. Also, although cloud resources are deemed to be virtually unlimited, only adequate resource management and demand multiplexing can meet customer requirements and avoid performance deterioration. In this thesis, we deal with adaptive management of cloud resources under specific application requirements. First, in the intra-cloud environment, we address the problem of cloud storage resource management with availability guarantees and find the optimal resource allocation in a decentralized way by means of a virtual economy. Data replicas migrate, replicate or delete themselves according to their economic fitness. Our approach responds effectively to sudden load increases or failures and makes the best use of the geographical distance between nodes to improve application-specific data availability. We then propose a decentralized approach for adaptive management of computational resources for applications requiring high availability and performance guarantees under load spikes, sudden failures or cloud resource updates. Our approach involves a virtual economy among service components (similar to the one among data replicas) and an innovative cascading scheme for setting the performance goals of individual components so as to meet the overall application requirements. Our approach manages to meet application requirements with the minimum resources, by allocating new ones or releasing redundant ones. Finally, as cloud storage vendors offer online services at different rates, which can vary widely due to second-degree price discrimination, we present an inter-cloud storage resource allocation method that aggregates resources from different storage vendors and provides the user with a system that guarantees the best rate to host and serve its data, while satisfying the user's requirements on availability, durability, latency, etc. Our system continuously optimizes the placement of data according to its type and usage pattern, and minimizes migration costs from one provider to another, thereby avoiding vendor lock-in.
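    The virtual economy for data replicas can be sketched roughly as follows: each replica earns virtual income for the requests it serves and pays rent for the storage it occupies, and its running balance drives the replicate/stay/delete decision. The rates and thresholds below are illustrative assumptions, not the thesis's actual parameters.

```python
# Toy sketch of a virtual economy for data replicas: a replica earns
# virtual income per request served and pays rent per round for its
# storage; its balance drives the replicate / stay / delete decision.
# All rates and thresholds are illustrative assumptions.

INCOME_PER_REQUEST = 1.0
RENT_PER_ROUND = 5.0
REPLICATE_ABOVE = 50.0     # hot object: spawn a copy elsewhere
DELETE_BELOW = -20.0       # cold object: reclaim the storage

def step(balance, requests_served):
    balance += requests_served * INCOME_PER_REQUEST - RENT_PER_ROUND
    if balance > REPLICATE_ABOVE:
        return balance / 2, "replicate"    # split wealth with the new copy
    if balance < DELETE_BELOW:
        return balance, "delete"
    return balance, "stay"

balance = 0.0
for load in [2, 1, 0, 0, 80, 90]:          # requests seen each round
    balance, action = step(balance, load)
    print(f"load={load:3d}  balance={balance:7.1f}  action={action}")
```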

    The scalability of reliable computation in Erlang

    Get PDF
    With the advent of many-core architectures, scalability is a key property for programming languages. Actor-based frameworks like Erlang are fundamentally scalable, but in practice they have some scalability limitations. The RELEASE project aims to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on emergent commodity architectures with 10,000 cores. The RELEASE consortium works to scale Erlang at the virtual machine, language, and infrastructure levels, and to supply profiling and refactoring tools. This research contributes to the RELEASE project at the language level. Firstly, we study the provision of scalable persistent storage options for Erlang. We articulate the requirements for scalable and available persistent storage and evaluate four popular Erlang DBMSs against these requirements. We investigate the scalability limits of the Riak NoSQL DBMS using Basho Bench on up to 100 nodes on the Kalkyl cluster and establish scientifically, for the first time, that the scalability limit of Riak is 60 nodes, thereby confirming developer folklore. We design and implement DE-Bench, a scalable fault-tolerant peer-to-peer benchmarking tool that measures the throughput and latency of distributed Erlang commands on a cluster of Erlang nodes. We employ DE-Bench to investigate the scalability limits of distributed Erlang on up to 150 nodes and 1200 cores. Our results demonstrate that the frequency of global commands limits the scalability of distributed Erlang. We also show that distributed Erlang scales linearly up to 150 nodes and 1200 cores with relatively heavy data and computation loads when no global commands are used. As part of the RELEASE project, the Glasgow University team has developed Scalable Distributed Erlang (SD Erlang) to address the scalability limits of distributed Erlang. We evaluate SD Erlang by designing and implementing the first ever demonstrators for SD Erlang, i.e. DE-Bench, Orbit and Ant Colony Optimisation (ACO). We employ DE-Bench to evaluate the performance and scalability of group operations in SD Erlang on up to 100 nodes. Our results show that the alternatives SD Erlang offers for global commands (i.e. group commands) scale linearly up to 100 nodes. We also develop and evaluate an SD Erlang implementation of Orbit, a symbolic computing kernel and a generalization of a transitive closure computation. Our evaluation results show that SD Erlang Orbit outperforms distributed Erlang Orbit on 160 nodes and 1280 cores. Moreover, we develop a reliable distributed version of ACO and show that the reliability of ACO limits its scalability in traditional distributed Erlang. We use SD Erlang to improve the scalability of the reliable ACO by eliminating global commands and avoiding full-mesh connectivity between nodes. We show that SD Erlang effectively reduces the network traffic between nodes in an Erlang cluster.
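    The finding that global commands limit scalability admits a back-of-the-envelope model: a global command must contact every node, so its cost grows linearly with cluster size, while a point command touches a single node; even a small fraction of global commands then caps relative throughput as the cluster grows. This is a toy model for intuition, not DE-Bench's methodology.

```python
# Back-of-the-envelope model (not DE-Bench's methodology) of why global
# commands limit scalability: a global command must contact every node, so
# its cost grows linearly with cluster size, while a point command touches
# one node; a small global fraction then caps throughput as n grows.

def mean_cost(n_nodes, global_fraction, point_cost=1.0, per_node_cost=1.0):
    global_cost = per_node_cost * n_nodes
    return (1 - global_fraction) * point_cost + global_fraction * global_cost

for n in [10, 50, 100, 150]:
    t_local = 1.0 / mean_cost(n, 0.00)     # relative throughput, no globals
    t_mixed = 1.0 / mean_cost(n, 0.05)     # 5% global commands
    print(f"n={n:3d}  no-global={t_local:.2f}  5%-global={t_mixed:.3f}")
```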

    SoS: self-organizing substrates

    Get PDF
    Large-scale networked systems often exhibit self-organizing properties, whether by design or by chance. Understanding self-organization using tools from cybernetics, particularly by modeling systems as Markov processes, is a first step towards a formal framework which can be used in (decentralized) systems research and design. Interesting aspects to look for include the time evolution of a system, whether and when a system converges to some absorbing states or stabilizes into a dynamic (and stable) equilibrium, and how it performs in such an equilibrium state. Such a formal framework brings objectivity into systems research, helping discern facts from artefacts as well as providing tools for quantitative evaluation of such systems. This thesis introduces such formalism in analyzing and evaluating peer-to-peer (P2P) systems in order to better understand their dynamics, which in turn helps produce better designs. In particular, this thesis develops and studies the fundamental building blocks for a P2P storage system. In the process, the design and evaluation methodology we pursue illustrates the typical methodological approaches in studying and designing self-organizing systems, and how the analysis methodology influences the design of the algorithms themselves to meet system design goals (preferably with quantifiable guarantees). These goals include efficiency, availability and durability, load balance, high fault tolerance and self-maintenance even in adversarial conditions like arbitrarily skewed and dynamic load and high membership dynamics (churn), apart, of course, from the specific functionalities that the system is supposed to provide. The functionalities we study here are some of the fundamental building blocks for various P2P applications and systems, including P2P storage systems, and hence we call them substrates or base infrastructure. These elemental functionalities include: (i) reliable and efficient discovery of resources distributed over the network in a decentralized manner; (ii) communication among participants in an address-independent manner, i.e., even when peers change their physical addresses; (iii) availability and persistence of stored objects in the network, irrespective of the availability or departure of individual participants from the system at any time; and (iv) freshness of the objects/resources (up-to-date replicas). Internet-scale distributed index structures (often termed structured overlays) are used for discovery and access of resources in a decentralized setting. We propose rapid construction from scratch and maintenance of the P-Grid overlay network in a self-organized manner so as to provide efficient search of both individual keys and whole ranges of keys, while providing good load-balancing characteristics for diverse kinds of arbitrarily skewed loads: storage and replication, query forwarding and query answering loads. For fast overlay construction we employ recursive partitioning of the key space so that the resulting partitions are balanced with respect to storage load and replication. The proper algorithmic parameters for such partitioning are derived from a transient analysis of the partitioning process, which has the Markov property. Preservation of ordering information in P-Grid, such that queries other than exact queries, like range queries, can be handled efficiently and rather trivially, makes P-Grid suitable for data-oriented applications.
Fast overlay construction is analogous to building an index on a new set of keys, making P-Grid suitable as the underlying indexing mechanism for peer-to-peer information retrieval applications, among other potential applications which may require frequent indexing of new attributes apart from regular updates to an existing index. In order to deal with membership dynamics, in particular peers changing physical addresses across sessions, the overlay itself is used as a (self-referential) directory service for maintaining the participating peers' physical addresses across sessions. Exploiting this self-referential directory, a family of overlay maintenance schemes has been designed with lower communication overhead than other overlay maintenance strategies. The notion of a dynamic-equilibrium study for overlays under continuous churn and repairs, modeled as a Markov process, was introduced in order to evaluate and compare the overlay maintenance schemes. While the self-referential directory was originally invented to realize overlay maintenance schemes with lower overheads than existing ones, it is generic in nature and can be used for various other purposes, e.g., as a decentralized public key infrastructure. Persistence of peer identity across sessions, in spite of changes in physical address, provides a logical independence of the overlay network from the underlying physical network. This has many other potential uses, for example, efficient maintenance mechanisms for P2P storage systems and P2P trust and reputation management. We specifically look into the dynamics of maintaining redundancy for storage systems and design a novel lazy maintenance strategy. This strategy is algorithmically a simple variant of existing maintenance strategies which adapts to the system dynamics. This randomized lazy maintenance strategy thus explores the cost-performance trade-offs of the storage maintenance operations in a self-organizing manner. We model the storage system (redundancy), under churn and maintenance, as a Markov process. We perform an equilibrium study to show that the system operates in a more stable dynamic equilibrium with our strategy than with the existing maintenance scheme for comparable overheads. In particular, we show that our maintenance scheme provides substantial performance gains in terms of maintenance overhead and the system's resilience in the presence of churn and correlated failures. Finally, we propose a gossip mechanism which works with lower communication overhead than existing approaches for communication among a relatively large set of unreliable peers, without assuming any specific structure for their mutual connectivity. We use such a communication primitive for propagating replica updates in P2P systems, facilitating management of mutable content. The peer population affected by a gossip can be modeled as a Markov process. Studying the transient spread of gossips helps in choosing proper algorithm parameters to reduce communication overhead while guaranteeing coverage of online peers. Each of these substrates was developed to find practical solutions for real problems. Put together, they can be used in other applications, including a P2P storage system with support for efficient lookups and inserts, membership dynamics, content mutation and updates, persistence and availability.
Many of the ideas have already been implemented in real systems, and several others are on the way to being integrated into the implementations. This dissertation makes two principal contributions. First, it provides designs of P2P systems which are useful for end-users as well as for other application developers who can build upon these existing systems. Second, it adapts and introduces the methodology of analyzing a system's time evolution (tools typically used in diverse domains including physics and cybernetics) to study the long-run behavior of P2P systems, and uses this methodology to (re-)design appropriate algorithms and evaluate them. We observed that studying P2P systems from the perspective of complex systems reveals their inner dynamics and hence ways to exploit such dynamics for suitable or better algorithms. In other words, the analysis methodology in itself strongly influences and inspires the way we design such systems. We believe that such an approach of orchestrating self-organization in Internet-scale systems, where the algorithms and the analysis methodology have strong mutual influence, will significantly change the way future systems of this kind are developed and evaluated. We envision that such an approach will particularly serve as an important tool for the nascent but fast-moving P2P systems research and development community.
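The contrast between eager and lazy redundancy maintenance can be sketched with a small simulation in the spirit of the dynamic-equilibrium analysis above: replicas of an object are lost with some probability per round; eager repair replaces every loss immediately, while lazy repair waits until redundancy falls to a threshold and then restores it in bulk, trading far fewer maintenance invocations for time spent at lower redundancy. All parameters below are illustrative, not the thesis's values.

```python
import random

# Minimal simulation contrasting eager repair (replace every lost replica
# at once) with lazy repair (wait until redundancy drops to a threshold,
# then restore in bulk). Lazy repair trades far fewer maintenance
# invocations for time spent at lower redundancy. Parameters illustrative.

def simulate(lazy, rounds=10_000, max_r=8, threshold=3, churn=0.05, seed=1):
    rng = random.Random(seed)
    replicas, repair_events, losses = max_r, 0, 0
    for _ in range(rounds):
        replicas -= sum(rng.random() < churn for _ in range(replicas))
        if replicas == 0:
            losses += 1                 # every copy gone: object lost
            replicas = max_r            # restored out of band, for the sim
            continue
        need_repair = replicas <= threshold if lazy else replicas < max_r
        if need_repair:
            repair_events += 1          # one maintenance invocation
            replicas = max_r
    return repair_events, losses

print("eager:", simulate(lazy=False))   # many small repair invocations
print("lazy: ", simulate(lazy=True))    # few bulk repair invocations
```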

    Distributed k-ary System: Algorithms for Distributed Hash Tables

    Get PDF
    This dissertation presents algorithms for data structures called distributed hash tables (DHTs) or structured overlay networks, which are used to build scalable, self-managing distributed systems. The provided algorithms guarantee lookup consistency in the presence of dynamism: they guarantee consistent lookup results in the presence of nodes joining and leaving. Similarly, the algorithms guarantee that routing never fails while nodes join and leave. Previous algorithms for lookup consistency either suffer from starvation, do not work in the presence of failures, or lack a proof of correctness. Several group communication algorithms for structured overlay networks are presented. We provide an overlay broadcast algorithm which, unlike previous algorithms, avoids redundant messages, reaching all nodes in O(log n) time while using O(n) messages, where n is the number of nodes in the system. The broadcast algorithm is used to build overlay multicast. We introduce bulk operations, which enable a node to efficiently make multiple lookups or send a message to all nodes in a specified set of identifiers. The algorithm ensures that all specified nodes are reached in O(log n) time, sending at most O(log n) messages per node, regardless of the input size of the bulk operation. Moreover, the algorithm avoids sending redundant messages. Previous approaches required multiple lookups, which consume more messages and can render the initiator a bottleneck. Our algorithms are used in DHT-based storage systems, where nodes can do thousands of lookups to fetch large files. We use the bulk operation algorithm to construct a pseudo-reliable broadcast algorithm. Bulk operations can also be used to implement efficient range queries. Finally, we describe a novel way to place replicas in a DHT, called symmetric replication, which enables parallel recursive lookups. Parallel lookups are known to reduce latencies; however, costly iterative lookups have previously been used to perform them. Moreover, joins and leaves require exchanging only O(1) messages, while other schemes require at least log(f) messages for a replication degree of f. The algorithms have been implemented in a middleware called the Distributed k-ary System (DKS), which is briefly described.
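    Symmetric replication admits a very small sketch: for an identifier space of size N and replication degree f (with f dividing N), key k is stored under the f identifiers k + i·N/f mod N, so any node can compute the whole replica set locally and issue the parallel lookups mentioned above. N and f below are arbitrary example values, not DKS defaults.

```python
# Minimal sketch of the symmetric replication placement rule: with an
# identifier space of size N and replication degree f (f divides N), key k
# is stored under the f identifiers below. Any node can compute the whole
# replica set locally, enabling parallel lookups. N and f are example values.

def replica_ids(k: int, N: int = 2**10, f: int = 4) -> list:
    assert N % f == 0
    return [(k + i * (N // f)) % N for i in range(f)]

print(replica_ids(37))    # [37, 293, 549, 805]

# Symmetry: every id in the set yields the same equivalence class, so a
# lookup may start from whichever replica id is cheapest to reach.
assert all(sorted(replica_ids(r)) == sorted(replica_ids(37))
           for r in replica_ids(37))
```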

    Data Sharing in P2P Systems

    Get PDF
    To appear in Springer's "Handbook of P2P Networking". In this chapter, we survey P2P data sharing systems. Throughout, we focus on the evolution from simple file-sharing systems, with limited functionalities, to Peer Data Management Systems (PDMS) that support advanced applications with more sophisticated data management techniques. Advanced P2P applications deal with semantically rich data (e.g. XML documents, relational tables), using a high-level SQL-like query language. We start our survey with an overview of the existing P2P network architectures and the associated routing protocols. Then, we discuss data indexing techniques based on their distribution degree and the semantics they can capture from the underlying data. We also discuss schema management techniques which allow integrating heterogeneous data. We conclude by discussing the techniques proposed for processing complex queries (e.g. range and join queries). Complex query facilities are necessary for advanced applications which require a high level of search expressiveness. This last part highlights the lack of querying techniques that allow for approximate query answering.

    On service optimization in community network micro-clouds

    Get PDF
    Joint doctorate (cotutelle) between Universitat Politècnica de Catalunya and KTH Royal Institute of Technology. Internet coverage in much of the world is still weak, and local communities need to come together and build their own network infrastructures. People collaborate for the common goal of accessing the Internet and cloud services by building community networks (CNs). The use of Internet cloud services has grown over the last decade. Community network cloud infrastructures (i.e. micro-clouds) have been introduced to run services inside the network, without the need to consume them from the Internet. CN micro-clouds aim not only for improved service performance, but also to provide an entry point in CNs for an alternative to Internet cloud services. However, adapting services for use in CN micro-clouds has its own challenges, since low-capacity devices and wireless connections without central management are predominant in CNs. Further, the large and irregular topology of the network, high software and hardware diversity, and differing service requirements make CN micro-clouds a challenging environment in which to run local services and to achieve service performance and quality similar to Internet cloud services. In this thesis, our main objective is the optimization of services (performance, quality) in CN micro-clouds, facilitating the entrance of other services and motivating members to make use of CN micro-cloud services as an alternative to Internet services. We present an approach to handling services in CN micro-cloud environments that improves service performance and quality to levels approximating Internet services, while also giving the community motivation to use CN micro-cloud services. Furthermore, we break the problem into different levels (resource, service and middleware), propose a model that provides improvements for each level, and contribute information that helps to support the improvements (in terms of service performance and quality) at the other levels. At the resource level, we facilitate the use of community devices by utilizing virtualisation techniques that isolate and manage CN micro-cloud services in order to have a multi-purpose environment that fosters services in the CN micro-cloud environment. At the service level, we build a monitoring tool tailored to CN micro-clouds that helps us analyze service behavior and performance in CN micro-clouds. The information gathered then enables adapting the services to the environment in order to improve their quality and performance under CN conditions. At the middleware level, we build overlay networks as the main communication system, guided by social information, in order to improve the paths and routes between nodes, and we improve the transmission of data across the network by utilizing the relationships already established in the social network or communities of practice related to the CNs. As a result, service performance in CN micro-clouds can become more stable with respect to resource usage, performance and user-perceived quality. Postprint (published version).