15 research outputs found

    Consensus Algorithms of Distributed Ledger Technology -- A Comprehensive Analysis

    Full text link
    The most essential component of every Distributed Ledger Technology (DLT) is the Consensus Algorithm (CA), which enables users to reach agreement in a decentralized and distributed manner. Numerous CAs exist, but their suitability for particular applications varies, making their trade-offs a crucial factor to consider when implementing DLT in a specific field. This article provides a comprehensive analysis of the various consensus algorithms used in distributed ledger technologies (DLTs) and blockchain networks. We cover an extensive array of thirty consensus algorithms. Eleven attributes, including hardware requirements, pre-trust level, and tolerance level, were used to generate a series of comparison tables evaluating these consensus algorithms. We also discuss DLT classifications and the categories of certain consensus algorithms, and provide examples of authentication-focused and data-storage-focused DLTs. In addition, we analyze the pros and cons of particular consensus algorithms, such as Nominated Proof of Stake (NPoS), Bonded Proof of Stake (BPoS), and Avalanche. In conclusion, we discuss the applicability of these consensus algorithms to various Cyber-Physical System (CPS) use cases, including supply chain management, intelligent transportation systems, and smart healthcare. Comment: 50 pages, 20 figures.
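
    As a rough, hypothetical illustration of the attribute-based comparison described above (the algorithm names are real, but the attribute values below are placeholders, not the article's actual tables), such a comparison can be captured in a small data structure and filtered per use case:

```python
# Hypothetical sketch of an attribute-based comparison of consensus algorithms.
# The attribute values are illustrative placeholders, not the article's data.
from dataclasses import dataclass

@dataclass
class ConsensusAlgorithm:
    name: str
    hardware_requirements: str   # e.g. "high" for PoW mining hardware
    pre_trust_level: str         # e.g. "permissioned" vs "permissionless"
    fault_tolerance: str         # e.g. "< 1/3 Byzantine nodes"

algorithms = [
    ConsensusAlgorithm("PoW",  "high", "permissionless", "< 1/2 hash power"),
    ConsensusAlgorithm("NPoS", "low",  "permissionless", "< 1/3 stake"),
    ConsensusAlgorithm("BPoS", "low",  "permissionless", "< 1/3 bonded stake"),
    ConsensusAlgorithm("PBFT", "low",  "permissioned",   "< 1/3 Byzantine nodes"),
]

# Filter candidates for a hypothetical CPS deployment with constrained hardware.
candidates = [a.name for a in algorithms if a.hardware_requirements == "low"]
print(candidates)
```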

    A framework for multidimensional indexes on distributed and highly-available data stores

    Get PDF
    Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks for dealing with this kind of data is limiting possible applications and probably precluding many scientific breakthroughs. In this thesis, we describe three HPC scientific applications, spanning molecular dynamics, neuroscience analysis, and physics simulations, in which we experienced first-hand the limits of existing technologies. Building on this experience, we define the desirable missing functionalities and focus on two features that, when combined, significantly improve the way scientific data is analyzed. On one side, scientific simulations generate complex datasets in which multiple correlated characteristics describe each item. For instance, a particle might have a spatial position (x, y, z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items close in space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and organize similar data together. On the other side, approximate analytics has often been indicated as a smart and flexible way to explore large datasets in a short time. Approximate analytics includes a broad family of algorithms that aim to speed up analytical workloads by relaxing the precision of the results within a specific confidence interval. For instance, if we want to know the average age in a group with one-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation. But if we also want fewer I/O operations, we need efficient data sampling, which means organizing data so that we do not need to scan the whole dataset to generate a random sample of it. According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital feature missing from current distributed data management solutions. This thesis aims to address this shortcoming and provides novel scalable solutions. First, we describe the existing data management alternatives and motivate our preference for NoSQL key-value databases. Second, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Third, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves on the state of the art for approximate spatial queries on static, mostly-read datasets. We then enhance the data ingestion capability of our approach by introducing the AOTree, an algorithm that preserves the query performance of the D8tree even for write-intensive HPC applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability. Finally, we describe Qbeast, a novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications, providing a scalable and integrated solution for data analysis and management.
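
    A minimal sketch of how the two ideas above can be combined (illustrative assumptions only; the cell layout, names, and sampling scheme are not the actual D8tree/AOTree or Qbeast design): items are bucketed by coarse space-time cells so that a range query touches few buckets, and reading a random fraction of each relevant bucket yields an approximate aggregate without scanning the whole dataset.

```python
import random
from collections import defaultdict

# Coarse space-time bucketing: items sharing a cell are stored together, so a
# range query only scans the buckets that overlap the query region.
CELL = 50.0  # cell edge length (illustrative)

def cell_key(x, y, z, t):
    return (int(x // CELL), int(y // CELL), int(z // CELL), int(t // CELL))

buckets = defaultdict(list)

def insert(x, y, z, t, value):
    buckets[cell_key(x, y, z, t)].append((x, y, z, t, value))

def approximate_mean(keys, sample_fraction=0.1):
    """Approximate aggregate: read only a random fraction of each bucket."""
    sample = []
    for k in keys:
        rows = buckets[k]
        if rows:
            n = max(1, int(len(rows) * sample_fraction))
            sample.extend(random.sample(rows, n))
    return sum(r[4] for r in sample) / len(sample) if sample else None

# Example: load random "particles" and estimate the mean value over all cells.
for _ in range(10_000):
    x, y, z, t = (random.uniform(0, 100) for _ in range(4))
    insert(x, y, z, t, random.gauss(50, 5))
print(approximate_mean(list(buckets)))  # close to 50, reading ~10% of the rows
```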

    A Survey on Consensus Mechanisms and Mining Strategy Management in Blockchain Networks

    Full text link
    The past decade has witnessed a rapid evolution in blockchain technologies, which has attracted tremendous interest from both research communities and industry. The blockchain network originated in the Internet finance sector as a decentralized, immutable ledger system for ordering transactional data. Nowadays, it is envisioned as a powerful backbone/framework for decentralized data processing and data-driven self-organization in flat, open-access networks. In particular, the desirable characteristics of decentralization, immutability, and self-organization are primarily owed to the unique decentralized consensus mechanisms introduced by blockchain networks. This survey is motivated by the lack of a comprehensive literature review on the development of decentralized consensus mechanisms in blockchain networks. In this paper, we provide a systematic vision of the organization of blockchain networks. Emphasizing the unique characteristics of decentralized consensus in blockchain networks, our in-depth review of state-of-the-art consensus protocols covers both the perspective of distributed consensus system design and the perspective of incentive mechanism design. From a game-theoretic point of view, we also provide a thorough review of the strategies adopted for self-organization by individual nodes in blockchain backbone networks. We then provide a comprehensive survey of the emerging applications of blockchain networks in a broad area of telecommunications, highlighting how the consensus mechanisms impact these applications. Finally, we discuss several open issues in the protocol design for blockchain consensus and related potential research directions.
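
    For readers unfamiliar with the mechanism at the core of such surveys, the following is a generic, simplified sketch of Nakamoto-style proof-of-work consensus (not taken from the paper; the difficulty target and block format are illustrative): a node earns the right to append a block by finding a nonce whose block hash falls below a target.

```python
import hashlib
import json
import time

# Simplified Nakamoto-style proof-of-work. A smaller target makes the puzzle
# harder; the value below is chosen so the example finishes quickly.
TARGET = 2 ** 244

def block_hash(block):
    data = json.dumps(block, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def mine(prev_hash, transactions):
    nonce = 0
    while True:
        block = {"prev": prev_hash, "txs": transactions,
                 "time": int(time.time()), "nonce": nonce}
        if block_hash(block) < TARGET:  # puzzle solved: block may be appended
            return block
        nonce += 1

genesis = {"prev": 0, "txs": [], "time": 0, "nonce": 0}
block1 = mine(block_hash(genesis), ["alice->bob:5"])
print(block1["nonce"], hex(block_hash(block1))[:12])
```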

    Reliable and Real-Time Distributed Abstractions

    Get PDF
    The celebrated distributed computing approach of building systems and services using multiple machines continues to expand into new domains. Computing devices nowadays have additional sensing and communication capabilities, while at the same time becoming cheaper, faster, and more pervasive. Consequently, areas like industrial control, smart grids, and sensor networks increasingly use such devices to control and coordinate system operations. However, compared to classic distributed systems, such real-world physical systems have different needs, e.g., real-time and energy-efficiency requirements. Moreover, the constraints that govern communication are also different: networks become susceptible to inevitable random losses, especially when utilizing wireless and power-line communication. This thesis investigates how to build various fundamental distributed computing abstractions (services) given the limitations, performance needs, and application requirements and constraints of real-world control, smart grid, and sensor systems. For completeness, we discuss four distributed abstractions, starting from the level of network links all the way up to the application level. At the link level, we show how to build an energy-efficient reliable communication service. This is especially important for devices with battery-powered wireless adapters, where recharging might be infeasible. We establish transmission policies that processes can use to decide when to transmit over the network in order to avoid losses and minimize retransmissions. These policies allow messages to be reliably transmitted with minimum transmission energy. One level above links is failure detection, a software abstraction that relies on communication to identify process crashes. We prove impossibility results concerning the implementation of classic eventual failure detectors in networks with probabilistic losses. We define a new, implementable type of failure detector that preserves modularity. This means that existing deterministic algorithms using eventual failure detectors can still be used to solve certain distributed problems in lossy networks: we simply replace the existing failure detector with the one we define. Using failure detectors, processes might learn about failures at different times. However, to ensure dependability, environments such as distributed control systems (DCSs) require a membership service in which processes agree on failures in real time. We prove that the necessary properties of such a membership service cannot be implemented deterministically given probabilistic losses. We propose an algorithm that satisfies these properties with high probability. We show analytically, as well as experimentally (within an industrial DCS), that our technique significantly enhances DCS dependability compared to classic membership services, at low additional cost. Finally, we investigate a real-time shared-memory abstraction, which vastly simplifies the programming of control applications. We study the feasibility of implementing such an abstraction within DCSs, showing the impossibility of this task using traditional algorithms built on top of existing software blocks like failure detectors. We propose an approach that circumvents this impossibility by attaching information to the failure-detection messages, analyze the performance of our technique, and showcase ways of adapting it to various application needs and workloads.
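
    As a textbook-style illustration of the failure-detection abstraction discussed above (a generic heartbeat-based, eventually accurate detector with an adaptive timeout; this is not the thesis's lossy-network variant, and all names are assumptions):

```python
import time

class EventualFailureDetector:
    """Heartbeat-based failure detector sketch: a process is suspected if no
    heartbeat arrives within its current timeout; a late heartbeat unsuspects
    it and enlarges the timeout, so false suspicions eventually stop."""

    def __init__(self, processes, initial_timeout=1.0):
        now = time.monotonic()
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heartbeat = {p: now for p in processes}
        self.suspected = set()

    def on_heartbeat(self, p):
        self.last_heartbeat[p] = time.monotonic()
        if p in self.suspected:          # premature suspicion: back off
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self):
        now = time.monotonic()
        for p, last in self.last_heartbeat.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)

# Usage sketch: call on_heartbeat() whenever a message from p arrives and
# check() periodically to obtain the current set of suspected processes.
fd = EventualFailureDetector({"p1", "p2", "p3"})
fd.on_heartbeat("p2")
print(fd.check())
```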

    Analyzing and Enhancing Routing Protocols for Friend-to-Friend Overlays

    Get PDF
    The threat of surveillance by governmental and industrial parties is more pressing than ever. As communication moves into the digital domain, advances in the automatic assessment and interpretation of enormous amounts of data enable the tracking of millions of people, recording and monitoring their private lives with unprecedented accuracy. The knowledge of such an all-encompassing loss of privacy affects the behavior of individuals, inducing various degrees of (self-)censorship and anxiety. Furthermore, the monopoly of a few large-scale organizations on digital communication enables global censorship and manipulation of public opinion. Thus, the current situation undermines freedom of speech to a detrimental degree and threatens the foundations of modern society. Anonymous and censorship-resistant communication systems are hence of utmost importance to circumvent constant surveillance. However, existing systems are highly vulnerable to infiltration and sabotage. In particular, Sybil attacks, i.e., powerful parties inserting a large number of fake identities into the system, enable malicious parties to observe and possibly manipulate a large fraction of the communication within the system. Friend-to-friend (F2F) overlays, which restrict direct communication to parties sharing a real-world trust relationship, are a promising countermeasure to Sybil attacks, since the requirement of establishing real-world trust increases the cost of infiltration drastically. Yet, existing F2F overlays suffer from low performance, are vulnerable to denial-of-service attacks, or fail to provide anonymity. Our first contribution in this thesis is an in-depth analysis of the concepts underlying the design of state-of-the-art F2F overlays. In the course of this analysis, we first extend the existing evaluation methods considerably, thereby providing tools for both our own and future research on F2F overlays and distributed systems in general. Based on this novel methodology, we prove that existing approaches are inherently unable to offer acceptable delays without either requiring excessive maintenance costs or enabling denial-of-service attacks and de-anonymization. Consequently, our second contribution lies in the design and evaluation of a novel concept for F2F overlays based on the insights of the preceding analysis. Our analysis revealed that greedy embeddings allow highly efficient communication in arbitrary connectivity-restricted overlays by addressing participants through coordinates and adapting these coordinates to the overlay structure. However, greedy embeddings in their original form reveal the identity of the communicating parties and fail to provide the necessary resilience in the presence of dynamic and possibly malicious users. Therefore, we present a privacy-preserving communication protocol for greedy embeddings based on anonymous return addresses rather than identifying node coordinates. Furthermore, we enhance the communication's robustness and attack resistance by using multiple parallel embeddings and alternative algorithms for message delivery. We show that our approach achieves low communication complexity. By replacing the coordinates with anonymous addresses, we furthermore provably achieve anonymity in the form of plausible deniability against an internal local adversary.
Complementing these results, our simulation study on real-world data indicates that our approach is highly efficient and effectively mitigates the impact of failures as well as powerful denial-of-service attacks. Our fundamental results open new possibilities for anonymous and censorship-resistant applications.
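
    The following is a toy sketch of greedy routing over a tree embedding, the mechanism the thesis builds on (the graph, coordinates, and distance function are illustrative assumptions; the actual protocol additionally replaces destination coordinates with anonymous return addresses and uses multiple parallel embeddings):

```python
# Greedy routing on a tree embedding: each node is addressed by its path from
# the root of a spanning tree, and messages are forwarded to the neighbor whose
# coordinate is closest (in tree distance) to the destination coordinate.

def tree_distance(a, b):
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)

def greedy_route(graph, coords, src, dst):
    route, current = [src], src
    while current != dst:
        nxt = min(graph[current],
                  key=lambda n: tree_distance(coords[n], coords[dst]))
        if tree_distance(coords[nxt], coords[dst]) >= tree_distance(coords[current], coords[dst]):
            return None  # defensive check; tree embeddings guarantee progress
        route.append(nxt)
        current = nxt
    return route

# Toy friend-to-friend topology with a spanning-tree embedding rooted at A;
# the edge D-C is a shortcut outside the spanning tree that greedy routing uses.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
coords = {"A": (), "B": (0,), "C": (1,), "D": (0, 0)}
print(greedy_route(graph, coords, "D", "C"))  # ['D', 'C']
```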

    Dynamic data placement and discovery in wide-area networks

    Get PDF
    The workloads of online services and applications such as social networks, sensor data platforms, and web search engines have become increasingly global and dynamic, posing new challenges for providing users with low-latency access to data. To achieve this, such services typically leverage a multi-site, wide-area networked infrastructure. Data access latency in such an infrastructure depends on the network paths between users and data, which are determined by the data placement and discovery strategies. Current strategies are static: they offer low latencies upon deployment but degrade under dynamic workloads. We propose dynamic data placement and discovery strategies for wide-area networked infrastructures, which adapt to the data access workload. We achieve this with data activity correlation (DAC), an application-agnostic approach for determining the correlations between data items based on access-pattern similarities. By dynamically clustering data according to DAC, network traffic is kept local within clusters. We utilise DAC as a key component in reducing access latencies for two application scenarios, each emphasising a different aspect of the problem. The first scenario assumes the fixed placement of data at sites and thus focusses on data discovery. This is the case for a global sensor discovery platform, which aims to provide low-latency discovery of sensor metadata. We present a self-organising hierarchical infrastructure consisting of multiple DAC clusters, maintained with an online, distributed split-and-merge algorithm. This reduces the number of sites visited, and thus latency, during discovery for a variety of workloads. The second scenario focusses on data placement. This is the case for global online services that leverage a multi-data-centre deployment to provide users with low-latency access to data. We present a geo-dynamic partitioning middleware, which maintains DAC clusters with an online elastic partition algorithm. It supports the geo-aware placement of partitions across data centres according to the workload. This provides globally distributed users with low-latency access to data under both static and dynamic workloads.
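
    A minimal sketch in the spirit of DAC (the cosine similarity over per-window access counts and the greedy grouping are illustrative assumptions, not the thesis's actual algorithm): items whose access patterns are highly correlated are grouped together so that their traffic stays within one cluster.

```python
import math
from collections import defaultdict

def access_vectors(access_log, num_windows):
    """access_log: iterable of (window_index, item_id) access events."""
    vectors = defaultdict(lambda: [0] * num_windows)
    for window, item in access_log:
        vectors[item][window] += 1
    return vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def correlate(vectors, threshold=0.8):
    """Greedily place each item into the first cluster it correlates with."""
    clusters = []  # list of lists of item ids
    for item, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters

# Items "a" and "b" are accessed together; "c" has an unrelated access pattern.
log = [(0, "a"), (0, "b"), (1, "a"), (1, "b"), (2, "c"), (3, "c")]
print(correlate(access_vectors(log, num_windows=4)))  # [['a', 'b'], ['c']]
```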

    Design of efficient and elastic storage in the cloud

    Get PDF
    Ph.D. thesis (Doctor of Philosophy).