67 research outputs found

    Blockchain based Access Control for Enterprise Blockchain Applications

    Access control is one of the fundamental security mechanisms of IT systems. Most existing access control schemes rely on a centralized party to manage and enforce access control policies. As blockchain technologies, especially permissioned networks, find more applicability beyond cryptocurrencies in enterprise solutions, it is expected that the security requirements will increase. Therefore, it is necessary to develop an access control system that works in a decentralized environment without compromising the unique features of a blockchain. A straightforward method to support access control is to deploy a firewall in front of the enterprise blockchain application. However, this approach does not take advantage of the desirable features of blockchain. In order to address these concerns, we propose a novel blockchain-based access control scheme, which keeps the decentralization feature for access control-related operations. The newly proposed system also provides the capability to protect users' privacy by leveraging ring signatures. We implement a prototype of the scheme using Hyperledger Fabric and assess its performance to show that it is practical for real-world applications.
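
    As a rough illustration of the idea above (not the paper's Hyperledger Fabric chaincode or its ring-signature construction), the Python sketch below models access-control policies as entries on an append-only, replicated ledger and answers access checks from the latest matching entry; the names PolicyEntry and PolicyLedger and the default-deny rule are assumptions made for the example.

```python
# Illustrative sketch only: policies live on an append-only, replicated ledger
# and every peer answers access checks from its local copy. All names and the
# default-deny rule are hypothetical, not the paper's actual scheme.
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyEntry:
    resource: str
    group: str     # a group (ring) of authorized signers; membership hides the individual caller
    action: str    # e.g. "read" or "write"
    allow: bool

class PolicyLedger:
    def __init__(self):
        self._log = []                     # append-only; replicated to every peer in a real system

    def append(self, entry: PolicyEntry):
        self._log.append(entry)            # in the paper's setting this would be an on-chain transaction

    def is_allowed(self, resource: str, group: str, action: str) -> bool:
        # The most recent matching entry wins, mirroring policy updates recorded on the ledger.
        for entry in reversed(self._log):
            if (entry.resource, entry.group, entry.action) == (resource, group, action):
                return entry.allow
        return False                       # default deny

ledger = PolicyLedger()
ledger.append(PolicyEntry("orders-db", "auditors", "read", True))
assert ledger.is_allowed("orders-db", "auditors", "read")
assert not ledger.is_allowed("orders-db", "auditors", "write")
```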

    A survey and classification of storage deduplication systems

    The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed. This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a CiĂȘncia e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and by the FCT through PhD scholarship SFRH-BD-71372-2010.
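
    The following Python sketch is a generic toy, not any of the surveyed systems; it illustrates three of the survey's classification axes for inline deduplication: granularity (fixed-size chunks), timing (inline, at write time) and indexing (a full in-memory fingerprint index).

```python
# Toy inline, chunk-level deduplication with a full in-memory fingerprint index.
# Real systems vary along all six criteria named in the survey.
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking; content-defined chunking is a common alternative

class DedupStore:
    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes, stored exactly once
        self.recipes = {}   # object name -> ordered list of fingerprints

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)   # duplicate chunks are detected here
            recipe.append(fp)
        self.recipes[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.recipes[name])

store = DedupStore()
store.put("backup-1", b"x" * 10000)
store.put("backup-2", b"x" * 10000)          # a fully duplicate object
assert store.get("backup-2") == b"x" * 10000
assert len(store.chunks) == 2                # one repeated 4 KiB chunk plus the 1808-byte tail
```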

    Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks

    Department of Computer Science and Engineering. Data analytics frameworks have evolved along with the growing amount of data. There have been numerous efforts to improve the performance of data analytics frameworks, including MapReduce frameworks and NoSQL and NewSQL databases. These frameworks have various target workloads and their own characteristics; however, there is common ground as data analytics frameworks. Emerging hardware such as graphics processing units (GPUs) and persistent memory is expected to open up new opportunities for such commonality. The goal of this dissertation is to leverage emerging hardware to improve the performance of data analytics frameworks. First, we design and implement EclipseMR, a novel MapReduce framework that efficiently leverages an extensive amount of memory space distributed among the machines in a cluster. EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer. The in-memory cache layer is designed to store both local and remote data while balancing the load between the servers with the proposed Locality-Aware Fair (LAF) job scheduler. The design of EclipseMR is easily extensible with emerging hardware: it can adopt persistent memory as a primary storage layer or cache layer, or it can adopt GPUs to improve the performance of map and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for various applications. Second, we propose B3-tree and Cache-Conscious Extendible Hashing (CCEH) for persistent memory. The fundamental challenge in designing a data structure for persistent memory is to guarantee consistent transitions with 8-byte fine-grained atomic writes at minimum cost. B3-tree is a fully persistent hybrid indexing structure of binary tree and B+-tree that benefits from the strengths of both in-memory and block-based indexes, and CCEH is a variant of extendible hashing that introduces an intermediate layer between the directory and buckets to fully benefit from cache-line-sized buckets while minimizing the size of the directory. Both data structures show better performance than the corresponding state-of-the-art techniques. Third, we develop a data-parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for the k-nearest neighbor search problem on the GPU. Several studies have proposed to improve query performance by leveraging the GPU as an accelerator; however, most of these works focus on brute-force algorithms. In this work, we overcome the challenges of traversing a multi-dimensional hierarchical indexing structure on the GPU, such as the tiny shared memory and runtime stack, irregular memory access patterns, and the warp divergence problem. Our evaluation shows that our data-parallel PSB algorithm outperforms both the brute-force algorithm and the traditional branch-and-bound algorithm.
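
    To make the idea of an intermediate layer between directory and buckets concrete, the sketch below implements a volatile, simplified segmented extendible hash table in Python: a small directory of low hash bits points to segments, each segment packs a few fixed-size buckets addressed by other hash bits, and only an overflowing segment splits, so the directory stays small and rarely doubles. The constants and structure are illustrative assumptions; CCEH's persistent-memory layout and failure-atomicity logic are omitted.

```python
# Simplified, volatile sketch of a segmented extendible hash table (CCEH-like
# structure only; no persistence or recovery).
BUCKET_SLOTS = 4         # stand-in for a cache-line-sized bucket
BUCKETS_PER_SEGMENT = 4  # the intermediate layer between directory and buckets

class Segment:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.buckets = [[] for _ in range(BUCKETS_PER_SEGMENT)]

    def bucket_for(self, h):
        return self.buckets[(h >> 16) % BUCKETS_PER_SEGMENT]  # bucket bits differ from directory bits

class SegmentedHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Segment(1), Segment(1)]

    def _hash(self, key):
        return hash(key) & 0xFFFFFFFF

    def insert(self, key, value):
        while True:
            h = self._hash(key)
            seg = self.directory[h & ((1 << self.global_depth) - 1)]
            bucket = seg.bucket_for(h)
            if len(bucket) < BUCKET_SLOTS:
                bucket.append((key, value))
                return
            self._split(seg)                                   # split the full segment, then retry

    def get(self, key):
        h = self._hash(key)
        seg = self.directory[h & ((1 << self.global_depth) - 1)]
        for k, v in seg.bucket_for(h):
            if k == key:
                return v
        raise KeyError(key)

    def _split(self, seg):
        if seg.local_depth == self.global_depth:               # directory doubles only when needed
            self.directory = self.directory * 2
            self.global_depth += 1
        bit = seg.local_depth                                  # keys with this hash bit set move out
        seg.local_depth += 1
        new_seg = Segment(seg.local_depth)
        for i, bucket in enumerate(seg.buckets):
            kept = []
            for key, value in bucket:
                h = self._hash(key)
                (new_seg.bucket_for(h) if (h >> bit) & 1 else kept).append((key, value))
            seg.buckets[i] = kept
        for i in range(len(self.directory)):                   # re-point half of seg's directory entries
            if self.directory[i] is seg and (i >> bit) & 1:
                self.directory[i] = new_seg

table = SegmentedHash()
for i in range(200):
    table.insert(f"key-{i}", i)
assert all(table.get(f"key-{i}") == i for i in range(200))
```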

    Preserving Individuals' Privacy with Personal Data Management Systems

    Riding the wave of smart disclosure initiatives and new privacy-protection regulations, the Personal Cloud paradigm is emerging through a myriad of solutions offered to users to let them gather and manage their whole digital life. On the bright side, this opens the way to novel value-added services when crossing multiple sources of data of a given person or crossing the data of multiple people. Yet this paradigm shift towards user empowerment raises fundamental questions with regard to the appropriateness of the functionalities and the data management and protection techniques offered by existing solutions to lay users. Our work addresses these questions on three levels. First, we review, compare and analyze personal cloud alternatives in terms of the functionalities they provide and the threat models they target. From this analysis, we derive a general set of functionality and security requirements that any Personal Data Management System (PDMS) should consider. We then identify the challenges of implementing such a PDMS and propose a preliminary design for an extensive and secure PDMS reference architecture satisfying the considered requirements. Second, we focus on personal computations for a specific hardware PDMS instance (i.e., a secure token with NAND Flash mass storage). In this context, we propose a scalable embedded full-text search engine to index large document collections and manage tag-based access control policies. Third, we address the problem of collective computations in a fully distributed architecture of PDMSs. We discuss the system and security requirements and propose protocols to enable distributed query processing with strong security guarantees against an attacker controlling many colluding corrupted nodes.
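
    As a hedged illustration of coupling full-text search with tag-based access control inside a PDMS (not the embedded engine proposed in the thesis), the Python sketch below filters inverted-index results by the caller's tags; the policy model, where a caller may read a document if it holds at least one of the document's tags, is an assumed simplification.

```python
# Toy coupling of full-text search with tag-based access control: query results
# are filtered by the caller's tags. All names and the policy model are assumed.
from collections import defaultdict

class TaggedIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of document ids
        self.doc_tags = {}                 # document id -> set of access tags

    def add(self, doc_id: str, text: str, tags) -> None:
        self.doc_tags[doc_id] = set(tags)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term: str, caller_tags) -> list:
        allowed = set(caller_tags)
        # Only documents sharing a tag with the caller are visible in the results.
        return sorted(d for d in self.postings.get(term.lower(), set())
                      if self.doc_tags[d] & allowed)

idx = TaggedIndex()
idx.add("doc1", "bank statement for March", tags={"finance"})
idx.add("doc2", "medical report from March", tags={"health"})
print(idx.search("march", caller_tags={"finance"}))   # ['doc1'] -- doc2 is filtered out
```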

    Storing and managing data in a distributed hash table

    Ph.D. thesis by Emil Sit, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (pp. 83-90). Distributed hash tables (DHTs) have been proposed as a generic, robust storage infrastructure for simplifying the construction of large-scale, wide-area applications. For example, UsenetDHT is a new design for Usenet News developed in this thesis that uses a DHT to cooperatively deliver Usenet articles: the DHT allows a set of N hosts to share storage of Usenet articles, reducing their combined storage requirements by a factor of O(N). Usenet generates a continuous stream of writes that exceeds 1 Tbyte/day in volume, comprising over ten million writes. Supporting this and the associated read workload requires a DHT engineered for durability and efficiency. Recovering from network and machine failures efficiently poses a challenge for DHT replication maintenance algorithms that provide durability. To avoid losing the last replica, replica maintenance must create additional replicas when failures are detected. However, creating replicas after every failure stresses network and storage resources unnecessarily. Tracking the location of every replica of every object would allow a replica maintenance algorithm to create replicas only when necessary, but when storing terabytes of data, such tracking is difficult to perform accurately and efficiently. This thesis describes a new algorithm, Passing Tone, that maintains durability efficiently, in a completely decentralized manner, despite transient and permanent failures. Passing Tone nodes make replication decisions with just basic DHT routing state, without maintaining state about the number or location of extant replicas and without responding to every transient failure with a new replica. Passing Tone is implemented in a revised version of DHash, optimized for both disk and network performance. A sample 12-node deployment of Passing Tone and UsenetDHT supports a partial Usenet feed of 2.5 Mbyte/s (processing over 80 Tbyte of data per year), while providing 30 Mbyte/s of read throughput, limited currently by disk seeks. This deployment is the first public DHT to store terabytes of data. These results indicate that DHT-based designs can successfully simplify the construction of large-scale, wide-area systems.
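
    The sketch below conveys, in simplified form, the local-decision flavor of replica maintenance described above: a node creates a replica only for objects that fall within the key range it is currently responsible for and that it does not already store, so no global tracking of replica locations is needed. It is an assumed toy model on a tiny hash ring, not the DHash/Passing Tone implementation.

```python
# Toy, local-decision replica maintenance on a small hash ring: an object's
# replicas belong on the r nodes that succeed its key; a node repairs only what
# it is responsible for and lacks locally. An assumed model, not Passing Tone.
import hashlib

RING = 2 ** 16  # tiny identifier space for readability

def key(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def successors(node_ids, k, r):
    """The r nodes clockwise from key k hold its replicas."""
    ordered = sorted(node_ids)
    start = next((i for i, n in enumerate(ordered) if n >= k), 0)  # wrap around the ring
    return [ordered[(start + j) % len(ordered)] for j in range(r)]

def maintenance_step(node_id, live_nodes, local_store, object_keys, r=3):
    """Keys this node should fetch now: objects it is responsible for but does not store."""
    return [k for k in object_keys
            if node_id in successors(live_nodes, k, r) and k not in local_store]

nodes = [key(f"node-{i}") for i in range(8)]
objects = [key(f"article-{i}") for i in range(20)]
# A freshly (re)joined node stores nothing and repairs only the objects mapped to it.
print(maintenance_step(nodes[0], nodes, local_store=set(), object_keys=objects))
```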

    Event-Based Software Architecture for Delay- and Disruption-Tolerant Networks

    Continuous end-to-end connectivity is not available all the time, not even in wired networks. Delay- and Disruption-Tolerant Networking (DTN) allows devices to communicate even if there is no continuous path to the destination by replacing the end-to-end semantics with a hop-by-hop store-carry-and-forward approach. Since existing implementations of DTN software suffer from various limitations, this work presents the event-driven software architecture of IBR-DTN, a lean, lightweight, and extensible implementation of a networking stack for Delay- and Disruption-Tolerant Networking. In a comprehensive description of the architecture and the underlying design decisions, this work focuses on eliminating weaknesses of the Bundle Protocol (RFC 5050). One of these is the dependency on synchronized clocks. Thus, this work takes a closer look at that requirement and presents approaches to bypass the dependency in some cases. For scenarios which require synchronized clocks, an approach is presented that distributes time information used to adjust the individual clocks of nodes. To compare the accuracy of the time information provided by each node, this approach introduces a clock rating. Additionally, a self-aligning algorithm is used to automatically adjust a node's clock rating parameters according to the estimated accuracy of that node's clock. In an evaluation, the general portability of the bundle node software is proven by porting it to various systems. Further, a performance analysis compares the new implementation with existing software. To evaluate the time-synchronization algorithm, the ONE simulator is modified to provide individual clocks with randomized clock errors for every node. Additionally, a specialized testbed, called Hydra, is developed to test the implementation of the time-synchronization approach in real software. Hydra instantiates virtualized nodes running a complete operating system and provides a way to test real software in large DTN scenarios. Both the simulation and the emulation in Hydra show that the algorithm for time synchronization can provide adequate accuracy depending on the inter-contact times.
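
    A minimal sketch of the clock-rating idea, under assumed rules that are not taken from the thesis: on contact, a node adopts time information only from peers that advertise a better clock rating than its own, and discounts the adopted rating to reflect the added uncertainty of second-hand time.

```python
# Assumed rules for illustration only (not the IBR-DTN algorithm).
from dataclasses import dataclass

@dataclass
class NodeClock:
    offset: float   # estimated offset (seconds) of the local clock from true time
    rating: float   # confidence in the local clock; 1.0 means a trusted reference

    def on_contact(self, peer: "NodeClock", discount: float = 0.9) -> None:
        if peer.rating > self.rating:            # only accept time from better-rated clocks
            self.offset = peer.offset            # adjust the local clock toward the peer's time
            self.rating = peer.rating * discount # second-hand time is slightly less trustworthy

reference = NodeClock(offset=0.0, rating=1.0)
wanderer = NodeClock(offset=4.2, rating=0.1)
wanderer.on_contact(reference)
print(wanderer.offset, wanderer.rating)          # 0.0 0.9
```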

    SoS: self-organizing substrates

    Large-scale networked systems often exhibit self-organizing properties, whether by design or by chance. Understanding self-organization using tools from cybernetics, particularly by modeling such systems as Markov processes, is a first step towards a formal framework which can be used in (decentralized) systems research and design. Interesting aspects to look for include the time evolution of a system, whether and when it converges to some absorbing states or stabilizes into a dynamic (and stable) equilibrium, and how it performs in such an equilibrium state. Such a formal framework brings objectivity to systems research, helping discern facts from artefacts as well as providing tools for quantitative evaluation of such systems. This thesis introduces such formalism in analyzing and evaluating peer-to-peer (P2P) systems in order to better understand the dynamics of such systems, which in turn helps in producing better designs. In particular, this thesis develops and studies the fundamental building blocks for a P2P storage system. In the process, the design and evaluation methodology we pursue illustrates the typical methodological approaches in studying and designing self-organizing systems, and how the analysis methodology influences the design of the algorithms themselves to meet system design goals (preferably with quantifiable guarantees). These goals include efficiency, availability and durability, load balance, high fault tolerance, and self-maintenance even in adversarial conditions like arbitrarily skewed and dynamic load and high membership dynamics (churn), apart, of course, from the specific functionalities that the system is supposed to provide. The functionalities we study here are some of the fundamental building blocks for various P2P applications and systems, including P2P storage systems, and hence we call them substrates or base infrastructure. These elemental functionalities include: (i) reliable and efficient discovery of resources distributed over the network in a decentralized manner; (ii) communication among participants in an address-independent manner, i.e., even when peers change their physical addresses; (iii) availability and persistence of stored objects in the network, irrespective of the availability or departure of individual participants from the system at any time; and (iv) freshness of the objects/resources (up-to-date replicas). Internet-scale distributed index structures (often termed structured overlays) are used for discovery and access of resources in a decentralized setting. We propose rapid construction from scratch and maintenance of the P-Grid overlay network in a self-organized manner so as to provide efficient search of both individual keys and whole ranges of keys, while providing good load-balancing characteristics for diverse kinds of arbitrarily skewed loads: storage and replication, query forwarding, and query answering. For fast overlay construction we employ recursive partitioning of the key space so that the resulting partitions are balanced with respect to storage load and replication. The proper algorithmic parameters for such partitioning are derived from a transient analysis of the partitioning process, which has the Markov property. Preserving ordering information in P-Grid, so that queries other than exact-match queries, such as range queries, can be handled efficiently and rather trivially, makes P-Grid suitable for data-oriented applications.
Fast overlay construction is analogous to building an index on a new set of keys, making P-Grid suitable as the underlying indexing mechanism for peer-to-peer information retrieval applications, among other potential applications that may require frequent indexing of new attributes in addition to regular updates to an existing index. In order to deal with membership dynamics, in particular changing physical addresses of peers across sessions, the overlay itself is used as a (self-referential) directory service for maintaining the participating peers' physical addresses across sessions. Exploiting this self-referential directory, a family of overlay maintenance schemes has been designed with lower communication overhead than other overlay maintenance strategies. The notion of a dynamic equilibrium study for overlays under continuous churn and repair, modeled as a Markov process, was introduced in order to evaluate and compare these overlay maintenance schemes. While the self-referential directory was originally invented to realize overlay maintenance schemes with lower overheads than existing ones, it is generic in nature and can be used for various other purposes, e.g., as a decentralized public key infrastructure. Persistence of peer identity across sessions, in spite of changes in physical address, provides a logical independence of the overlay network from the underlying physical network. This has many other potential uses, for example, efficient maintenance mechanisms for P2P storage systems and P2P trust and reputation management. We specifically look into the dynamics of maintaining redundancy for storage systems and design a novel lazy maintenance strategy. This strategy is algorithmically a simple variant of existing maintenance strategies which adapts to the system dynamics. This randomized lazy maintenance strategy thus explores the cost-performance trade-offs of the storage maintenance operations in a self-organizing manner. We model the storage system (redundancy), under churn and maintenance, as a Markov process. We perform an equilibrium study to show that the system operates in a more stable dynamic equilibrium with our strategy than with the existing maintenance scheme for comparable overheads. In particular, we show that our maintenance scheme provides substantial performance gains in terms of maintenance overhead and the system's resilience in the presence of churn and correlated failures. Finally, we propose a gossip mechanism which incurs lower communication overhead than existing approaches for communication among a relatively large set of unreliable peers, without assuming any specific structure for their mutual connectivity. We use such a communication primitive for propagating replica updates in P2P systems, facilitating the management of mutable content in P2P systems. The peer population affected by a gossip can be modeled as a Markov process. Studying the transient spread of gossips helps in choosing proper algorithm parameters to reduce communication overhead while guaranteeing coverage of online peers. Each of these substrates was developed to find practical solutions to real problems. Put together, they can be used in other applications, including a P2P storage system with support for efficient lookups and inserts, membership dynamics, content mutation and updates, persistence, and availability.
Many of the ideas have already been implemented in real systems, and several others are on their way to being integrated into the implementations. There are two principal contributions of this dissertation. First, it provides designs of P2P systems which are useful for end users as well as for other application developers who can build upon these existing systems. Second, it adapts and introduces the methodology of analyzing a system's time evolution (tools typically used in diverse domains including physics and cybernetics) to study the long-run behavior of P2P systems, and uses this methodology to (re)design appropriate algorithms and evaluate them. We observed that studying P2P systems from the perspective of complex systems reveals their inner dynamics and hence ways to exploit such dynamics for suitable or better algorithms. In other words, the analysis methodology in itself strongly influences and inspires the way we design such systems. We believe that such an approach of orchestrating self-organization in internet-scale systems, where the algorithms and the analysis methodology have strong mutual influence, will significantly change the way such future systems are developed and evaluated. We envision that such an approach will particularly serve as an important tool for the nascent but fast-moving P2P systems research and development community.
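
    As a worked toy example of the equilibrium methodology emphasized above (an assumed model, not the thesis's actual Markov chains), the Python sketch below treats the number of reachable replicas of one object as a discrete-time Markov chain under churn and maintenance, computes its stationary distribution by power iteration, and compares an eager repair threshold against a lazier one.

```python
# Toy equilibrium study: each step every reachable replica churns offline with
# probability p_off, and maintenance re-creates one replica whenever fewer than
# `threshold` are reachable (optimistically even from zero, e.g. when an offline
# copy returns). All parameters are assumed for illustration.
from math import comb

def transition_matrix(R, p_off, threshold):
    P = [[0.0] * (R + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        for k in range(i + 1):                       # k of the i reachable replicas go offline
            p = comb(i, k) * p_off ** k * (1 - p_off) ** (i - k)
            j = i - k
            if j < threshold and j < R:              # maintenance creates one new replica
                j += 1
            P[i][j] += p
    return P

def equilibrium(P, steps=20_000):
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(steps):                           # power iteration on the row-stochastic matrix
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

R, p_off = 4, 0.05
for threshold in (R, 2):                             # eager repair vs. a lazier threshold
    pi = equilibrium(transition_matrix(R, p_off, threshold))
    print(f"threshold={threshold}: P[no replica reachable] ~= {pi[0]:.2e}")
```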