301 research outputs found

    Structured Overlay For Heterogeneous Environments: Design and Evaluation of Oscar

    Recent years have seen advances in building large internet-scale index structures, generally known as structured overlays. Early structured overlays realized distributed hash tables (DHTs), which are ill-suited for anything but exact queries. The need to support range queries necessitates systems that can handle uneven load distributions. However, such systems suffer from practical problems, including poor latency, disproportionate bandwidth usage at participating peers, and unrealistic assumptions about peers' homogeneity in terms of available storage or bandwidth resources. In this paper we consider a system which is capable not only of supporting uneven load distributions but also of operating in heterogeneous environments, where each peer can autonomously decide how much of its resources to contribute to the system. We provide the theoretical foundations for realizing such a network and present Oscar, a newly proposed system based on these principles. Oscar can construct efficient overlays for arbitrary load distributions by employing a novel scalable network sampling technique. Simulations of our system validate the theory and evaluate Oscar's performance under typical challenges encountered in real-life large-scale networked systems, including participant heterogeneity, faults, and skewed and dynamic load-distributions. Thus the Oscar distributed index fills an important gap in the family of structured overlays, bringing to life a practical internet-scale index which can play a crucial role in enabling data-oriented applications distributed over wide-area networks.
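    To make the limitation concrete: a plain DHT places keys by hash, which destroys key order, whereas a range-queriable overlay such as Oscar must keep keys ordered. The minimal Python sketch below (peer names and range boundaries are invented for illustration, not taken from the paper) contrasts the two placement strategies.

        import hashlib

        PEERS = ["p0", "p1", "p2", "p3"]

        def dht_place(key):
            # A classic DHT hashes each key, scattering adjacent keys
            # across unrelated peers: good for exact-match lookups,
            # useless for range queries.
            h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
            return PEERS[h % len(PEERS)]

        def order_preserving_place(key, boundaries):
            # An order-preserving overlay (the setting Oscar targets)
            # assigns each peer a contiguous key range, so a range query
            # touches only the peers whose ranges overlap it, at the
            # price of uneven load when the key distribution is skewed.
            for peer, upper in boundaries:
                if key <= upper:
                    return peer
            return boundaries[-1][0]

        # hypothetical boundaries: p0 serves keys up to "f", p1 up to "m", ...
        boundaries = [("p0", "f"), ("p1", "m"), ("p2", "s"), ("p3", "~")]
        for k in ["apple", "apricot", "banana"]:
            print(k, dht_place(k), order_preserving_place(k, boundaries))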

    Designing peer-to-peer overlays: a small-world perspective

    The Small-World phenomenon, well known under the phrase "six degrees of separation", has long been under investigation. The fact that our social network is closely knit and that any two people are linked by a short chain of acquaintances was confirmed experimentally by the psychologist Stanley Milgram in the sixties. However, it was only after the seminal work of Jon Kleinberg in 2000 that it was understood not only why such networks exist, but also why it is possible to navigate them efficiently. This proved to be a highly relevant discovery for peer-to-peer systems, since they share many fundamental similarities with social networks; in particular, peer-to-peer routing relies solely on local decisions, without the possibility of invoking global knowledge. In this thesis we show how peer-to-peer system designs inspired by Small-World principles can address and solve many important problems, such as balancing peer load, reducing high maintenance costs, and efficiently disseminating data in large-scale systems. We present three peer-to-peer approaches, namely Oscar, Gravity, and Fuzzynet, whose concepts stem from the design of navigable Small-World networks. Firstly, we introduce a novel theoretical model for building peer-to-peer systems which supports skewed node distributions while preserving all desired properties of Kleinberg's Small-World networks. With this model we set a reference base for the design of data-oriented peer-to-peer systems characterized by non-uniform distributions of keys as well as skewed query or access patterns. Based on this theoretical model we introduce Oscar, an overlay which uses a novel scalable network sampling technique for network construction, for which we provide a rigorous theoretical analysis. The simulations of our system validate the developed theory and evaluate Oscar's performance under typical conditions encountered in real-life large-scale networked systems, including participant heterogeneity, faults, and skewed and dynamic load-distributions. Furthermore, we show how, by utilizing Small-World properties, it is possible to reduce the maintenance cost of most structured overlays by discarding a core network connectivity element: the ring invariant. We argue that reliance on the ring structure is a serious impediment to the real-life deployment and scalability of structured overlays. We propose an overlay called Fuzzynet which does not rely on the ring invariant, yet has all the functionalities of structured overlays. Fuzzynet takes the idea of lazy overlay maintenance further by eliminating the need for any explicit connectivity and data maintenance operations, relying merely on the actions performed when new Fuzzynet peers join the network. We show that with a sufficient number of neighbors, data can be retrieved in Fuzzynet with high probability, even under high churn. Finally, we show how peer-to-peer systems based on the Small-World design, capable of supporting non-uniform key distributions, can be successfully employed for large-scale data dissemination tasks. We introduce Gravity, a publish/subscribe system capable of building efficient dissemination structures that induce only minimal dissemination relay overhead. This is achieved through Gravity's ability to permit non-uniform peer key distributions, which allows subscribers to be clustered close to each other in the key space, where data dissemination is cheap. An extensive experimental study confirms the effectiveness of our system under realistic subscription patterns and shows that Gravity surpasses existing approaches in efficiency by a large margin. With the peer-to-peer systems presented in this thesis we fill an important gap in the family of structured overlays, bringing to life practical systems which can play a crucial role in enabling data-oriented applications distributed over wide-area networks.
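    Since the thesis builds on Kleinberg's navigable Small-World model, a minimal sketch may help: each node on a ring keeps its two ring neighbors plus one long-range link drawn from the harmonic distribution (probability proportional to 1/d for ring distance d), which is exactly what makes purely greedy, local routing efficient. The Python below is a generic illustration of Kleinberg's construction, not code from the thesis.

        import random

        N = 1024  # number of peers on the ring (arbitrary choice)

        def long_link(u):
            # One long-range link per node, chosen with probability
            # proportional to 1/d: Kleinberg's harmonic distribution,
            # under which greedy routing needs O(log^2 N) expected hops.
            dists = list(range(1, N // 2 + 1))
            d = random.choices(dists, weights=[1.0 / x for x in dists])[0]
            return (u + d) % N

        links = {u: [(u - 1) % N, (u + 1) % N, long_link(u)] for u in range(N)}

        def ring_dist(a, b):
            d = abs(a - b)
            return min(d, N - d)

        def greedy_route(src, dst):
            # Purely local decisions: always forward to the neighbor
            # closest to the destination; the ring links guarantee progress.
            hops, cur = 0, src
            while cur != dst:
                cur = min(links[cur], key=lambda v: ring_dist(v, dst))
                hops += 1
            return hops

        print(greedy_route(0, N // 2), "hops for the worst-case distance")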

    Optimising Structured P2P Networks for Complex Queries

    With network-enabled consumer devices becoming increasingly popular, the number of connected devices and available services is growing considerably, with the number of connected devices estimated to surpass 15 billion by 2015. In this increasingly large and dynamic environment it is important that users have a comprehensive, yet efficient, mechanism to discover services. Many existing wide-area service discovery mechanisms are centralised and do not scale to large numbers of users. Additionally, centralised services suffer from issues such as a single point of failure, high maintenance costs, and difficulty of management. As such, this Thesis seeks a Peer-to-Peer (P2P) approach. Distributed Hash Tables (DHTs) are well known for their high scalability, low financial barrier to entry, and ability to self-manage. They can be used to provide not just a platform on which peers can offer and consume services, but also a means for users to discover such services. Traditionally DHTs provide a distributed key-value store with no search functionality. In recent years many P2P systems have been proposed providing support for a subset of complex query types, such as keyword search, range queries, and semantic search. This Thesis presents a novel algorithm for performing any type of complex query, from keyword search, to complex regular expressions, to full-text search, over any structured P2P overlay. This is achieved by efficiently broadcasting the search query, allowing each peer to process the query locally, and then efficiently routing responses back to the originating peer. Through experimentation, this technique is shown to be successful when the network is stable; however, performance degrades under high levels of network churn. To address the issue of network churn, this Thesis proposes a number of enhancements which can be made to existing P2P overlays in order to improve the performance of both the existing DHT and the proposed algorithm. Through two case studies, these enhancements are shown to improve not only the performance of the proposed algorithm under churn, but also the performance of traditional lookup operations in these networks.
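    The broadcast step the abstract relies on can be done without flooding. A standard technique (a generic Chord-style sketch, not necessarily the exact algorithm of this Thesis) is to have each peer forward the query to its fingers, delegating to each finger the slice of the identifier space up to the next finger, so that every peer receives the query exactly once.

        # Chord-style broadcast sketch over a full 16-node identifier ring.
        N_BITS = 4
        NODES = list(range(2 ** N_BITS))  # assume every ID is a live peer

        def fingers(u):
            return [(u + 2 ** k) % 2 ** N_BITS for k in range(N_BITS)]

        def in_interval(x, lo, hi):
            # open interval (lo, hi) on the ring
            if lo < hi:
                return lo < x < hi
            return x > lo or x < hi

        def broadcast(u, limit, query, visited):
            visited.append(u)  # u evaluates the query locally here
            fs = fingers(u)
            for i, f in enumerate(fs):
                # the next finger (or our own limit) bounds f's zone
                nxt = fs[i + 1] if i + 1 < len(fs) else limit
                if in_interval(f, u, limit):
                    broadcast(f, nxt, query, visited)

        visited = []
        broadcast(0, 0, "regex:foo.*", visited)
        print(sorted(visited) == NODES)  # True: each peer visited exactly once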

    Efficient algorithms for new computational models

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 155-163). This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections.
    Advances in hardware design and manufacturing often lead to new ways in which problems can be solved computationally. In this thesis we explore fundamental problems in three computational models that are based on such recent advances. The first model is based on new chip architectures, where multiple independent processing units are placed on one chip, allowing for an unprecedented parallelism in hardware. We provide new scheduling algorithms for this computational model. The second model is motivated by peer-to-peer networks, where countless (often inexpensive) computing devices cooperate in distributed applications without any central control. We state and analyze new algorithms for load balancing and for locality-aware distributed data storage in peer-to-peer networks. The last model is based on extensions of the streaming model. It is an attempt to capture the class of problems that can be efficiently solved on massive data sets. We give a number of algorithms for this model, and compare it to other models that have been proposed for massive data set computations. Our algorithms and complexity results for these computational models follow the central thesis that it is an important part of theoretical computer science to model real-world computational structures, and that such effort is richly rewarded by a plethora of interesting and challenging problems.
    By Jan Matthias Ruhl, Ph.D.
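    As a generic illustration of the streaming model mentioned above (one pass over the input, memory far smaller than the data; this example is not from the thesis itself), the classic Misra-Gries sketch finds heavy hitters in a single pass using only O(k) counters.

        def misra_gries(stream, k):
            # One pass, at most k-1 counters: every element occurring
            # more than len(stream)/k times survives as a candidate
            # (candidates may include false positives).
            counters = {}
            for x in stream:
                if x in counters:
                    counters[x] += 1
                elif len(counters) < k - 1:
                    counters[x] = 1
                else:
                    for y in list(counters):
                        counters[y] -= 1
                        if counters[y] == 0:
                            del counters[y]
            return counters

        # 'a' is a strict majority of this stream, so k=2 must retain it
        print(misra_gries("abaacaada", 2))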

    Distributed complex event recognition

    Complex Event Recognition (CER) has emerged as a prominent technology for detecting situations of interest, in the form of query patterns, over large streams of data in real time. Thus, having query evaluation mechanisms that minimize latency is a shared desideratum. Nonetheless, the evaluation of CER queries is well known to be computationally expensive. Indeed, such evaluation requires maintaining a set of partial matches, which grows super-linearly in the number of processed events. While most prominent solutions for CER run in a centralized setting, this has proved inefficient for Big Data requirements, where it is necessary to scale the system to cope with an increasing arrival rate of events while maintaining a stable throughput. To overcome these issues, we propose a novel distributed CER system that focuses on the efficient evaluation of a large class of complex event queries, including n-ary predicates, time windows, and the partition-by event correlation operator. This system uses a state-of-the-art automaton-based distributed algorithm that circumvents the super-linear partial match problem. Moreover, in the presence of heavy workloads, the system can scale out by increasing the number of processing units with little overhead. We additionally provide a proof of correctness of the algorithm. We experimentally compare our system against the state-of-the-art sequential CER engine that inspired our work and show that our system outperforms its predecessor in the presence of queries with complex predicates. Furthermore, we show that, under Big Data requirements, our system's overall performance is better.
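    To see where the super-linear partial-match problem comes from, consider the naive evaluation of a sequence pattern "A followed by B within W time units": every A event opens a partial match that must be retained until it expires or completes. The Python sketch below illustrates this baseline (event types, timestamps, and the window are hypothetical; this is not the paper's automaton-based algorithm, which is designed to avoid exactly this blow-up).

        W = 10  # time window, in hypothetical time units

        def evaluate(stream):
            partial, complete = [], []
            for etype, ts in stream:
                # drop partial matches that fell out of the window
                partial = [p for p in partial if ts - p[1] <= W]
                if etype == "A":
                    partial.append(("A", ts))  # a new partial match per A
                elif etype == "B":
                    complete.extend([(p, ("B", ts)) for p in partial])
            return complete

        stream = [("A", 1), ("A", 3), ("C", 5), ("B", 8), ("A", 20), ("B", 35)]
        # both early A's pair with the B at t=8; the A at t=20 expires by t=35
        print(evaluate(stream))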

    EFFICIENT AND SCALABLE NETWORK SECURITY PROTOCOLS BASED ON LFSR SEQUENCES

    The gap between abstract, mathematics-oriented research in cryptography and the engineering approach of designing practical network security protocols is widening. Network researchers experiment with well-known cryptographic protocols suitable for different network models. On the other hand, researchers inclined toward theory often design cryptographic schemes without considering the practical network constraints. The goal of this dissertation is to address problems in these two challenging areas, building bridges between practical network security protocols and theoretical cryptography. This dissertation presents techniques for building performance-sensitive security protocols, using primitives from linear feedback shift register (LFSR) sequences, for a variety of challenging networking applications. The significant contributions of this thesis are:
    1. A common problem faced by large-scale multicast applications, like real-time news feeds, is collecting authenticated feedback from the intended recipients. We design an efficient, scalable, and fault-tolerant technique for combining multiple signed acknowledgments into a single compact one, and observe that most signatures (based on the discrete logarithm problem) used in previous protocols do not yield a scalable solution to the problem.
    2. We propose a technique to authenticate on-demand source routing protocols in resource-constrained wireless mobile ad-hoc networks. We develop a single-round multisignature that requires no prior cooperation among nodes to construct the multisignature and supports authentication of cached routes.
    3. We propose an efficient and scalable aggregate signature, tailored for applications like building efficient certificate chains, authenticating distributed and adaptive content management systems, and securing path-vector routing protocols.
    4. We observe that blind signatures could form critical building blocks of privacy-preserving accountability systems, where an authority needs to vouch for the legitimacy of a message but the ownership of the message should be kept secret from the authority. We propose an efficient blind signature that can serve as a protocol building block for performance-sensitive accountability systems.
    All special forms of digital signatures proposed in this dissertation (aggregate, multi-, and blind signatures) are the first to be constructed using LFSR sequences. Our detailed cost analysis shows that, for a desired level of security, the proposed signatures outperform existing protocols in computation cost, number of communication rounds, and storage overhead.
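    For readers unfamiliar with the underlying primitive, an LFSR produces a pseudorandom bit sequence from a short register by shifting and feeding back the XOR of tapped bits; with a primitive feedback polynomial the register cycles through all nonzero states. The sketch below is a textbook Fibonacci LFSR for illustration only, far removed from the cryptographic constructions of the dissertation.

        def lfsr(seed, taps, nbits, steps):
            # Fibonacci LFSR: output the low bit, shift right, and feed
            # the XOR of the tapped bits back into the high bit.
            state, out = seed, []
            for _ in range(steps):
                out.append(state & 1)
                fb = 0
                for t in taps:
                    fb ^= (state >> t) & 1
                state = (state >> 1) | (fb << (nbits - 1))
            return out

        # x^4 + x + 1 is primitive over GF(2), so a 4-bit register
        # cycles through all 15 nonzero states before repeating.
        print(lfsr(seed=0b1001, taps=(0, 3), nbits=4, steps=15))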

    Towards Practical Privacy-Preserving Protocols

    Protecting users' privacy in digital systems becomes more complex and challenging over time, as the amount of stored and exchanged data grows steadily and systems become increasingly involved and connected. Two techniques that approach this issue are Secure Multi-Party Computation (MPC) and Private Information Retrieval (PIR), which aim to enable practical computation while simultaneously keeping sensitive data private. In this thesis we present results showing how real-world applications can be executed in a privacy-preserving way. This is not only desired by users of such applications, but since 2018 also rests on a strong legal foundation with the General Data Protection Regulation (GDPR) in the European Union, which forces companies to protect the privacy of user data by design. This thesis's contributions are split into three parts and can be summarized as follows:
    MPC Tools: Generic MPC requires in-depth background knowledge about a complex research field. To approach this, we provide tools that are efficient and usable at the same time, and serve as a foundation for follow-up work, as they allow cryptographers, researchers and developers to implement, test and deploy MPC applications. We provide an implementation framework that abstracts from the underlying protocols, optimized building blocks generated from hardware synthesis tools, and direct processing of Hardware Definition Languages (HDLs). Finally, we present an automated compiler for efficient hybrid protocols from ANSI C.
    MPC Applications: MPC was for a long time deemed too expensive to be used in practice. We show several use cases of real-world applications that can operate in a privacy-preserving, yet practical way when engineered properly and built on top of suitable MPC protocols. Use cases presented in this thesis are from the domain of route computation using BGP on the Internet or at Internet Exchange Points (IXPs). In both cases our protocols protect sensitive business information that is used to determine routing decisions. Another use case focuses on genomics, which is particularly critical as the human genome is connected to everyone during their entire lifespan and cannot be altered. Our system enables federated genomic databases, where several institutions can privately outsource their genome data and where research institutes can query this data in a privacy-preserving manner.
    PIR and Applications: Privately retrieving data from a database is a crucial requirement for user privacy and metadata protection, and is enabled among others by a technique called Private Information Retrieval (PIR). We present improvements and a generalization of a well-known multi-server PIR scheme of Chor et al., along with an implementation and evaluation thereof. We also design and implement an efficient anonymous messaging system built on top of PIR. Furthermore, we provide a scalable solution for private contact discovery that utilizes ideas from efficient two-server PIR built from Distributed Point Functions (DPFs) in combination with Private Set Intersection (PSI).
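    The multi-server scheme of Chor et al. that this part builds on admits a very compact two-server illustration: the client sends a uniformly random subset of indices to one server and the same subset with the desired index flipped to the other; each server XORs the selected database bits, and the client XORs the two answers. Each server alone sees only a uniformly random subset. The sketch below is the textbook scheme, not the thesis's improved variant.

        import secrets

        db = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical public database of n bits
        n = len(db)

        def server_answer(query):
            # each (non-colluding) server XORs the bits the client selected
            bit = 0
            for j in query:
                bit ^= db[j]
            return bit

        def pir_read(i):
            s1 = {j for j in range(n) if secrets.randbits(1)}  # random subset
            s2 = s1 ^ {i}            # same subset with index i flipped
            return server_answer(s1) ^ server_answer(s2)

        assert all(pir_read(i) == db[i] for i in range(n))
        print("all private reads correct")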

    Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks

    Department of Computer Science and Engineering.
    Data analytics frameworks have evolved along with the growing amount of data. There have been numerous efforts to improve the performance of data analytics frameworks, including MapReduce frameworks and NoSQL and NewSQL databases. These frameworks have various target workloads and their own characteristics; however, there is common ground as data analytics frameworks. Emerging hardware such as graphics processing units and persistent memory is expected to open up new opportunities for such commonality. The goal of this dissertation is to leverage emerging hardware to improve the performance of data analytics frameworks. First, we design and implement EclipseMR, a novel MapReduce framework that efficiently leverages an extensive amount of memory space distributed among the machines in a cluster. EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer. The in-memory cache layer is designed to store both local and remote data while balancing the load between the servers with the proposed Locality-Aware Fair (LAF) job scheduler. The design of EclipseMR is easily extensible with emerging hardware: it can adopt persistent memory as a primary storage layer or cache layer, or it can adopt GPUs to improve the performance of map and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for various applications. Second, we propose the B3-tree and Cache-Conscious Extendible Hashing (CCEH) for persistent memory. The fundamental challenge in designing a data structure for persistent memory is to guarantee consistent transitions with 8-byte fine-grained atomic writes at minimum cost. The B3-tree is a fully persistent hybrid indexing structure of binary tree and B+-tree that benefits from the strengths of both in-memory and block-based indexes, and CCEH is a variant of extendible hashing that introduces an intermediate layer between the directory and the buckets to fully benefit from cache-sized buckets while minimizing the size of the directory. Both data structures show better performance than the corresponding state-of-the-art techniques. Third, we develop a data-parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for the k-nearest neighbor search problem on the GPU. Several studies have proposed to improve query performance by leveraging the GPU as an accelerator; however, most of these works focus on brute-force algorithms. In this work, we overcome the challenges of traversing a multi-dimensional hierarchical indexing structure on the GPU, such as the tiny shared memory and runtime stack, irregular memory access patterns, and the warp divergence problem. Our evaluation shows that our data-parallel PSB algorithm outperforms both the brute-force algorithm and the traditional branch-and-bound algorithm.
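    As background for CCEH, classic extendible hashing keeps a directory of 2^G pointers into shared buckets; an overflowing bucket splits on one more hash bit, and the directory doubles only when the splitting bucket is already at global depth. The in-memory Python sketch below shows that base scheme only; CCEH's persistent-memory contributions (the intermediate segment layer and 8-byte atomic updates) are not modeled here.

        class ExtendibleHash:
            BUCKET_SIZE = 4  # small bucket so splits happen quickly

            def __init__(self):
                self.global_depth = 1
                self.dir = [{"depth": 1, "items": {}}, {"depth": 1, "items": {}}]

            def _bucket(self, key):
                # the low global_depth bits of the hash index the directory
                return self.dir[hash(key) & ((1 << self.global_depth) - 1)]

            def put(self, key, val):
                b = self._bucket(key)
                if key in b["items"] or len(b["items"]) < self.BUCKET_SIZE:
                    b["items"][key] = val
                    return
                if b["depth"] == self.global_depth:
                    self.dir = self.dir + self.dir  # double the directory
                    self.global_depth += 1
                b["depth"] += 1
                new = {"depth": b["depth"], "items": {}}
                mask = 1 << (b["depth"] - 1)  # the newly significant hash bit
                for i, slot in enumerate(self.dir):
                    if slot is b and i & mask:
                        self.dir[i] = new
                for k in list(b["items"]):
                    if hash(k) & mask:
                        new["items"][k] = b["items"].pop(k)
                self.put(key, val)  # retry; may trigger another split

            def get(self, key):
                return self._bucket(key)["items"].get(key)

        h = ExtendibleHash()
        for i in range(100):
            h.put("k%d" % i, i)
        assert all(h.get("k%d" % i) == i for i in range(100))
        print("directory entries:", len(h.dir))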