18 research outputs found

    Content Distribution in P2P Systems

    Get PDF
    The report provides a literature review of the state-of-the-art for content distribution. The report's contributions are of threefold. First, it gives more insight into traditional Content Distribution Networks (CDN), their requirements and open issues. Second, it discusses Peer-to-Peer (P2P) systems as a cheap and scalable alternative for CDN and extracts their design challenges. Finally, it evaluates the existing P2P systems dedicated for content distribution according to the identied requirements and challenges

    Content Distribution in P2P Systems

    Get PDF
    The report provides a literature review of the state-of-the-art for content distribution. The report's contributions are of threefold. First, it gives more insight into traditional Content Distribution Networks (CDN), their requirements and open issues. Second, it discusses Peer-to-Peer (P2P) systems as a cheap and scalable alternative for CDN and extracts their design challenges. Finally, it evaluates the existing P2P systems dedicated for content distribution according to the identied requirements and challenges

    Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters

    Get PDF
    Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growth of popularity in cloud services causes the appearance of a new spectrum of services with sophisticated workload and resource management requirements. Also, data centers are growing by addition of various type of hardware to accommodate the ever-increasing requests of users. Nowadays a large percentage of cloud resources are executing data-intensive applications which need continuously changing workload fluctuations and specific resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making. Such systems benefit from the parallelization of control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques to address these challenges in large-scale data centers. We introduce a distributed resource management framework which consolidates virtual machine to as few servers as possible to reduce the energy consumption of data center and hence decrease the cost of cloud providers. This framework can characterize the workload of virtual machines and hence handle trade-off energy consumption and Service Level Agreement (SLA) of customers efficiently. The algorithm is highly scalable and requires low maintenance cost with dynamic workloads and it tries to minimize virtual machines migration costs. We also introduce a scalable and distributed probe-based scheduling algorithm for Big data analytics frameworks. This algorithm can efficiently address the problem job heterogeneity in workloads that has appeared after increasing the level of parallelism in jobs. The algorithm is massively scalable and can reduce significantly average job completion times in comparison with the-state of-the-art. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm

    SoS: self-organizing substrates

    Get PDF
    Large-scale networked systems often, both by design or chance exhibit self-organizing properties. Understanding self-organization using tools from cybernetics, particularly modeling them as Markov processes is a first step towards a formal framework which can be used in (decentralized) systems research and design.Interesting aspects to look for include the time evolution of a system and to investigate if and when a system converges to some absorbing states or stabilizes into a dynamic (and stable) equilibrium and how it performs under such an equilibrium state. Such a formal framework brings in objectivity in systems research, helping discern facts from artefacts as well as providing tools for quantitative evaluation of such systems. This thesis introduces such formalism in analyzing and evaluating peer-to-peer (P2P) systems in order to better understand the dynamics of such systems which in turn helps in better designs. In particular this thesis develops and studies the fundamental building blocks for a P2P storage system. In the process the design and evaluation methodology we pursue illustrate the typical methodological approaches in studying and designing self-organizing systems, and how the analysis methodology influences the design of the algorithms themselves to meet system design goals (preferably with quantifiable guarantees). These goals include efficiency, availability and durability, load-balance, high fault-tolerance and self-maintenance even in adversarial conditions like arbitrarily skewed and dynamic load and high membership dynamics (churn), apart of-course the specific functionalities that the system is supposed to provide. The functionalities we study here are some of the fundamental building blocks for various P2P applications and systems including P2P storage systems, and hence we call them substrates or base infrastructure. These elemental functionalities include: (i) Reliable and efficient discovery of resources distributed over the network in a decentralized manner; (ii) Communication among participants in an address independent manner, i.e., even when peers change their physical addresses; (iii) Availability and persistence of stored objects in the network, irrespective of availability or departure of individual participants from the system at any time; and (iv) Freshness of the objects/resources' (up-to-date replicas). Internet-scale distributed index structures (often termed as structured overlays) are used for discovery and access of resources in a decentralized setting. We propose a rapid construction from scratch and maintenance of the P-Grid overlay network in a self-organized manner so as to provide efficient search of both individual keys as well as a whole range of keys, doing so providing good load-balancing characteristics for diverse kind of arbitrarily skewed loads - storage and replication, query forwarding and query answering loads. For fast overlay construction we employ recursive partitioning of the key-space so that the resulting partitions are balanced with respect to storage load and replication. The proper algorithmic parameters for such partitioning is derived from a transient analysis of the partitioning process which has Markov property. Preservation of ordering information in P-Grid such that queries other than exact queries, like range queries can be efficiently and rather trivially handled makes P-Grid suitable for data-oriented applications. Fast overlay construction is analogous to building an index on a new set of keys making P-Grid suitable as the underlying indexing mechanism for peer-to-peer information retrieval applications among other potential applications which may require frequent indexing of new attributes apart regular updates to an existing index. In order to deal with membership dynamics, in particular changing physical address of peers across sessions, the overlay itself is used as a (self-referential) directory service for maintaining the participating peers' physical addresses across sessions. Exploiting this self-referential directory, a family of overlay maintenance scheme has been designed with lower communication overhead than other overlay maintenance strategies. The notion of dynamic equilibrium study for overlays under continuous churn and repairs, modeled as a Markov process, was introduced in order to evaluate and compare the overlay maintenance schemes. While the self-referential directory was originally invented to realize overlay maintenance schemes with lower overheads than existing overlay maintenance schemes, the self-referential directory is generic in nature and can be used for various other purposes, e.g., as a decentralized public key infrastructure. Persistence of peer identity across sessions, in spite of changes in physical address, provides a logical independence of the overlay network from the underlying physical network. This has many other potential usages, for example, efficient maintenance mechanisms for P2P storage systems and P2P trust and reputation management. We specifically look into the dynamics of maintaining redundancy for storage systems and design a novel lazy maintenance strategy. This strategy is algorithmically a simple variant of existing maintenance strategies which adapts to the system dynamics. This randomized lazy maintenance strategy thus explores the cost-performance trade-offs of the storage maintenance operations in a self-organizing manner. We model the storage system (redundancy), under churn and maintenance, as a Markov process. We perform an equilibrium study to show that the system operates in a more stable dynamic equilibrium with our strategy than for the existing maintenance scheme for comparable overheads. Particularly, we show that our maintenance scheme provides substantial performance gains in terms of maintenance overhead and system's resilience in presence of churn and correlated failures. Finally, we propose a gossip mechanism which works with lower communication overhead than existing approaches for communication among a relatively large set of unreliable peers without assuming any specific structure for their mutual connectivity. We use such a communication primitive for propagating replica updates in P2P systems, facilitating management of mutable content in P2P systems. The peer population affected by a gossip can be modeled as a Markov process. Studying the transient spread of gossips help in choosing proper algorithm parameters to reduce communication overhead while guaranteeing coverage of online peers. Each of these substrates in themselves were developed to find practical solutions for real problems. Put together, these can be used in other applications, including a P2P storage system with support for efficient lookup and inserts, membership dynamics, content mutation and updates, persistence and availability. Many of the ideas have already been implemented in real systems and several others are in the way to be integrated into the implementations. There are two principal contributions of this dissertation. It provides design of the P2P systems which are useful for end-users as well as other application developers who can build upon these existing systems. Secondly, it adapts and introduces the methodology of analysis of a system's time-evolution (tools typically used in diverse domains including physics and cybernetics) to study the long run behavior of P2P systems, and uses this methodology to (re-)design appropriate algorithms and evaluate them. We observed that studying P2P systems from the perspective of complex systems reveals their inner dynamics and hence ways to exploit such dynamics for suitable or better algorithms. In other words, the analysis methodology in itself strongly influences and inspires the way we design such systems. We believe that such an approach of orchestrating self-organization in internet-scale systems, where the algorithms and the analysis methodology have strong mutual influence will significantly change the way future such systems are developed and evaluated. We envision that such an approach will particularly serve as an important tool for the nascent but fast moving P2P systems research and development community

    A schema-based peer-to-peer infrastructure for digital library networks

    Get PDF
    [no abstract

    Supporting Non-Linear and Non-Continuous Media Access in Peer-to-Peer Multimedia Systems

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Agent organization in the KP

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (p. 181-191).In designing and building a network like the Internet, we continue to face the problems of scale and distribution. With the dramatic expansion in scale and heterogeneity of the Internet, network management has become an increasingly difficult task. Furthermore, network applications often need to maintain efficient organization among the participants by collecting information from the underlying networks. Such individual information collection activities lead to duplicate efforts and contention for network resources. The Knowledge Plane (KP) is a new common construct that provides knowledge and expertise to meet the functional, policy and scaling requirements of network management, as well as to create synergy and exploit commonality among many network applications. To achieve these goals, we face many challenging problems, including widely distributed data collection, efficient processing of that data, wide availability of the expertise, etc. In this thesis, to provide better support for network management and large-scale network applications, I propose a knowledge plane architecture that consists of a network knowledge plane (NetKP) at the network layer, and on top of it, multiple specialized KPs (spec-KPs). The NetKP organizes agents to provide valuable knowledge and facilities about the Internet to the spec-KPs. Each spec-KP is specialized in its own area of interest. In both the NetKP and the spec-KPs, agents are organized into regions based on different sets of constraints. I focus on two key design issues in the NetKP: (1) a region-based architecture for agent organization, in which I design an efficient and non-intrusive organization among regions that combines network topology and a distributed hash table; (2) request and knowledge dissemination, in which I design a robust and efficient broadcast and aggregation mechanism using a tree structure among regions.(cont.) In the spec-KPs, I build two examples: experiment management on the PlanetLab testbed and distributed intrusion detection on the DETER testbed. The experiment results suggest a common approach driven by the design principles of the Internet and more specialized constraints can derive productive organization for network management and applications.by Ji Li.Ph.D

    Application of overlay techniques to network monitoring

    Get PDF
    Measurement and monitoring are important for correct and efficient operation of a network, since these activities provide reliable information and accurate analysis for characterizing and troubleshooting a network’s performance. The focus of network measurement is to measure the volume and types of traffic on a particular network and to record the raw measurement results. The focus of network monitoring is to initiate measurement tasks, collect raw measurement results, and report aggregated outcomes. Network systems are continuously evolving: besides incremental change to accommodate new devices, more drastic changes occur to accommodate new applications, such as overlay-based content delivery networks. As a consequence, a network can experience significant increases in size and significant levels of long-range, coordinated, distributed activity; furthermore, heterogeneous network technologies, services and applications coexist and interact. Reliance upon traditional, point-to-point, ad hoc measurements to manage such networks is becoming increasingly tenuous. In particular, correlated, simultaneous 1-way measurements are needed, as is the ability to access measurement information stored throughout the network of interest. To address these new challenges, this dissertation proposes OverMon, a new paradigm for edge-to-edge network monitoring systems through the application of overlay techniques. Of particular interest, the problem of significant network overheads caused by normal overlay network techniques has been addressed by constructing overlay networks with topology awareness - the network topology information is derived from interior gateway protocol (IGP) traffic, i.e. OSPF traffic, thus eliminating all overlay maintenance network overhead. Through a prototype that uses overlays to initiate measurement tasks and to retrieve measurement results, systematic evaluation has been conducted to demonstrate the feasibility and functionality of OverMon. The measurement results show that OverMon achieves good performance in scalability, flexibility and extensibility, which are important in addressing the new challenges arising from network system evolution. This work, therefore, contributes an innovative approach of applying overly techniques to solve realistic network monitoring problems, and provides valuable first hand experience in building and evaluating such a distributed system

    Agent Organization in the Knowledge Plane

    Get PDF
    In designing and building a network like the Internet, we continue to face the problems of scale and distribution. With the dramatic expansion in scale and heterogeneity of the Internet, network management has become an increasingly difficult task. Furthermore, network applications often need to maintain efficient organization among the participants by collecting information from the underlying networks. Such individual information collection activities lead to duplicate efforts and contention for network resources.The Knowledge Plane (KP) is a new common construct that provides knowledge and expertise to meet the functional, policy and scaling requirements of network management, as well as to create synergy and exploit commonality among many network applications. To achieve these goals, we face many challenging problems, including widely distributed data collection, efficient processing of that data, wide availability of the expertise, etc.In this thesis, to provide better support for network management and large-scale network applications, I propose a knowledge plane architecture that consists of a network knowledge plane (NetKP) at the network layer, and on top of it, multiple specialized KPs (spec-KPs). The NetKP organizes agents to provide valuable knowledge and facilities about the Internet to the spec-KPs. Each spec-KP is specialized in its own area of interest. In both the NetKP and the spec-KPs, agents are organized into regions based on different sets of constraints. I focus on two key design issues in the NetKP: (1) a regionbased architecture for agent organization, in which I design an efficient and non-intrusive organization among regions that combines network topology and a distributed hash table; (2) request and knowledge dissemination, in which I design a robust and efficient broadcast and aggregation mechanism using a tree structure among regions. In the spec-KPs, I build two examples: experiment management on the PlanetLab testbed and distributed intrusion detection on the DETER testbed. The experiment results suggest a common approach driven by the design principles of the Internet and more specialized constraints can derive productive organization for network management and applications

    Efficient and Flexible Search in Large Scale Distributed Systems

    Get PDF
    Peer-to-peer (P2P) technology has triggered a wide range of distributed systems beyond simple file-sharing. Distributed XML databases, distributed computing, server-less web publishing and networked resource/service sharing are only a few to name. Despite of the diversity in applications, these systems share a common problem regarding searching and discovery of information. This commonality stems from the transitory nodes population and volatile information content in the participating nodes. In such dynamic environment, users are not expected to have the exact information about the available objects in the system. Rather queries are based on partial information, which requires the search mechanism to be flexible. On the other hand, to scale with network size the search mechanism is required to be bandwidth efficient. Since the advent of P2P technology experts from industry and academia have proposed a number of search techniques - none of which is able to provide satisfactory solution to the conflicting requirements of search efficiency and flexibility. Structured search techniques, mostly Distributed Hash Table (DHT)-based, are bandwidth efficient while semi(un)-structured techniques are flexible. But, neither achieves both ends. This thesis defines the Distributed Pattern Matching (DPM) problem. The DPM problem is to discover a pattern (\ie bit-vector) using any subset of its 1-bits, under the assumption that the patterns are distributed across a large population of networked nodes. Search problem in many distributed systems can be reduced to the DPM problem. This thesis also presents two distinct search mechanisms, named Distributed Pattern Matching System (DPMS) and Plexus, for solving the DPM problem. DPMS is a semi-structured, hierarchical architecture aiming to discover a predefined number of matches by visiting a small number of nodes. Plexus, on the other hand, is a structured search mechanism based on the theory of Error Correcting Code (ECC). The design goal behind Plexus is to discover all the matches by visiting a reasonable number of nodes
    corecore