33 research outputs found

    EFFICIENT LOAD BALANCING IN PEER-TO-PEER SYSTEMS USING VIRTUAL SERVERS

    Load balancing is a critical issue for the efficient operation of peer-to-peer networks. With the notion of virtual servers, peers participating in a heterogeneous, structured peer-to-peer (P2P) network may host different numbers of virtual servers, and by migrating virtual servers, peers can balance their loads in proportion to their capacities. Peers participating in a Distributed Hash Table (DHT) are often heterogeneous. The existing decentralized load balancing algorithms designed for heterogeneous, structured P2P networks either explicitly construct auxiliary networks to manipulate global information or implicitly demand that the P2P substrates be organized in a hierarchical fashion. Without relying on any auxiliary networks and independent of the geometry of the P2P substrates, this paper presents a novel, efficient, proximity-aware load balancing algorithm based on the concept of common virtual servers. The algorithm is unique in that each participating peer uses its partial knowledge of the system to estimate the probability distributions of the capacities of peers and the loads of virtual servers. The movement cost can be reduced by using common virtual serve

    Distributed Load Balancing in Peer-to-Peer Computing

    In this paper, we address the load balancing problem in the context of peer-to-peer computing environments. The key challenge in employing peer-to-peer networks for distributed computing is to exploit the heterogeneous processing capabilities of the participating hosts as well as the diverse network conditions. The contribution of our work is twofold. First, we model the load balancing problem as an optimization problem with the objective of minimizing the system response time. This model considers not only the current load of the hosts but also the fluctuation of network delay, which completely captures the characteristics of P2P systems. Second, we propose a gradient projection algorithm to solve the optimization problem, which is fully distributed and easy to implement. Simulation results demonstrate that our scheme achieves satisfactory performance in terms of convergence, response time and load distribution.

    When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Processing

    Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated with keys that are significantly more frequent than others. Skew is remarkably more problematic in large deployments: more workers imply fewer keys per worker, so it becomes harder to "average out" the cost of hot keys with cold keys. We propose a novel load balancing technique that uses a heavy hitter algorithm to efficiently identify the hottest keys in the stream. These hot keys are assigned to d ≥ 2 choices to ensure a balanced load, where d is tuned automatically to minimize the memory and computation cost of operator replication. The technique works online and does not require the use of routing tables. Our extensive evaluation shows that our technique can balance real-world workloads on large deployments, and improve throughput and latency by 150% and 60% respectively over the previous state-of-the-art when deployed on Apache Storm. Comment: 12 pages, 14 figures; this paper is accepted and will be published at ICDE 201
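The idea in this abstract (detect hot keys with a heavy hitter algorithm, then give them more than two candidate workers) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Misra-Gries summary stands in for whichever heavy hitter algorithm the authors use, `d_hot` is a fixed constant rather than an automatically tuned value, and the names `candidates`, `MisraGries`, and `route` are hypothetical.

```python
import hashlib


def candidates(key, d, n_workers):
    """Return d candidate workers for a key, one per seeded hash (no routing table)."""
    out = []
    for seed in range(d):
        digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % n_workers)
    return out


class MisraGries:
    """Misra-Gries frequent-items summary holding at most k - 1 counters."""

    def __init__(self, k):
        self.k = k
        self.counts = {}
        self.n = 0  # number of items seen so far

    def add(self, key):
        self.n += 1
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.k - 1:
            self.counts[key] = 1
        else:
            # Summary is full: decrement every counter and drop those that hit zero.
            for other in list(self.counts):
                self.counts[other] -= 1
                if self.counts[other] == 0:
                    del self.counts[other]

    def is_hot(self, key, threshold):
        """True if the (under)estimated frequency exceeds threshold * n."""
        return self.counts.get(key, 0) > threshold * self.n


def route(stream, n_workers, d_hot=4, k=50, threshold=0.05):
    """Send each tuple to the least-loaded candidate: d_hot choices if hot, else 2."""
    sketch = MisraGries(k)
    load = [0] * n_workers  # load as estimated locally by this sender
    for key in stream:
        sketch.add(key)
        d = d_hot if sketch.is_hot(key, threshold) else 2
        worker = min(candidates(key, d, n_workers), key=lambda w: load[w])
        load[worker] += 1
    return load
```

Because the candidate list for d = 2 is a prefix of the list for any larger d, a key that turns hot keeps its original two workers and only gains additional ones.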

    Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks

    We study the problem of evaluating conjunctive queries composed of triple patterns over RDF data stored in distributed hash tables. Our goal is to develop algorithms that scale to large amounts of RDF data, distribute the query processing load evenly and incur little network traffic. We present and evaluate two novel query processing algorithms with these possibly conflicting goals in mind. We discuss the various tradeoffs that occur in our setting through a detailed experimental evaluation of the proposed algorithms.

    A SCALABLE SEARCH ENGINE AGGREGATOR

    The ability to display different media sources in an appropriate way is an integral part of search engines such as Google, Yahoo, and Bing, as well as social networking sites like Facebook. This project explores and implements various media-updating features of the open source search engine Yioop [1]. These include news aggregation, video conversion and email distribution. An older, preexisting news update feature of Yioop was modified and scaled so that it can work on many machines. We redesigned and modified the user interface associated with a distributed news updater feature in Yioop. This project also introduced a video updater feature for Yioop. This feature converts uploaded video into formats that are compatible with all browsers. It can quickly convert lengthy videos by splitting them, converting the pieces in parallel, and then merging them back into a single video. In this report, we discuss a solution to offload the task of sending emails from the Yioop web application to the Yioop media updater by aggregating emails over a period of time. We conclude this report by describing experiments with these features on a cluster set up on an AWS platform.

    Efficient Computation of Distance Sketches in Distributed Networks

    Distance computation is one of the most fundamental primitives used in communication networks. The cost of effectively and accurately computing pairwise network distances can become prohibitive in large-scale networks such as the Internet and Peer-to-Peer (P2P) networks. To negotiate the rising need for very efficient distance computation, approximation techniques for numerous variants of this question have recently received significant attention in the literature. The goal is to preprocess the graph and store a small amount of information such that whenever a query for any pairwise distance is issued, the distance can be well approximated (i.e., with small stretch) very quickly in an online fashion. Specifically, the pre-processing (usually) involves storing a small sketch with each node, such that at query time only the sketches of the concerned nodes need to be looked up to compute the approximate distance. In this paper, we present the first theoretical study of distance sketches derived from distance oracles in a distributed network. We first present a fast distributed algorithm for computing approximate distance sketches, based on a distributed implementation of the distance oracle scheme of [Thorup-Zwick, JACM 2005]. We also show how to modify this basic construction to achieve different tradeoffs between the number of pairs for which the distance estimate is accurate and other parameters. These tradeoffs can then be combined to give an efficient construction of small sketches with provable average-case as well as worst-case performance. Our algorithms use only small-sized messages and hence are suitable for bandwidth-constrained networks, and can be used in various networking applications such as topology discovery and construction, token management, load balancing, monitoring overlays, and several other problems in distributed algorithms. Comment: 18 pages

    Content-based addressing in hierarchical distributed hash tables

    Peer-to-peer networks have drawn their strength from their ability to operate without the use of a central agent. In recent years the development of the structured peer-to-peer network has further increased the distributed nature of P2P systems. These networks take advantage of an underlying distributed data structure, a common one being the distributed hash table (DHT). Peers use this structure to act as equals in a network, sharing the same responsibilities of maintaining and contributing. But herein lies the problem: not all peers are equal in terms of resources and power. With no central agent to monitor and balance load, the heterogeneous nature of peers can cause many distribution or bottleneck issues at the network and peer levels. This is due to the way in which addresses are allocated in these DHTs. Often this function is carried out by a consistent hashing function. These functions, although powerful in their simplicity and effectiveness, are the source of a crucial flaw: addresses are assigned at random, both when identifying peers and when allocating resource ownership. This work proposes a solution to mitigate the random nature of address assignment in DHTs, leveraging two methodologies called hierarchical DHTs and content-based addressing. Combining these methods would enable peers to work in cooperative groups of like-interested peers in order to dynamically share the load between group members. Group formation and utilization rely on the actual resources a peer willingly shares and is able to contribute, rather than on the random hash employed by traditional DHT P2P structures.
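The baseline this abstract criticizes, address assignment by consistent hashing, can be sketched in a few lines. This is only an illustration of the conventional scheme under stated assumptions, not the hierarchical or content-based method the work proposes: the `HashRing` class, the `ring_position` helper, and the choice of SHA-1 with a 32-bit ring are all hypothetical.

```python
import bisect
import hashlib


def ring_position(name, bits=32):
    """Map a peer or resource name to a pseudo-random point on the ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Consistent hashing: a resource is owned by its clockwise successor peer."""

    def __init__(self, peers):
        # Peer addresses come from the hash alone, regardless of peer capacity.
        self.points = sorted((ring_position(p), p) for p in peers)
        self.keys = [pos for pos, _ in self.points]

    def owner(self, resource):
        # First peer at or after the resource's position, wrapping past the end.
        i = bisect.bisect_left(self.keys, ring_position(resource))
        return self.points[i % len(self.points)][1]
```

Both the peer positions and the resource positions depend only on the hash output, so the share of the ring a peer ends up owning is unrelated to the resources it can actually contribute.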