4,804 research outputs found
Simple Load Balancing for Distributed Hash Tables
Distributed hash tables have recently become a useful building block for a variety of distributed applications. However, current schemes based upon consistent hashing require both considerable implementation complexity and substantial storage overhead to achieve desired load balancing goals. We argue in this paper that these goals can b e achieved more simply and more cost-effectively. First, we suggest the direct application of the "power of two choices" paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally b e extended to support other load balancing methods, including load-stealing or load-shedding schemes, as well as providing natural fault-tolerance mechanisms
Extrema propagation: fast distributed estimation of sums and network sizes
Aggregation of data values plays an important role on distributed computations, in particular, over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables or estimates of the average system load to direct load balancing. Distributed aggregation using nonidempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multipath routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps can be made just slightly above the theoretical minimum; and it is fully distributed, with no single point of failure and the result produced at every node
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real- world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability
Optimally Efficient Prefix Search and Multicast in Structured P2P Networks
Searching in P2P networks is fundamental to all overlay networks.
P2P networks based on Distributed Hash Tables (DHT) are optimized for single
key lookups, whereas unstructured networks offer more complex queries at the
cost of increased traffic and uncertain success rates. Our Distributed Tree
Construction (DTC) approach enables structured P2P networks to perform prefix
search, range queries, and multicast in an optimal way. It achieves this by
creating a spanning tree over the peers in the search area, using only
information available locally on each peer. Because DTC creates a spanning
tree, it can query all the peers in the search area with a minimal number of
messages. Furthermore, we show that the tree depth has the same upper bound as
a regular DHT lookup which in turn guarantees fast and responsive runtime
behavior. By placing objects with a region quadtree, we can perform a prefix
search or a range query in a freely selectable area of the DHT. Our DTC
algorithm is DHT-agnostic and works with most existing DHTs. We evaluate the
performance of DTC over several DHTs by comparing the performance to existing
application-level multicast solutions, we show that DTC sends 30-250% fewer
messages than common solutions
A Taxonomy of Self-configuring Service Discovery Systems
We analyze the fundamental concepts and issues in service
discovery. This analysis places service discovery in the context of distributed
systems by describing service discovery as a third generation
naming system. We also describe the essential architectures and the
functionalities in service discovery. We then proceed to show how service
discovery fits into a system, by characterizing operational aspects.
Subsequently, we describe how existing state of the art performs service
discovery, in relation to the operational aspects and functionalities, and
identify areas for improvement
- …