Small-world networks, distributed hash tables and the e-resource discovery problem
Resource discovery is one of the most important underpinning problems behind producing a scalable,
robust and efficient global infrastructure for e-Science. A number of approaches to the resource discovery
and management problem have been proposed in various computational grid environments and prototypes
over the last decade. Computational resources and services in modern grid and cloud environments can be
modelled as an overlay network superposed on the physical network structure of the Internet and World
Wide Web. We discuss some of the main approaches to resource discovery in the context of the general
properties of such an overlay network. We present some performance data and predicted properties based
on algorithmic approaches such as distributed hash table resource discovery and management. We describe
a prototype system and use its model to explore some of the known key graph aspects of the global
resource overlay network, including small-world and scale-free properties.
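Small-world structure is usually diagnosed by two measures: a high average clustering coefficient and a short average shortest-path length. The Python sketch computes both for a toy ring lattice and for the same lattice with two added long-range edges. The graph size, the degree and the shortcut pairs are arbitrary assumptions chosen for illustration, not data from the prototype described above.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k // 2 nearest neighbours on either side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def add_shortcuts(adj, pairs):
    """Add long-range edges (the 'rewired' links of a Watts-Strogatz-style graph)."""
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def clustering(adj):
    """Average local clustering coefficient (nodes of degree < 2 contribute 0)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)


def avg_path_length(adj):
    """Mean hop count over all reachable ordered pairs, using BFS from every node."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(20, 4)
print(clustering(lattice), round(avg_path_length(lattice), 3))  # 0.5 2.895
shortcut = add_shortcuts(ring_lattice(20, 4), [(0, 10), (5, 15)])
print(round(clustering(shortcut), 3), round(avg_path_length(shortcut), 3))
```

Adding only two shortcuts lowers the average path length while most nodes keep their local clustering, which is the small-world effect in miniature.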
DHTJoin: Processing Continuous Join Queries Using DHT Networks
Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) with dissemination of queries that exploits the trees embedded in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew, which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin which shows that it can achieve significant performance gains in terms of network traffic.
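The core placement idea can be illustrated with a toy, in-memory sketch: each tuple of stream R or S is routed to the node that succeeds the hash of its join attribute value on an identifier ring, so matching tuples from both streams always meet on the same node, which then performs a symmetric hash join. The ring, the `ToyDHT` class and its methods are hypothetical; sliding windows, approximate answers, query dissemination over the DHT's embedded trees and skew handling are all left out.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def ring_id(value):
    """Place a value on a 32-bit identifier ring (first 4 bytes of its SHA-1 digest)."""
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


class ToyDHT:
    """In-memory stand-in for a DHT; each node keeps one hash table per stream."""

    def __init__(self, node_names):
        self.ring = sorted((ring_id(n), n) for n in node_names)
        self.ids = [i for i, _ in self.ring]
        # node -> stream -> join value -> stored tuples
        self.state = {n: {"R": defaultdict(list), "S": defaultdict(list)} for n in node_names}

    def responsible(self, key):
        """Successor of key: first node id >= key, wrapping past the top of the ring."""
        idx = bisect_left(self.ids, key) % len(self.ids)
        return self.ring[idx][1]

    def insert(self, stream, tup, join_attr):
        """Route a tuple by hash(join value), store it, and probe the opposite stream."""
        value = tup[join_attr]
        tables = self.state[self.responsible(ring_id(value))]
        other = "S" if stream == "R" else "R"
        tables[stream][value].append(tup)
        # Symmetric hash join: matches can only live on this node, because both
        # streams hash the same join value to the same ring position.
        return [(tup, o) if stream == "R" else (o, tup) for o in tables[other][value]]


dht = ToyDHT(["n1", "n2", "n3", "n4"])
dht.insert("R", {"id": 1, "k": "a"}, "k")         # no S tuple with k == "a" yet
print(dht.insert("S", {"x": 9, "k": "a"}, "k"))   # joins with the earlier R tuple
```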
Data sharing in DHT based P2P systems
The evolution of peer-to-peer (P2P) systems triggered the building of large-scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the "extreme" characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, and potentially untrusted participants. This article focuses on declarative querying support, query optimization and data privacy in a major class of P2P systems: those based on a Distributed Hash Table (P2P DHT). The usual approaches and algorithms used by classic distributed systems and databases for providing data privacy and querying services are not well suited to P2P DHT systems, and a considerable amount of work was required to adapt them to the new challenges such systems present. This article describes the most important solutions found. It also identifies important future research trends in data management in P2P DHT systems.
Enabling High Data Throughput in Desktop Grids Through Decentralized Data and Metadata Management: The BlobSeer Approach
Whereas traditional Desktop Grids rely on centralized servers for data management, some recent progress has been made to enable the distribution of large input data using peer-to-peer (P2P) protocols and Content Distribution Networks (CDN). We go a step further and propose a generic yet efficient data storage which enables the use of Desktop Grids for applications with high output data requirements, where the access grain and the access patterns may be random. Our solution builds on a blob management service enabling a large number of concurrent clients to efficiently read, write and append huge data that are fragmented and distributed at a large scale. Scalability under heavy concurrency is achieved thanks to an original metadata scheme using a distributed segment tree built on top of a Distributed Hash Table (DHT). The proposed approach has been implemented and its benefits have been successfully demonstrated within our BlobSeer prototype on the Grid'5000 testbed.
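A rough picture of a segment-tree metadata scheme layered on a DHT: the blob is divided into fixed-size pages, each tree node covers a page range, and nodes are stored under a key derived from (version, offset, size). In this sketch a plain Python dict stands in for the DHT, only a single version is built, and the key layout, `build_tree` and `lookup` are illustrative assumptions rather than BlobSeer's actual metadata format. A reader resolves a page range by descending only into subtrees that overlap it.

```python
# Hypothetical key layout: (version, first page, number of pages covered).


def build_tree(dht, version, offset, size, page_locs):
    """Store a segment tree over pages [offset, offset + size); size must be a power of two."""
    key = (version, offset, size)
    if size == 1:
        # Leaf: records which data provider holds this page.
        dht[key] = {"leaf": True, "provider": page_locs[offset]}
    else:
        half = size // 2
        left = build_tree(dht, version, offset, half, page_locs)
        right = build_tree(dht, version, offset + half, half, page_locs)
        dht[key] = {"leaf": False, "left": left, "right": right}
    return key


def lookup(dht, key, lo, hi):
    """Return (page, provider) for every page in [lo, hi), visiting only overlapping nodes."""
    _, offset, size = key
    if offset >= hi or offset + size <= lo:
        return []  # this subtree lies entirely outside the requested range
    node = dht[key]
    if node["leaf"]:
        return [(offset, node["provider"])]
    return lookup(dht, node["left"], lo, hi) + lookup(dht, node["right"], lo, hi)


dht = {}  # stand-in for the DHT
locs = {i: f"provider-{i % 3}" for i in range(8)}
root = build_tree(dht, 1, 0, 8, locs)
print(lookup(dht, root, 2, 5))
# [(2, 'provider-2'), (3, 'provider-0'), (4, 'provider-1')]
```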
Emerge: Self-Emerging Data Release Using Cloud Data Storage
In the age of Big Data, advances in distributed technologies and cloud storage services provide highly efficient and cost-effective solutions for large-scale data storage and management. Supporting self-emerging data using clouds is a challenging problem. While straightforward centralized approaches provide a basic solution, they are unfortunately limited to a single point of trust. Supporting attack-resilient timed release of encrypted data stored in clouds requires new mechanisms for the self-emergence of data encryption keys, which enable encrypted data to become accessible at a future point in time. Prior to the release time, the encryption key remains undiscovered and unavailable in a secure distributed system, making the private data unavailable. In this paper, we propose Emerge, a self-emerging timed data release protocol for securely hiding the encryption keys of private encrypted data in a large-scale Distributed Hash Table (DHT) network, making the data available and accessible only at the defined release time. We develop a suite of erasure-coding-based routing path construction schemes for securely storing and routing encryption keys in DHT networks; these schemes prevent an adversary from inferring the encryption key prior to the release time (release-ahead attack) or from destroying the key altogether (drop attack). Through extensive experimental evaluation, we demonstrate that the proposed schemes are resilient to both release-ahead and drop attacks, as well as to attacks that arise from traditional churn issues in DHT networks.
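The two properties Emerge needs, that a key stays hidden until enough pieces are gathered yet survives the loss of some pieces, can be illustrated with a threshold scheme. This sketch uses Shamir's secret sharing, which is a different technique from the erasure-coding-based routing path construction the abstract describes, and it does not model DHT routing, release timing or churn. With a (k, n) split, any k shares reconstruct the key, up to n - k shares can be dropped without loss, and fewer than k shares reveal nothing about it. The prime modulus and the function names are assumptions for illustration.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime; the secret must be smaller than this


def split(secret, k, n):
    """Split secret into n shares (x, y); any k of them reconstruct it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares):
    """Recover the constant term by Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)
shares = split(key, 3, 5)                   # e.g. spread over 5 DHT nodes
print(combine(shares[:3]) == key)           # True: any 3 shares suffice
print(combine([shares[1], shares[4], shares[2]]) == key)  # True: 2 shares were "dropped"
```

`pow(den, -1, PRIME)` computes a modular inverse and requires Python 3.8 or later.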
The ViP2P Platform: XML Views in P2P
The growing volumes of XML data sources on the Web or produced by
enterprises, organizations etc. raise many performance challenges for data
management applications. In this work, we are concerned with the distributed,
peer-to-peer management of large corpora of XML documents, based on distributed
hash table (or DHT, in short) overlay networks. We present ViP2P (standing for
Views in Peer-to-Peer), a distributed platform for sharing XML documents based
on a structured P2P network infrastructure (DHT). At the core of ViP2P stand
distributed materialized XML views, defined by arbitrary XML queries, filled in
with data published anywhere in the network, and exploited to efficiently
answer queries issued by any network peer. ViP2P allows user queries to be
evaluated over XML documents published by peers in two modes. First, a
long-running subscription mode, in which a query is registered in the system
and receives answers incrementally when and if published data matches it.
Second, queries can also be asked in an ad-hoc, snapshot mode, where results
are required immediately and must be computed based on the results of other
long-running, subscription queries. ViP2P innovates over other similar
DHT-based XML sharing platforms by using a very expressive structured XML query
language. This expressivity leads to a very flexible distribution of XML
content in the ViP2P network, and to efficient snapshot query execution. ViP2P
has been tested in real deployments of hundreds of computers. We present the
platform architecture, its internal algorithms, and demonstrate its efficiency
and scalability through a set of experiments. Our experimental results exceed
those of similar competitor systems by orders of magnitude in terms of data volumes,
network size and data dissemination throughput.
Comment: RR-7812 (2011)
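To make the two evaluation modes concrete, here is a deliberately simplified single-process sketch. It assumes documents are small XML strings, uses the limited path syntax of Python's `xml.etree.ElementTree` in place of ViP2P's much richer query language, and omits the DHT entirely; `ToyViewStore` and its methods are hypothetical names. Publishing a document incrementally extends every view it matches (subscription mode), while a snapshot query is answered immediately by filtering rows a view has already materialized.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class ToyViewStore:
    """Single-process stand-in for ViP2P-style materialized views (no DHT)."""

    def __init__(self):
        self.views = {}                    # view name -> ElementTree path expression
        self.contents = defaultdict(list)  # view name -> materialized (doc_id, text) rows

    def define_view(self, name, path):
        self.views[name] = path

    def publish(self, doc_id, xml_text):
        """Subscription mode: push a new document's matches into every view incrementally."""
        root = ET.fromstring(xml_text)
        delta = {}
        for name, path in self.views.items():
            rows = [(doc_id, el.text) for el in root.findall(path)]
            if rows:
                self.contents[name].extend(rows)
                delta[name] = rows
        return delta  # what subscribers of each view would be notified of

    def snapshot(self, view_name, predicate):
        """Snapshot mode: answer at once from rows already materialized in a view."""
        return [row for row in self.contents[view_name] if predicate(row[1])]


store = ToyViewStore()
store.define_view("titles", ".//book/title")
store.publish("d1", "<lib><book><title>XML Views</title></book></lib>")
store.publish("d2", "<lib><book><title>P2P Basics</title></book><cd><title>Song</title></cd></lib>")
print(store.snapshot("titles", lambda t: "P2P" in t))  # [('d2', 'P2P Basics')]
```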
Exploiting semantic locality to improve peer-to-peer search mechanisms
A peer-to-peer (P2P) network is the most popular technology in file sharing today. With the advent of various commercial and non-commercial applications such as KaZaA and Gnutella, P2P networks have grown enormously in popularity. Every node (peer) in a P2P network acts as both a client and a server for other peers. A search in a P2P network is performed as a query relayed between peers until the peer that holds the searched data is found. Huge data size, complex management requirements, dynamic network conditions and the distributed nature of the system are some of the difficult challenges a P2P system faces while performing a search. Moreover, a blind and uninformed search leads to performance degradation and wasted resources. To address these weaknesses, techniques such as Distributed Hash Tables (DHTs) have been proposed to place tight constraints on node placement. However, DHTs do not consider the semantic significance of the data. We propose a new peer-to-peer search protocol that identifies locality in a P2P network to mitigate the complexity of data searching. Locality is a logical, semantic categorization of a group of peers sharing common data. With the help of locality information, our search model offers more informed and intelligent search for different queries. To evaluate the effectiveness of our model, we propose a new P2P search protocol, LocalChord. LocalChord builds on Chord and demonstrates the potential of our proposed locality scheme by remodelling Chord as a Chord of sub-chords.
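One way to read "a Chord of sub-chords" is as a two-level lookup: the query's locality (semantic category) first selects a sub-ring made up of the peers that share that category, and an ordinary Chord-style successor lookup then runs inside that smaller ring. This is a hypothetical interpretation, not the LocalChord protocol itself: the 16-bit identifier space, the class names, and the direct search over sorted identifiers (in place of Chord's O(log N) finger-table routing) are all simplifications.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits (hypothetical small ring; Chord itself uses SHA-1's 160 bits)


def ident(name):
    """Hash a peer name or data key onto the M-bit identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)


class Ring:
    """One Chord-style ring: a key is stored on its successor node."""

    def __init__(self, peers):
        self.nodes = sorted((ident(p), p) for p in peers)
        self.ids = [i for i, _ in self.nodes]

    def successor(self, key_id):
        # First node id >= key_id, wrapping around to the smallest id.
        idx = bisect_left(self.ids, key_id) % len(self.ids)
        return self.nodes[idx][1]


class LocalChordSketch:
    """Hypothetical 'Chord of sub-chords': one sub-ring per semantic locality."""

    def __init__(self, peers_by_locality):
        self.subrings = {loc: Ring(peers) for loc, peers in peers_by_locality.items()}

    def lookup(self, locality, key):
        ring = self.subrings[locality]      # step 1: restrict search to the locality
        return ring.successor(ident(key))   # step 2: successor lookup in the sub-ring


net = LocalChordSketch({"music": ["m1", "m2", "m3"], "video": ["v1", "v2"]})
print(net.lookup("music", "song.mp3"))  # one of m1, m2, m3
```

Because the search is confined to peers in the same locality, a lookup never touches peers outside that category, which is the intuition behind the locality scheme.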
- …