20,074 research outputs found

    Data sharing in DHT based P2P systems

    Get PDF
    International audienceThe evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the "extreme" characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases forproviding data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identies important future research trends in data management in P2P DHT systems

    Survey of data replication in P2P systems

    Get PDF
    Large-scale distributed collaborative applications are getting common as a result of rapid progress in distributed technologies (grid, peer-to-peer, and mobile computing). Peer-to-peer (P2P) systems are particularly interesting for collaborative applications as they can scale without the need for powerful servers. In P2P systems, data storage and processing are distributed across autonomous peers, which can join and leave the network at any time. To provide high data availability in spite of such dynamic behavior, P2P systems rely on data replication. Some replication approaches assume static, read-only data (e.g. music files). Other solutions deal with updates, but they simplify replica management by assuming no update conflicts or single-master replication (i.e. only one copy of the replicated data accepts write operations). P2P advanced applications, which must deal with semantically rich data (e.g. XML documents, relational tables, etc.) using a high-level SQL-like query language, are likely to need more sophisticated capabilities such as multi-master replication (i.e. all replicas accept write operations) and update conflict resolution. These issues are addressed by optimistic replication. Optimistic replication allows asynchronous updating of replicas so that applications can progress even though some nodes are disconnected or have failed. As a result, users can collaborate asynchronously. However, concurrent updates may cause replica divergence and conflicts, which should be reconciled. In this survey, we present an overview of data replication, focusing on the optimistic approach that provides good properties for dynamic environments. We also introduce P2P systems and the replication solutions they implement. In particular, we show that current P2P systems do not provide eventual consistency among replicas in the presence of updates, apart from APPA system, a P2P data management system that we are building

    Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks

    Get PDF
    Peer-to-Peer(P2P) systems have emerged as a promising paradigm to structure large scale distributed systems. They provide a robust, scalable and decentralized way to share and publish data.The unstructured P2P systems have gained much popularity in recent years for their wide applicability and simplicity. However efficient resource discovery remains a fundamental challenge for unstructured P2P networks due to the lack of a network structure. To effectively harness the power of unstructured P2P systems, the challenges in distributed knowledge management and information search need to be overcome. Current attempts to solve the problems pertaining to knowledge management and search have focused on simple term based routing indices and keyword search queries. Many P2P resource discovery applications will require more complex query functionality, as users will publish semantically rich data and need efficiently content location algorithms that find target content at moderate cost. Therefore, effective knowledge and data management techniques and search tools for information retrieval are imperative and lasting. In my dissertation, I present a suite of protocols that assist in efficient content location and knowledge management in unstructured Peer-to-Peer overlays. The basis of these schemes is their ability to learn from past peer interactions and increasing their performance with time.My work aims to provide effective and bandwidth-efficient searching and data sharing in unstructured P2P environments. A suite of algorithms which provide peers in unstructured P2P overlays with the state necessary in order to efficiently locate, disseminate and replicate objects is presented. Also, Existing approaches to federated search are adapted and new methods are developed for semantic knowledge representation, resource selection, and knowledge evolution for efficient search in dynamic and distributed P2P network environments. Furthermore,autonomous and decentralized algorithms that reorganizes an unstructured network topology into a one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient semantic search in dynamic environments

    A Peer-To-Peer Data Processing Infrastructure That Operates On The Largest Scale

    Get PDF
    In the corporate network, information may be exchanged throughout different businesses, making it simpler for those with common interests to collaborate. With its help, firms may be able to reduce overhead costs and increase revenue. As data is exchanged and processed across firms, it increases the complexity of implementing a scalable, high-performing, and secure data management system. This article presents BP++, an expansion of the BestPeer P2P data management platform that offers elastic data sharing services for cloud-based business network applications. BP++ integrates cloud computing, databases, and P2P technologies to provide its members with data sharing services under the well-known pay-as-you-go pricing model. For our BP++ tests, we make use of Amazon's EC2 cloud infrastructure. Benchmarks show that BP++ outperforms the recently proposed HadoopDB large-scale data processing solution when both systems are employed to handle typical business network demands. The results demonstrate that BP++ is quite efficient, with throughput scaling practically linearly with the number of peer nodes

    Query processing in P2P systems

    Get PDF
    Peer-to-peer (P2P) computing offers new opportunities for building highly distributed data systems. Unlike client-server computing, P2P is a very dynamic environment where peers can join and leave the network at any time. This yields important advantages such as operation without central coordination, peers autonomy, and scale up to large number of peers. However, providing high-level data management services is difficult. Most techniques designed in distributed database systems which statically exploit schema and network information no longer apply. New techniques are needed which should be decentralized, dynamic and self-adaptive. In this paper, we survey the techniques which have been developed for query processing in P2P systems. We first give an overview of the existing P2P networks, and com-pare their properties from the perspective of data management. Then, we discuss the ap-proaches which are used for schema mapping. Then, we describe the algorithms which have been proposed for query routing. In particular, we focus on query routing in unstructured net-works and DHTs. Finally, we present the techniques which have been proposed for processing complex queries, e.g. top-k queries, in P2P systems, in particular in DHTs

    Survey of data replication in P2P systems

    Get PDF
    Large-scale distributed collaborative applications are getting common as a result of rapid progress in distributed technologies (grid, peer-to-peer, and mobile computing). Peer-to-peer (P2P) systems are particularly interesting for collaborative applications as they can scale without the need for powerful servers. In P2P systems, data storage and processing are distributed across autonomous peers, which can join and leave the network at any time. To provide high data availability in spite of such dynamic behavior, P2P systems rely on data replication. Some replication approaches assume static, read-only data (e.g. music files). Other solutions deal with updates, but they simplify replica management by assuming no update conflicts or single-master replication (i.e. only one copy of the replicated data accepts write operations). P2P advanced applications, which must deal with semantically rich data (e.g. XML documents, relational tables, etc.) using a high-level SQL-like query language, are likely to need more sophisticated capabilities such as multi-master replication (i.e. all replicas accept write operations) and update conflict resolution. These issues are addressed by optimistic replication. Optimistic replication allows asynchronous updating of replicas so that applications can progress even though some nodes are disconnected or have failed. As a result, users can collaborate asynchronously. However, concurrent updates may cause replica divergence and conflicts, which should be reconciled. In this survey, we present an overview of data replication, focusing on the optimistic approach that provides good properties for dynamic environments. We also introduce P2P systems and the replication solutions they implement. In particular, we show that current P2P systems do not provide eventual consistency among replicas in the presence of updates, apart from APPA system, a P2P data management system that we are building

    Query management in a sensor environment

    Get PDF
    Traditional sensor network deployments consisted of fixed infrastructures and were relatively small in size. More and more, we see the deployment of ad-hoc sensor networks with heterogeneous devices on a larger scale, posing new challenges for device management and query processing. In this paper, we present our design and prototype implementation of XSense, an architecture supporting metadata and query services for an underlying large scale dynamic P2P sensor network. We cluster sensor devices into manageable groupings to optimise the query process and automatically locate appropriate clusters based on keyword abstraction from queries. We present experimental analysis to show the benefits of our approach and demonstrate improved query performance and scalability

    The state of peer-to-peer network simulators

    Get PDF
    Networking research often relies on simulation in order to test and evaluate new ideas. An important requirement of this process is that results must be reproducible so that other researchers can replicate, validate and extend existing work. We look at the landscape of simulators for research in peer-to-peer (P2P) networks by conducting a survey of a combined total of over 280 papers from before and after 2007 (the year of the last survey in this area), and comment on the large quantity of research using bespoke, closed-source simulators. We propose a set of criteria that P2P simulators should meet, and poll the P2P research community for their agreement. We aim to drive the community towards performing their experiments on simulators that allow for others to validate their results
    corecore