223 research outputs found

    Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks

    Get PDF
    Peer-to-Peer(P2P) systems have emerged as a promising paradigm to structure large scale distributed systems. They provide a robust, scalable and decentralized way to share and publish data.The unstructured P2P systems have gained much popularity in recent years for their wide applicability and simplicity. However efficient resource discovery remains a fundamental challenge for unstructured P2P networks due to the lack of a network structure. To effectively harness the power of unstructured P2P systems, the challenges in distributed knowledge management and information search need to be overcome. Current attempts to solve the problems pertaining to knowledge management and search have focused on simple term based routing indices and keyword search queries. Many P2P resource discovery applications will require more complex query functionality, as users will publish semantically rich data and need efficiently content location algorithms that find target content at moderate cost. Therefore, effective knowledge and data management techniques and search tools for information retrieval are imperative and lasting. In my dissertation, I present a suite of protocols that assist in efficient content location and knowledge management in unstructured Peer-to-Peer overlays. The basis of these schemes is their ability to learn from past peer interactions and increasing their performance with time.My work aims to provide effective and bandwidth-efficient searching and data sharing in unstructured P2P environments. A suite of algorithms which provide peers in unstructured P2P overlays with the state necessary in order to efficiently locate, disseminate and replicate objects is presented. Also, Existing approaches to federated search are adapted and new methods are developed for semantic knowledge representation, resource selection, and knowledge evolution for efficient search in dynamic and distributed P2P network environments. Furthermore,autonomous and decentralized algorithms that reorganizes an unstructured network topology into a one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient semantic search in dynamic environments

    Peer to Peer Information Retrieval: An Overview

    Get PDF
    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real- world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom

    Data replication and update propagation in XML P2P data management systems

    Get PDF
    XML P2P data management systems are P2P systems that use XML as the underlying data format shared between peers in the network. These systems aim to bring the benefits of XML and P2P systems to the distributed data management field. However, P2P systems are known for their lack of central control and high degree of autonomy. Peers may leave the network at any time at will, increasing the risk of data loss. Despite this, most research in XML P2P systems focus on novel and efficient XML indexing and retrieval techniques. Mechanisms for ensuring data availability in XML P2P systems has received comparatively little attention. This project attempts to address this issue. We design an XML P2P data management framework to improve data availability. This framework includes mechanisms for wide-spread data replication, replica location and update propagation. It allows XML documents to be broken down into fragments. By doing so, we aim to reduce the cost of replicating data by distributing smaller XML fragments throughout the network rather than entire documents. To tackle the data replication problem, we propose a suite of selection and placement algorithms that may be interchanged to form a particular replication strategy. To support the placement of replicas anywhere in the network, we use a Fragment Location Catalogue, a global index that maintains the locations of replicas. We also propose a lazy update propagation algorithm to propagate updates to replicas. Experiments show that the data replication algorithms improve data availability in our experimental network environment. We also find that breaking XML documents into smaller pieces and replicating those instead of whole XML documents considerably reduces the replication cost, but at the price of some loss in data availability. For the update propagation tests, we find that the probability that queries return up-to-date results increases, but improvements to the algorithm are necessary to handle environments with high update rates

    Data sharing in DHT based P2P systems

    Get PDF
    International audienceThe evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the "extreme" characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases forproviding data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identies important future research trends in data management in P2P DHT systems

    Bloom's Filters : Their Types and Analysis

    Get PDF
    Bloom filtrelerini ve çeşitlerini inceleyen bir çalışmanın özetidir. Bloom filtresi sorgulama üyeliklerini desteklemek amacıyla setleri temsil eden rasgele bir veri yapısıdır. 1970’lerde daha çok veri tabanı optimizasyonlarında kullanılmıştır. Bu yakınlarda bilgisayar ağları ile ilgili çalışma yapanlar daha sık kullanmaya başlamıştır. Bu çalışmada filtrelerin çeşitleri analiz edilecektir.In this paper we discuss Bloom filter in its original form and the varieties of its extensions. A Bloom filter is a randomized data-structure for concisely representing a set in order to support approximate membership queries. Although it was devised in 1970 for the purpose of spell checking, it was seldom used except in database optimization. In recent years, it has been rediscovered by the networking community, and has become a key component in many networking systems applications. In this paper, we will examine and analyse the different types of this filter

    Bloom's Filters : Their Types and Analysis

    Get PDF
    Bloom filtrelerini ve çeşitlerini inceleyen bir çalışmanın özetidir. Bloom filtresi sorgulama üyeliklerini desteklemek amacıyla setleri temsil eden rasgele bir veri yapısıdır. 1970’lerde daha çok veri tabanı optimizasyonlarında kullanılmıştır. Bu yakınlarda bilgisayar ağları ile ilgili çalışma yapanlar daha sık kullanmaya başlamıştır. Bu çalışmada filtrelerin çeşitleri analiz edilecektir.In this paper we discuss Bloom filter in its original form and the varieties of its extensions. A Bloom filter is a randomized data-structure for concisely representing a set in order to support approximate membership queries. Although it was devised in 1970 for the purpose of spell checking, it was seldom used except in database optimization. In recent years, it has been rediscovered by the networking community, and has become a key component in many networking systems applications. In this paper, we will examine and analyse the different types of this filter

    Efficient and Flexible Search in Large Scale Distributed Systems

    Get PDF
    Peer-to-peer (P2P) technology has triggered a wide range of distributed systems beyond simple file-sharing. Distributed XML databases, distributed computing, server-less web publishing and networked resource/service sharing are only a few to name. Despite of the diversity in applications, these systems share a common problem regarding searching and discovery of information. This commonality stems from the transitory nodes population and volatile information content in the participating nodes. In such dynamic environment, users are not expected to have the exact information about the available objects in the system. Rather queries are based on partial information, which requires the search mechanism to be flexible. On the other hand, to scale with network size the search mechanism is required to be bandwidth efficient. Since the advent of P2P technology experts from industry and academia have proposed a number of search techniques - none of which is able to provide satisfactory solution to the conflicting requirements of search efficiency and flexibility. Structured search techniques, mostly Distributed Hash Table (DHT)-based, are bandwidth efficient while semi(un)-structured techniques are flexible. But, neither achieves both ends. This thesis defines the Distributed Pattern Matching (DPM) problem. The DPM problem is to discover a pattern (\ie bit-vector) using any subset of its 1-bits, under the assumption that the patterns are distributed across a large population of networked nodes. Search problem in many distributed systems can be reduced to the DPM problem. This thesis also presents two distinct search mechanisms, named Distributed Pattern Matching System (DPMS) and Plexus, for solving the DPM problem. DPMS is a semi-structured, hierarchical architecture aiming to discover a predefined number of matches by visiting a small number of nodes. Plexus, on the other hand, is a structured search mechanism based on the theory of Error Correcting Code (ECC). The design goal behind Plexus is to discover all the matches by visiting a reasonable number of nodes

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Efficient Range and Join Query Processing in Massively Distributed Peer-to-Peer Networks

    Get PDF
    Peer-to-peer (P2P) has become a modern distributed computing architecture that supports massively large-scale data management and query processing. Complex query operators such as range operator and join operator are needed by various distributed applications, including content distribution, locality-aware services, computing resource sharing, and many others. This dissertation tackles a number of problems related to range and join query processing in P2P systems: fault-tolerant range query processing under structured P2P architecture, distributed range caching under unstructured P2P architecture, and integration of heterogeneous data under unstructured P2P architecture. To support fault-tolerant range query processing so as to provide strong performance guarantees in the presence of network churn, effective replication schemes are developed at either the overlay network level or the query processing level. To facilitate range query processing, a prefetch-based caching approach is proposed to eliminate the performance bottlenecks incurred by those data items that are not well cached in the network. Finally, a purely decentralized partition-based join query operator is devised to realize bandwidth-efficient join query processing under unstructured P2P architecture. Theoretical analysis and experimental simulations demonstrate the effectiveness of the proposed approaches
    corecore