GridVine: an Infrastructure for Peer Information Management
GridVine is a semantic overlay infrastructure based on a peer-to-peer (P2P) access structure. Built following the principle of data independence, it separates a logical layer, in which data, schemas, and schema mappings are managed, from a physical layer consisting of a structured P2P network supporting decentralized indexing, key load-balancing, and efficient routing. The system is decentralized, yet fosters semantic interoperability through pair-wise schema mappings and query reformulation. GridVine’s heterogeneous but semantically related information sources can be queried transparently using iterative query reformulation. The authors discuss a reference implementation of the system and several mechanisms for resolving queries collaboratively.
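A minimal sketch of how iterative query reformulation over pair-wise schema mappings might work. The schema names, attribute names, and the `MAPPINGS` table are hypothetical illustrations, and the traversal runs in a single process rather than over a P2P network; it is not GridVine's reference implementation.

```python
from collections import deque

# Hypothetical pair-wise mappings:
# (source_schema, target_schema) -> {source_attribute: target_attribute}
MAPPINGS = {
    ("A", "B"): {"title": "name"},
    ("B", "C"): {"name": "label"},
}


def reformulate(schema, attribute, mappings):
    """Return every (schema, attribute) pair reachable by chaining mappings."""
    seen = {(schema, attribute)}
    queue = deque(seen)
    while queue:
        cur_schema, cur_attr = queue.popleft()
        for (src, dst), attr_map in mappings.items():
            # Follow a mapping only if it starts at the current schema
            # and translates the attribute the query refers to.
            if src == cur_schema and cur_attr in attr_map:
                nxt = (dst, attr_map[cur_attr])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


targets = reformulate("A", "title", MAPPINGS)
```

Here a query on attribute `title` of schema `A` is also posed as `name` on `B` and, by chaining a second mapping, as `label` on `C`, even though no direct mapping from `A` to `C` exists.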
PicShark: mitigating metadata scarcity through large-scale P2P collaboration
With the commoditization of digital devices, personal information and media sharing is becoming a key application on the pervasive Web. In such a context, data annotation rather than data production is the main bottleneck. Metadata scarcity represents a major obstacle preventing efficient information processing in large and heterogeneous communities. However, social communities also open the door to new possibilities for addressing local metadata scarcity by taking advantage of global collections of resources. We propose to tackle the lack of metadata in large-scale distributed systems through a collaborative process leveraging both content and metadata. We develop a community-based and self-organizing system called PicShark in which information entropy, in terms of missing metadata, is gradually alleviated through decentralized instance and schema matching. Our approach focuses on semi-structured metadata and confines computationally expensive operations to the edge of the network, while keeping distributed operations as simple as possible to ensure scalability. PicShark builds on structured Peer-to-Peer networks for distributed look-up operations, but extends the application of self-organization principles to the propagation of metadata and the creation of schema mappings. We demonstrate the practical applicability of our method in an image sharing scenario and provide experimental evidence illustrating the validity of our approach.
An Efficient Architecture for Information Retrieval in P2P Context Using Hypergraph
Peer-to-peer (P2P) data-sharing systems now generate a significant portion of
Internet traffic. P2P systems have emerged as an accepted way to share enormous
volumes of data. Needs for widely distributed information systems supporting
virtual organizations have given rise to a new category of P2P systems called
schema-based. In such systems each peer is a database management system in
itself, exposing its own schema. In such a setting, the main objective is the
efficient search across peer databases by processing each incoming query
without overly consuming bandwidth. The usability of these systems depends on
successful techniques to find and retrieve data; however, efficient and
effective routing of content-based queries is an emerging problem in P2P
networks. This work is an attempt to motivate the use of mining algorithms in
the P2P context, which may significantly improve the efficiency of such
methods. Our proposed method is based on a combination of clustering with
hypergraphs. We use ECCLAT to build an approximate clustering and discover
meaningful clusters with slight overlap. We use the MTMINER algorithm to
extract all minimal transversals of a hypergraph (the clusters) for query
routing. The set of clusters improves the robustness of the query routing
mechanism and the scalability of the P2P network. We compare the performance of
our method with a baseline method on the query routing problem. Our
experimental results show that the proposed method achieves good performance
and scalability with respect to important criteria such as response time,
precision, and recall.
Comment: 20 pages, 8 figures
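To make the notion of a minimal transversal concrete: a transversal of a hypergraph is a set of vertices that intersects every hyperedge, and it is minimal if no proper subset is also a transversal. The sketch below enumerates them by exhaustive search in order of increasing size. It is an illustration only and is not the MTMINER algorithm; treating each cluster as a hyperedge over peer identifiers is an assumption made for the example.

```python
from itertools import combinations


def minimal_transversals(hyperedges):
    """Enumerate all minimal transversals by brute force (exponential time)."""
    edges = [frozenset(e) for e in hyperedges]
    vertices = sorted(set().union(*edges))
    found = []
    # Candidates are visited by increasing size, so any transversal that
    # contains no previously found transversal must itself be minimal.
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            cand = frozenset(candidate)
            if any(t <= cand for t in found):
                continue  # a smaller transversal is contained in it
            if all(cand & e for e in edges):
                found.append(cand)
    return found


# Hypothetical clusters of peers, each cluster acting as one hyperedge.
clusters = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
result = minimal_transversals(clusters)
```

For these three clusters the minimal transversals are `{a, c}`, `{b, c}`, and `{b, d}`: each is a smallest-possible choice of peers that touches every cluster.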
Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks
We study the problem of evaluating conjunctive queries composed
of triple patterns over RDF data stored in distributed hash tables.
Our goal is to develop algorithms that scale to large amounts of RDF
data, distribute the query processing load evenly and incur little network
traffic. We present and evaluate two novel query processing algorithms
with these possibly conflicting goals in mind. We discuss the various
tradeoffs that occur in our setting through a detailed experimental
evaluation of the proposed algorithms.
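The setting can be sketched with a toy, single-process stand-in for a distributed hash table. The assumptions here are illustrative: each triple is indexed under the hash of its subject, predicate, and object; `NUM_NODES`, `publish`, `match`, and `evaluate` are hypothetical names; and the left-to-right join below is a naive strategy, not either of the algorithms the abstract refers to.

```python
import hashlib

NUM_NODES = 4  # hypothetical overlay size


def node_for(key):
    """Map a key to a node identifier, as a DHT lookup would."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


def publish(store, triple):
    """Index a triple once per term (subject, predicate, object)."""
    for term in triple:
        store.setdefault(node_for(term), set()).add(triple)


def is_var(term):
    return term.startswith("?")


def match(store, pattern, binding):
    """Yield extensions of `binding` that satisfy one triple pattern."""
    bound = tuple(binding.get(t, t) for t in pattern)
    constants = [t for t in bound if not is_var(t)]
    if constants:
        # Contact only the node responsible for one constant term.
        candidates = store.get(node_for(constants[0]), set())
    else:
        candidates = set().union(*store.values())
    for triple in candidates:
        new = dict(binding)
        ok = True
        for p, v in zip(bound, triple):
            if is_var(p):
                if new.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            yield new


def evaluate(store, patterns):
    """Evaluate a conjunctive query by joining patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(store, pattern, b)]
    return bindings


store = {}
for t in [("alice", "knows", "bob"), ("bob", "knows", "carol"), ("bob", "age", "30")]:
    publish(store, t)
answers = evaluate(store, [("alice", "knows", "?x"), ("?x", "knows", "?y")])
```

In this sketch the bindings from the first pattern are substituted into the second before it is looked up, so the second lookup is routed by the constant `bob` rather than scanning every node. In a real overlay that choice affects both network traffic and how evenly the load is spread, which is the kind of tradeoff the abstract mentions.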
RDF Data Indexing and Retrieval: A survey of Peer-to-Peer based solutions
The Semantic Web makes it possible to model, create, and query resources found on the Web. Realizing the full potential of its technologies at Internet scale requires infrastructures that can cope with scalability challenges and support various types of queries. The attractive features of the Peer-to-Peer (P2P) communication model, such as decentralization, scalability, and fault tolerance, make it a natural solution to these challenges. Consequently, the combination of the Semantic Web and the P2P model can be a highly innovative attempt to harness the strengths of both technologies and arrive at a scalable infrastructure for RDF data storage and retrieval. In this respect, this survey details the research that adopts this combination and gives insight into how to handle RDF data at the indexing and querying levels. The Semantic Web makes it possible to model, create, and query the resources available on the Web. For its technologies to reach their potential at Internet scale, they must rely on infrastructures that scale and that meet the expressiveness requirements of the query types they offer. The good properties of the latest generations of peer-to-peer systems in terms of decentralization, fault tolerance, and scalability make them promising candidates. The combination of the peer-to-peer model and Semantic Web technologies is an innovative attempt to provide a scalable infrastructure able to store and retrieve RDF data. In this context, this report presents the state of the art and discusses in detail work on peer-to-peer systems that handle RDF data at large scale. We detail their data indexing mechanisms as well as the processing of the various query types offered.