9 research outputs found

    Reducing Network Traffic in Unstructured P2P Systems Using Top-k Queries

    Get PDF
    A major problem of unstructured P2P systems is their heavy network traffic. This is caused mainly by high numbers of query answers, many of which are irrelevant for users. One solution to this problem is to use Top-k queries whereby the user can specify a limited number (k) of the most relevant answers. In this paper, we present FD, a (Fully Distributed) framework for executing Top-k queries in unstructured P2P systems, with the objective of reducing network traffic. FD consists of a family of algorithms that are simple but effec-tive. FD is completely distributed, does not depend on the existence of certain peers, and addresses the volatility of peers during query execution. We vali-dated FD through implementation over a 64-node cluster and simulation using the BRITE topology generator and SimJava. Our performance evaluation shows that FD can achieve major performance gains in terms of communication and response time

    Distributed top-k aggregation queries at large

    Get PDF
    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network

    An Efficient Architecture for Information Retrieval in P2P Context Using Hypergraph

    Full text link
    Peer-to-peer (P2P) Data-sharing systems now generate a significant portion of Internet traffic. P2P systems have emerged as an accepted way to share enormous volumes of data. Needs for widely distributed information systems supporting virtual organizations have given rise to a new category of P2P systems called schema-based. In such systems each peer is a database management system in itself, ex-posing its own schema. In such a setting, the main objective is the efficient search across peer databases by processing each incoming query without overly consuming bandwidth. The usability of these systems depends on successful techniques to find and retrieve data; however, efficient and effective routing of content-based queries is an emerging problem in P2P networks. This work was attended as an attempt to motivate the use of mining algorithms in the P2P context may improve the significantly the efficiency of such methods. Our proposed method based respectively on combination of clustering with hypergraphs. We use ECCLAT to build approximate clustering and discovering meaningful clusters with slight overlapping. We use an algorithm MTMINER to extract all minimal transversals of a hypergraph (clusters) for query routing. The set of clusters improves the robustness in queries routing mechanism and scalability in P2P Network. We compare the performance of our method with the baseline one considering the queries routing problem. Our experimental results prove that our proposed methods generate impressive levels of performance and scalability with with respect to important criteria such as response time, precision and recall.Comment: 2o pages, 8 figure

    Data Management in the APPA System

    Get PDF
    International audienceCombining Grid and P2P technologies can be exploited to provide high-level data sharing in large-scale distributed environments. However, this combination must deal with two hard problems: the scale of the network and the dynamic behavior of the nodes. In this paper, we present our solution in APPA (Atlas Peer-to-Peer Architecture), a data management system with high-level services for building large-scale distributed applications. We focus on data availability and data discovery which are two main requirements for implementing large-scale Grids. We have validated APPA's services through a combination of experimentation over Grid5000, which is a very large Grid experimental platform, and simulation using SimJava. The results show very good performance in terms of communication cost and response time

    Grid Data Management: Open Problems and New Issues

    Get PDF
    International audienceInitially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical since the techniques must scale up while addressing the autonomy, dynamicity and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. Then we make precise the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular, using P2P techniques

    Efficient Multidimensional Top- k

    Get PDF

    Framework for privacy-aware content distribution in peer-to- peer networks with copyright protection

    Get PDF
    The use of peer-to-peer (P2P) networks for multimedia distribution has spread out globally in recent years. This mass popularity is primarily driven by the efficient distribution of content, also giving rise to piracy and copyright infringement as well as privacy concerns. An end user (buyer) of a P2P content distribution system does not want to reveal his/her identity during a transaction with a content owner (merchant), whereas the merchant does not want the buyer to further redistribute the content illegally. Therefore, there is a strong need for content distribution mechanisms over P2P networks that do not pose security and privacy threats to copyright holders and end users, respectively. However, the current systems being developed to provide copyright and privacy protection to merchants and end users employ cryptographic mechanisms, which incur high computational and communication costs, making these systems impractical for the distribution of big files, such as music albums or movies.El uso de soluciones de igual a igual (peer-to-peer, P2P) para la distribución multimedia se ha extendido mundialmente en los últimos años. La amplia popularidad de este paradigma se debe, principalmente, a la distribución eficiente de los contenidos, pero también da lugar a la piratería, a la violación del copyright y a problemas de privacidad. Un usuario final (comprador) de un sistema de distribución de contenidos P2P no quiere revelar su identidad durante una transacción con un propietario de contenidos (comerciante), mientras que el comerciante no quiere que el comprador pueda redistribuir ilegalmente el contenido más adelante. Por lo tanto, existe una fuerte necesidad de mecanismos de distribución de contenidos por medio de redes P2P que no supongan un riesgo de seguridad y privacidad a los titulares de derechos y los usuarios finales, respectivamente. Sin embargo, los sistemas actuales que se desarrollan con el propósito de proteger el copyright y la privacidad de los comerciantes y los usuarios finales emplean mecanismos de cifrado que implican unas cargas computacionales y de comunicaciones muy elevadas que convierten a estos sistemas en poco prácticos para distribuir archivos de gran tamaño, tales como álbumes de música o películas.L'ús de solucions d'igual a igual (peer-to-peer, P2P) per a la distribució multimèdia s'ha estès mundialment els darrers anys. L'àmplia popularitat d'aquest paradigma es deu, principalment, a la distribució eficient dels continguts, però també dóna lloc a la pirateria, a la violació del copyright i a problemes de privadesa. Un usuari final (comprador) d'un sistema de distribució de continguts P2P no vol revelar la seva identitat durant una transacció amb un propietari de continguts (comerciant), mentre que el comerciant no vol que el comprador pugui redistribuir il·legalment el contingut més endavant. Per tant, hi ha una gran necessitat de mecanismes de distribució de continguts per mitjà de xarxes P2P que no comportin un risc de seguretat i privadesa als titulars de drets i els usuaris finals, respectivament. Tanmateix, els sistemes actuals que es desenvolupen amb el propòsit de protegir el copyright i la privadesa dels comerciants i els usuaris finals fan servir mecanismes d'encriptació que impliquen unes càrregues computacionals i de comunicacions molt elevades que fan aquests sistemes poc pràctics per a distribuir arxius de grans dimensions, com ara àlbums de música o pel·lícules
    corecore