8 research outputs found

    An Efficient Architecture for Information Retrieval in P2P Context Using Hypergraph

    Full text link
    Peer-to-peer (P2P) Data-sharing systems now generate a significant portion of Internet traffic. P2P systems have emerged as an accepted way to share enormous volumes of data. Needs for widely distributed information systems supporting virtual organizations have given rise to a new category of P2P systems called schema-based. In such systems each peer is a database management system in itself, ex-posing its own schema. In such a setting, the main objective is the efficient search across peer databases by processing each incoming query without overly consuming bandwidth. The usability of these systems depends on successful techniques to find and retrieve data; however, efficient and effective routing of content-based queries is an emerging problem in P2P networks. This work was attended as an attempt to motivate the use of mining algorithms in the P2P context may improve the significantly the efficiency of such methods. Our proposed method based respectively on combination of clustering with hypergraphs. We use ECCLAT to build approximate clustering and discovering meaningful clusters with slight overlapping. We use an algorithm MTMINER to extract all minimal transversals of a hypergraph (clusters) for query routing. The set of clusters improves the robustness in queries routing mechanism and scalability in P2P Network. We compare the performance of our method with the baseline one considering the queries routing problem. Our experimental results prove that our proposed methods generate impressive levels of performance and scalability with with respect to important criteria such as response time, precision and recall.Comment: 2o pages, 8 figure

    Méthode de découverte de sources de données tenant compte de la sémantique en environnement de grille de données

    Get PDF
    Les applications grilles de données de nos jours partagent un nombre gigantesque de sources de données en un environnement instable où une source de données peut à tout moment joindre ou quitter le système. Ces sources de données sont hétérogènes, autonomes et distribuées à grande échelle. Dans cet environnement, la découverte efficace des sources de données pertinentes pour l'exécution de requêtes est un défi. Les premiers travaux sur la découverte de sources de données se sont basés sur une recherche par mots clés. Ces solutions ne sont pas satisfaisantes puisqu'elles ne tiennent pas compte des problèmes de l'hétérogénéité sémantique des sources de données. Ainsi, d'autres solutions proposent un schéma global ou une ontologie globale. Cependant, la conception d'un tel schéma ou d'une telle ontologie est une tâche complexe à cause du nombre de sources de données. D'autres solutions optent pour l'usage de correspondances entre les schémas des sources de données ou en s'appuyant sur des ontologies de domaine et en établissant des relations de 'mapping' entre ces dernières. Toutes ces solutions imposent une topologie fixe soit pour les correspondances soit pour les relations de 'mapping'. Cependant, la définition de relations de 'mapping' entre ontologies de domaine est une tâche ardue et imposer une topologie fixe est un inconvénient majeur. Dans cette perspective, nous proposons dans cette thèse une méthode de découverte de sources de données prenant en compte les problèmes liés à l'hétérogénéité sémantique en environnement instable et à grande échelle. Pour cela, nous associons une Organisation Virtuelle (OV) et une ontologie de domaine à chaque domaine et nous nous basons sur les relations de 'mappings' existantes entre ces ontologies. Nous n'imposons aucune hypothèse sur la topologie des relations de 'mapping' mis à part que le graphe qu'elles forment soit connexe. Nous définissons un système d'adressage permettant un accès permanent de n'importe quelle OV vers une autre malgré la dynamicité des pairs. Nous présentons également une méthode de maintenance dite 'paresseuse' afin de limiter le nombre de messages nécessaires à la maintenance du système d'adressage lors de la connexion ou de la déconnexion de pairs. Pour étudier la faisabilité ainsi que la viabilité de nos propositions, nous effectuons une évaluation des performances.Nowadays, data grid applications look to share a huge number of data sources in an unstable environment where a data source may join or leave the system at any time. These data sources are highly heterogeneous because they are independently developed and managed and geographically scattered. In this environment, efficient discovery of relevant data sources for query execution is a complex problem due to the source heterogeneity, large scale environment and system instability. First works on data source discovery are based on a keyword search. These initial solutions are not sufficient because they do not take into account problem of semantic heterogeneity of data sources. Thus, the community has proposed other solutions to consider semantic aspects. A first solution consists in using a global schema or global ontology. However, the conception of such scheme or such ontology is a complex task due to the number of data sources. Other solutions have been proposed providing mappings between data source schemas or based on domain ontologies and establishing mapping relations between them. All these solutions impose a fixed topology for connections as well as mapping relationships. However, the definition of mapping relations between domain ontologies is a difficult task and imposing a fixed topology is a major inconvenience. In this perspective, we propose in this thesis a method for discovering data sources taking into account semantic heterogeneity problems in unstable and large scale environment. For that purpose, we associate a Virtual Organisation (VO) and a domain ontology to each domain and we rely on relationship mappings between existing ontologies. We do not impose any hypothesis on the relationship mapping topology, except that they form connected graph. We define an addressing system for permanent access from any OVi to another OVj despite peers' dynamicity (with i inégalité j). We also present a method of maintenance called 'lazy' to limit the number of messages required to maintain the addressing system during the connection or disconnection of peers. To study the feasibility as well as the viability of our proposals, we make a performance evaluation

    Semantic Query Routing in SenPeer, a P2P Data Management System

    No full text
    International audienceA challenging problem in a schema-based peer-to-peer (P2P) system is how to locate peers that are relevant with respect to a given query. In this paper, we propose a new semantic routing mechanism in the context of the SenPeer P2P Data Management System (PDMS). SenPeer is an unstructured P2P system based on an organization of peers around super-peers according to their semantic domains. Our proposal is based on the use of a distributed data structure, called expertise table, maintained by the super-peers and describing data at the neighboring peers. This table, combined with matching techniques, is the basis of a semantic overlay network. Semantic links are exploited for efficient query propagation towards peers that may have relevant data. We give a performance evaluation of our semantic query routing with respect to important criteria such as precision, recall and number of messages. The results show that our algorithm significantly outperforms a baseline algorithm without semantics

    Semantic Query Routing in SenPeer, a P2P Data Management System

    No full text
    International audienceA challenging problem in a schema-based peer-to-peer (P2P) system is how to locate peers that are relevant with respect to a given query. In this paper, we propose a new semantic routing mechanism in the context of the SenPeer P2P Data Management System (PDMS). SenPeer is an unstructured P2P system based on an organization of peers around super-peers according to their semantic domains. Our proposal is based on the use of a distributed data structure, called expertise table, maintained by the super-peers and describing data at the neighboring peers. This table, combined with matching techniques, is the basis of a semantic overlay network. Semantic links are exploited for efficient query propagation towards peers that may have relevant data. We give a performance evaluation of our semantic query routing with respect to important criteria such as precision, recall and number of messages. The results show that our algorithm significantly outperforms a baseline algorithm without semantics

    Localisation de sources de données et optimisation de requêtes réparties en environnement pair-à-pair

    Get PDF
    Malgré leur succès dans le domaine du partage de fichiers, les systèmes P2P sont capables d'évaluer uniquement des requêtes simples basées sur la recherche d'un fichier en utilisant son nom. Récemment, plusieurs travaux de recherche sont effectués afin d'étendre ces systèmes pour qu'ils permettent le partage de données avec une granularité fine (i.e. un attribut atomique) et l'évaluation de requêtes complexes (i.e. requêtes SQL). A cause des caractéristiques des systèmes P2P (e.g. grande-échelle, instabilité et autonomie de nœuds), il n'est pas pratique d'avoir un catalogue global qui contient souvent des informations sur: les schémas, les données et les hôtes des sources de données. L'absence d'un catalogue global rend plus difficiles: (i) la localisation de sources de données en prenant en compte l'hétérogénéité de schémas et (ii) l'optimisation de requêtes. Dans notre thèse, nous proposons une approche pour l'évaluation des requêtes SQL en environnement P2P. Notre approche est fondée sur une ontologie de domaine et sur des formules de similarité pour résoudre l'hétérogénéité sémantique des schémas locaux. Quant à l'hétérogénéité structurelle de ces schémas, elle est résolue grâce à l'extension d'un algorithme de routage de requêtes (i.e. le protocole Chord) par des Indexes de structure. Concernant l'optimisation de requêtes, nous proposons de profiter de la phase de localisation de sources de données pour obtenir toutes les méta-données nécessaires pour générer un plan d'exécution proche de l'optimal. Afin de montrer la faisabilité et la validité de nos propositions, nous effectuons une évaluation des performances et nous discutons les résultats obtenus.Despite of their great success in the file sharing domain, P2P systems support only simple queries usually based on looking up a file by using its name. Recently, several research works have made to extend P2P systems to be able to share data having a fine granularity (i.e. atomic attribute) and to process queries written with a highly expressive language (i.e. SQL). The characteristics of P2P systems (e.g. large-scale, node autonomy and instability) make impractical to have a global catalog that stores often information about data, schemas and data source hosts. Because of the absence of a global catalog, two problems become more difficult: (i) locating data sources with taking into account the schema heterogeneity and (ii) query optimization. In our thesis, we propose an approach for processing SQL queries in a P2P environment. To solve the semantic heterogeneity between local schemas, our approach is based on domain ontology and on similarity formulas. As for the structural heterogeneity of local schemas, it is solved by the extension of a query routing method (i.e. Chord protocol) with Structure Indexes. Concerning the query optimization problem, we propose to take advantage of the data source localization phase to obtain all metadata required for generating a close to optimal execution plan. Finally, in order to show the feasibility and the validity of our propositions, we carry out performance evaluations and we discuss the obtained results

    Semantic Routed Network for Distributed Search Engines

    Get PDF
    Searching for textual information has become an important activity on the web. To satisfy the rising demand and user expectations, search systems should be fast, scalable and deliver relevant results. To decide which objects should be retrieved, search systems should compare holistic meanings of queries and text document objects, as perceived by humans. Existing techniques do not enable correct comparison of composite holistic meanings like: "evidences on role of DR2 gene in development of diabetes in Caucasian population", which is composed of multiple elementary meanings: "evidence", "DR2 gene", etc. Thus these techniques can not discern objects that have a common set of keywords but convey different meanings. Hence we need new methods to compare composite meanings for superior search quality. In distributed search engines, for scalability, speed and efficiency, index entries should be systematically distributed across multiple index-server nodes based on the meaning of the objects. Furthermore, queries should be selectively sent to those index nodes which have relevant entries. This requires an overlay Semantic Routed Network which will route messages, based on meaning. This network will consist of fast response networking appliances called semantic routers. These appliances need to: (a) carry out sophisticated meaning comparison computations at high speed; and (b) have the right kind of behavior to automatically organize an optimal index system. This dissertation presents the following artifacts that enable the above requirements: (1) An algebraic theory, a design of a data structure and related techniques to efficiently compare composite meanings. (2) Algorithms and accelerator architectures for high speed meaning comparisons inside semantic routers and index-server nodes. (3) An overlay network to deliver search queries to the index nodes based on meanings. (4) Algorithms to construct a self-organizing, distributed meaning based index system. The proposed techniques can compare composite meanings ~105 times faster than an equivalent software code and existing hardware designs. Whereas, the proposed index organization approach can lead to 33% savings in number of servers and power consumption in a model search engine having 700,000 servers. Therefore, using all these techniques, it is possible to design a Semantic Routed Network which has a potential to improve search results and response time, while saving resources
    corecore