4 research outputs found

    Data source localization and distributed query optimization in a peer-to-peer environment

    Despite their great success in the file-sharing domain, P2P systems support only simple queries, usually based on looking up a file by its name. Recently, several research efforts have sought to extend P2P systems so that they can share data at a fine granularity (i.e., an atomic attribute) and process queries written in a highly expressive language (i.e., SQL). The characteristics of P2P systems (e.g., large scale, node autonomy and instability) make it impractical to maintain a global catalog, which typically stores information about data, schemas and data source hosts. Without a global catalog, two problems become more difficult: (i) locating data sources while taking schema heterogeneity into account and (ii) query optimization. In our thesis, we propose an approach for processing SQL queries in a P2P environment. To resolve the semantic heterogeneity between local schemas, our approach relies on a domain ontology and on similarity formulas. The structural heterogeneity of local schemas is resolved by extending a query routing method (the Chord protocol) with Structure Indexes. For query optimization, we propose to take advantage of the data source localization phase to obtain all the metadata required to generate a close-to-optimal execution plan. Finally, to show the feasibility and validity of our proposals, we carry out performance evaluations and discuss the results obtained.

    Knowledge Discovery from XML documents: PAKDD 2006 Workshop Proceedings First International Workshop, KDXD 2006, Singapore, April 9, 2006.

    The KDXD'06 (Knowledge Discovery from XML Documents) workshop is the first international workshop of its kind, running this year in conjunction with the PAKDD'06 conference. The workshop provides an important forum for the dissemination and exchange of new ideas and research related to XML data discovery and retrieval.

    The eXtensible Markup Language (XML) has become a standard language for data representation and exchange. With the continuous growth in XML data sources, the ability to manage collections of XML documents and discover knowledge from them for decision support becomes increasingly important. Because XML is inherently flexible in both structure and semantics, inferring important knowledge from XML data brings new challenges as well as benefits. The objective of the workshop is to bring together researchers and practitioners to discuss all aspects of the emerging XML data management challenges. The topics of interest included, but were not limited to: XML data mining methods; XML data mining applications; emerging issues and challenges in XML data management; XML in improving the knowledge discovery process; and benchmarks and mining performance using XML databases.

    The workshop received 26 submissions. We would like to thank all those who submitted their work to the workshop under relatively tight deadlines. We selected 10 high-quality full papers for discussion and presentation at the workshop and for inclusion in the proceedings, after peer review by at least three members of the Program Committee. Accepted papers have been grouped into three sessions and allocated equal presentation time slots. The first session is on XML data mining methods for classification, clustering and association. The second session focuses on XML data reasoning, querying methods and query optimization. The last session is on XML data applications in transportation and security.

    Special thanks go to the Program Committee members who shared their expertise and time to make KDXD'06 a success. The final quality of the selected papers depends on their efforts.

    Last but not least, we would like to thank the organizers of PAKDD 2006 for hosting KDXD'06.