Search CORE

2 research outputs found

Un modèle de recherche d'information basé sur les graphes et les similarités structurelles pour l'amélioration du processus de recherche d'information

Author: Champclaux Yaël
Publication venue: HAL CCSD
Publication date: 04/12/2009
Field of study

The main objective of IR systems is to select relevant documents, related to a user's information need, from a collection of documents. Traditional approaches for document/query comparison use surface similarity, i.e. the comparison engine uses surface attributes (indexing terms). We propose a new method which uses a special kind of similarity, namely structural similarities (similarities that use both surface attributes and relation between attributes). These similarities were inspired from cognitive studies and a general similarity measure based on node comparison in a bipartite graph. We propose an adaptation of this general method to the special context of information retrieval. Adaptation consists in taking into account the domain specificities: data type, weighted edges, normalization choice. The core problem is how documents are compared against queries. The idea we develop is that similar documents will share similar terms and similar terms will appear in similar documents. We have developed an algorithm which traduces this idea. Then we have study problem related to convergence and complexity, then we have produce some test on classical collection and compare our measure with two others that are references in our domain. The Report is structured in five chapters: First chapter deals with comparison problem, and related concept like similarities, we explain different point of view and propose an analogy between cognitive similarity model and IR model. In the second chapter we present the IR task, test collection and measures used to evaluate a relevant document list. The third chapter introduces graph definition: our model is based on graph bipartite representation, so we define graphs and criterions used to evaluate them. The fourth chapter describe how we have adopted, and adapted the general comparison method. The Fifth chapter describes how we evaluate the ordering performance of our method, and also how we have compared our method with two others.Cette thèse d'informatique s'inscrit dans le domaine de la recherche d'information (RI). Elle a pour objet la création d'un modèle de recherche utilisant les graphes pour en exploiter la structure pour la détection de similarités entre les documents textuels d'une collection donnée et une requête utilisateur en vue d'améliorer le processus de recherche d'information. Ces similarités sont dites « structurelles » et nous montrons qu'elles apportent un gain d'information bénéfique par rapport aux seules similarités directes. Le rapport de thèse est structuré en cinq chapitres. Le premier chapitre présente un état de l'art sur la comparaison et les notions connexes que sont la distance et la similarité. Le deuxième chapitre présente les concepts clés de la RI, notamment l'indexation des documents, leur comparaison, et l'évaluation des classements retournés. Le troisième chapitre est consacré à la théorie des graphes et introduit les notations et notions liées à la représentation par graphe. Le quatrième chapitre présente pas à pas la construction de notre modèle pour la RI, puis, le cinquième chapitre décrit son application dans différents cas de figure, ainsi que son évaluation sur différentes collections et sa comparaison à d'autres approches

Thèses en Ligne

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

Techniques d'analyse dynamique des média sociaux pour la relation client

Author: LE TRAN Duc Kinh
Publication venue: HAL CCSD
Publication date: 05/06/2015
Field of study

This thesis is in the field of data mining and in the context of Customer Relationship Management (CRM). With the emergence of social media, companies today have seen the need for an interchannel (or cross-channel) strategy in which they keep track of their clients' histories through a consistent combination of multiple channels. The goal of this thesis is to develop new data mining methods which allow predicting customer behaviors using data collected from multiple channels such as social media, call center¿ We are interested in all types of customer behaviors that characterized their engagement with respect to the company. First of all, we perform a needs analysis in terms of data mining for interchannel CRM strategy. Next, we propose a new method of prediction of customer behaviors in the context of interchannel CRM. In our method, we use a social attributed network to represent the data from multiple channels and perform incremental learning based on latent factor models. We then carry out experiments on both synthetic and real data. We show that our method based on the latent factor models is capable of leveraging informative latent factors from interchannel data. In future works, we consider some ways to improve the performance of our method, especially latent factor models that are able to leverage different types of relational correlation between individuals in the social graph.Cette thèse d'informatique en fouille de données et apprentissage automatique s'inscrit dans le contexte applicatif de la gestion de la relation client (Customer Relationship Management ou CRM). Avec l'émergence des média sociaux, les entreprises perçoivent actuellement la nécessité d'une stratégie de relation client intercanale dans laquelle elles suivent le parcours du client sur l¿ensemble des canaux d¿interactions tels que les média sociaux, la hot line¿ et cela de manière integrée. L'objectif applicatif de la thèse est de concevoir de nouvelles techniques permettant de prédire les comportements du client à partir des données issues de ces multiples canaux. Nous nous intéressons aux comportements qui caractérisent l'engagement du client vis-à-vis de l'entreprise. Nous effectuons d'abord une analyse des besoins dans laquelle nous montrons la nécessité des nouvelles techniques de fouilles de données pour une stratégie de relation client intégrant plusieurs canaux de nature différente. Nous introduisons ensuite une nouvelle méthode d'apprentissage incrémental basée sur les modèles à facteurs latents et sur la représentation de réseau social attribué. Nous effectuons ensuite des expérimentations sur des données synthétiques et réelles. Nous montrons que notre méthode de réduction de dimension est capable d'extraire des variables latentes informatives pour prédire les comportements des clients à partir de données intercanales. Dans les perspectives, nous proposons quelques pistes d'amélioration de notre méthode, notamment d'autres modèles à facteurs latents permettant d'exploiter différents types de corrélations entre les individus dans le graphe social

Thèses en Ligne

HAL-Université de Bretagne Occidentale