372 research outputs found

    Analyse des paramètres de recherche d'information: Etude de l'influence des paramètres sur les résultats

    International audience. This article presents a detailed analysis of a set of 2 million information retrieval results obtained with different parameter settings of information retrieval systems. More specifically, we used the Terrier platform and the RunGeneration interface to create different runs by varying the indexing and retrieval models. We then evaluated each of the results with several information retrieval performance measures. We carried out a systematic analysis of these data to determine, first, which parameters have the most influence and, second, which values of these parameters are most likely to lead to good system performance.

    IRIT at INEX 2014 : Tweet Contextualization Track

    National audience. The paper presents IRIT's approach at the INEX 2014 Tweet Contextualization Track. Systems had to provide a context for a tweet from the perspective of an entity. This year we further modified the approach we presented at INEX 2011, 2012 and 2013, which relies on the product of different measures based on smoothing from local context, named entity recognition, part-of-speech weighting and sentence quality analysis. We introduced two ways to link an entity and a tweet: (1) concatenating the entity and the tweet, and (2) using the results obtained for the entity as a restriction to filter the results retrieved for the tweet. We also examined the influence of the topic-comment relationship on contextualization.

    Linguistic Analysis of Users' Queries: towards an adaptive Information Retrieval System

    International audience. Most Information Retrieval Systems transform users' natural language queries into bags of words that are matched against documents, which are also represented as bags of words. Through this process, the richness of the query is lost. In this paper we show that linguistic features of a query are good indicators for predicting a system's failure to answer it. The experiments are based on 42 systems or system variants and 50 TREC topics that include a descriptive part expressed in natural language.

    An energy-based model to optimize cluster visualization

    National audience. Graphs are mathematical structures that provide a natural means of representing complex data. Graphs capture structure and thus help model a wide range of complex real-life data in various domains. Moreover, graphs are especially suitable for information visualization: the intuitive visual abstraction they provide (dots and lines) is intimately associated with them. Visualization paves the way to interactive exploratory data analysis and to important goals such as identifying groups and subgroups in data and understanding how these groups interact with each other. In this paper, we present a graph drawing approach that helps to better appreciate the cluster structure in data and the interactions that may exist between clusters. We assume that the clusters are already extracted and focus on the visualization aspects. We propose an energy-based model for graph drawing that produces an esthetic drawing in which each cluster occupies a separate zone within the visualization layout. This method emphasizes the inter-group interactions while still showing the inter-node interactions. The drawing areas assigned to the clusters can be user-specified (prefixed areas) or automatically crafted (free areas). The approach also enables handling geographically-based clustering. For free areas, we illustrate the use of our drawing method through an example. For prefixed areas, we first use an example from citation networks and then use another example to compare the results of our method with those of the divide and conquer approach. In the latter case, we show that while both methods successfully point out the cluster structure, our method better visualizes the global structure.

    Unités d'indexation et taille des requêtes pour la recherche d'information en français

    International audience. This paper analyses different indexing methods for French (lemmas, stems and truncated terms) as well as their fusion. We also examine the influence of the different sections of a topic on precision. Our study uses the CLEF French monolingual collections from 2000 to 2005. We show that the best method is the one that is based on lemmas and fuses the results obtained with the different sections of a topic. KEYWORDS: information retrieval, fusion, indexing, influence of indexing, French information retrieval. In this article, we focus on information retrieval in French. We analyse different indexing techniques (based on lemmas, stems or terms) and their fusion. We also analyse the influence of taking the different parts of a query into account. Our study covers 6 CLEF French evaluation campaigns. We show that using lemmas and combining the different variants of a query are the most effective ways to improve average precision and high precision.

    Linguistic features to predict query difficulty: a case study on previous TREC campaigns

    International audience. Query difficulty can be linked to a number of causes. Some of these causes relate to the query expression itself and can therefore be detected through a linguistic analysis of the query text. Using 16 different linguistic features, automatically computed on TREC queries, we looked for significant correlations between these features and the average recall and precision scores obtained by systems. Three of these features are shown to have a significant impact on either recall or precision scores for previous ad hoc TREC campaigns. Each of these features can be viewed as a clue to a specific linguistic characteristic, whether morphological, syntactic or semantic. These results also open the way to a more informed use of linguistic processing in IR systems.

    Query expansion in information retrieval : What can we learn from a deep analysis of queries ?

    International audience. Information retrieval aims at retrieving relevant documents that answer a user's need expressed through a query. Users' queries are generally shorter than 3 words, which makes them challenging to answer correctly. Automatic query expansion (QE) improves precision on average, even though it can degrade the results for some queries. In this paper, we propose a new automatic QE method that estimates the importance of candidate expansion terms by the strength of their relation to the query terms. The method combines local and global analysis of texts. We evaluate the method using international benchmark collections and measures. On average, we found results comparable to those of the Bo2 method. However, we show that a deep analysis of initial and expanded queries brings interesting insights that could help future research in the domain.

    Séparateurs à Vaste Marge pondérés en norme l2 pour la sélection de variables en apprentissage d’ordonnancement

    National audience. Learning-to-rank algorithms deal with a very large number of features to automatically learn ranking functions, which increases both the computational cost and the number of noisy or redundant features. Feature selection is seen as a promising way to address these issues. In this paper, we propose new feature selection algorithms for learning to rank based on reweighted l2 SVM approaches. We investigate an l2-AROM algorithm to solve the l0 norm optimization problem and a generic l2-reweighted algorithm to approximate l0 and l1 norm SVM problems with an l2 norm SVM. Experiments show that our algorithms are up to 10 times faster and use up to 7 times fewer features than state-of-the-art methods, without lowering the ranking performance. Learning-to-rank algorithms use a very large number of features to learn ranking functions, which increases both execution time and the number of redundant or noisy features. Feature selection is a promising way to address these issues. In this article, we propose new feature selection methods for learning to rank based on reweighting approaches for l2-norm SVMs. We propose an adaptation of an l2-AROM method for solving l0-norm SVMs and a generic l2-norm reweighting algorithm that solves l0- and l1-norm problems. Our experiments show that the proposed methods are up to 7 times faster and 10 times sparser than the state of the art, for equivalent ranking quality.

    Learning to Choose : automatic Selection of the Information Retrieval Parameters

    International audience. In this paper we promote a selective information retrieval process to be applied in the context of repeated queries. The method is based on a training phase in which the meta search system learns the best parameters to use on a per-query basis. The training phase uses a sample of annotated documents for which document relevance is known. When the same query is submitted to the system again, it automatically knows which parameters to use to process it. This Learning to choose method is evaluated using simulated data from TREC campaigns. We show that system performance increases substantially in terms of precision (MAP), specifically for the queries that are difficult to answer, when compared to any single system configuration applied to all the queries.

    Diversité de recommandations : application à une plateforme de blogs et évaluation

    International audience. Recommender systems (RS) aim to automatically suggest to users items related to their interests. In the context of document retrieval, user interests can be modelled from the content of the documents visited or from the actions performed. To move towards more relevant recommendations, we propose an RS model that builds a list of recommendations covering a broad spectrum of potential interests. The originality of our model is that it relies on the notion of diversity, obtained by aggregating different interest measures to build the final recommendation list. We also define a protocol for evaluating the value of these recommendations. Finally, we present the results obtained by our diversity-based RS when recommending blog posts.