24 research outputs found

    PrĂ©dire la difficultĂ© des requĂȘtes : la combinaison de mesures statistiques et sĂ©mantiques

    National audience. The performance of an Information Retrieval System (IRS) is closely tied to the query. Queries on which IR systems fail are called "difficult queries" in the literature. The study presented in this article aims to analyse, adapt, and combine several query difficulty predictors. We considered three predictors: one related to term ambiguity, one based on term frequency, and a measure of the dispersion of the results. The evaluation of the prediction is based on the correlation between the predicted difficulty and the actual performance of the IR systems. We show that combining these predictors gives good results. The evaluation framework is that of the TREC7 and TREC8 ad hoc collections.
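The evaluation scheme described above can be sketched in a few lines: combine several per-query predictor scores linearly, then correlate the combined prediction with actual per-query effectiveness. This is a minimal illustration, not the paper's implementation; all predictor values and AP scores below are made up.

```python
# Hedged sketch: evaluating difficulty predictors by correlating a combined
# predicted difficulty with actual per-query effectiveness (e.g., average
# precision). All values are illustrative, not TREC data.

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def combine(predictors, weights):
    """Linear combination of several per-query predictor score lists."""
    return [sum(w * p[i] for w, p in zip(weights, predictors))
            for i in range(len(predictors[0]))]

# Three toy predictors (ambiguity, term frequency, dispersion) over 5 queries.
ambiguity  = [0.9, 0.2, 0.7, 0.4, 0.8]
term_freq  = [0.6, 0.1, 0.8, 0.3, 0.9]
dispersion = [0.8, 0.3, 0.6, 0.2, 0.7]
ap         = [0.10, 0.55, 0.20, 0.55, 0.15]   # actual effectiveness per query

combined = combine([ambiguity, term_freq, dispersion], [1/3, 1/3, 1/3])
# Higher predicted difficulty should track lower AP, i.e. a strong
# negative correlation indicates a useful (combined) predictor.
print(round(pearson(combined, ap), 3))
```

In practice Kendall's tau is often preferred over Pearson for this evaluation, since only the ranking of queries by difficulty matters.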

    Performance Analysis of Information Retrieval Systems

    International audience. It has been shown that there is no single best information retrieval system configuration that works for every query; rather, performance can vary from one query to another. It would be useful if a meta-system could decide which system should process a new query by learning from previously submitted queries. This paper reports a deep analysis considering more than 80,000 search engine configurations applied to 100 queries and the corresponding performance. The goal of the analysis is to identify which search engine configuration responds best to a certain type of query. We considered two approaches to defining query types: one is based on clustering queries according to their performance (their difficulty), while the other uses various query features (including query difficulty predictors) to cluster queries. We identified two parameters that should be optimized first. An important outcome is that we could not obtain strongly conclusive results; considering the large number of systems and methods we used, this result could lead to the conclusion that current query features do not fit the optimization problem.

    La prĂ©diction efficace de la difficultĂ© des requĂȘtes : une tĂąche impossible?

    National audience. ABSTRACT. Search engines return answers whatever the user query is, but some queries are more difficult than others for the system (the system does not achieve good performance in terms of IR measures). For difficult queries, ad hoc treatments must be applied. Predicting query difficulty is therefore crucial, and various predictors have been proposed. In this paper, we revisit these predictors. First, we check the statistical non-redundancy of the predictors. Then, we show that the correlation between the values of the predictors and system performance gives little hope about the ability of these predictors to be truly effective. Finally, we study the ability of predictors to predict classes of difficulty by relying on a variety of exploratory and learning methods. We show that despite the (low) correlations observed with performance measures, current predictors lead to variable prediction performance and are therefore not robust enough to be used in practical IR applications. KEYWORDS: information retrieval, difficult query, prediction, data analysis.
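The class-based evaluation described above can be illustrated with a toy example: bin queries into "easy"/"hard" by their actual effectiveness, then check how often a predictor recovers those classes via a simple threshold. This is a hedged sketch, not the paper's method; all scores and thresholds are invented for illustration.

```python
# Hedged sketch: predicting difficulty *classes* rather than raw effectiveness.
# Queries are labelled "hard" when actual AP falls below a threshold, and a
# predictor's output is thresholded the same way. All values are illustrative.

def difficulty_classes(ap_scores, threshold=0.3):
    """Ground-truth classes: low actual effectiveness means a hard query."""
    return ["hard" if ap < threshold else "easy" for ap in ap_scores]

def predict_classes(predictor_scores, threshold):
    """Predicted classes: higher predicted difficulty means 'hard'."""
    return ["hard" if s >= threshold else "easy" for s in predictor_scores]

ap        = [0.10, 0.45, 0.22, 0.60, 0.15]
predictor = [0.80, 0.30, 0.40, 0.20, 0.75]   # predicted difficulty per query

truth = difficulty_classes(ap)
pred  = predict_classes(predictor, threshold=0.5)
accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
# The predictor misses the third query, illustrating the variable
# prediction quality the abstract reports.
print(truth, pred, accuracy)
```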

    Combining Word Embedding Interactions and LETOR Feature Evidences for Supervised QPP

    In information retrieval, query performance prediction (QPP) aims to predict whether a search engine is likely to succeed in retrieving documents potentially relevant to a user's query. This problem is usually cast as a regression problem in which a machine should predict the effectiveness (in terms of an information retrieval measure) of the search engine on a given query. The solutions range from simple unsupervised approaches, where a single source of information (e.g., the variance of the retrieval similarity scores in NQC) predicts the search engine's effectiveness for a given query, to more involved ones that rely on supervised machine learning and make use of several sources of information, e.g., learning-to-rank (LETOR) features, word embedding similarities, etc. In this paper, we investigate the combination of two different types of evidence in a single neural network model. Our first source of information corresponds to the semantic interaction between the terms in queries and their top-retrieved documents, while our second source of information corresponds to LETOR features.
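The unsupervised baseline mentioned above, NQC (Normalized Query Commitment), can be sketched very compactly: the standard deviation of the top-k retrieval scores, normalized by a corpus-level score. This is a minimal illustration of the idea; the score lists and the corpus score below are invented.

```python
# Hedged sketch of NQC: the standard deviation of the retrieval scores of the
# top-k documents, normalized by a corpus-level score for the query. A peaked
# score distribution suggests a confident (likely effective) retrieval, while
# a flat one suggests an uncertain query. All scores are illustrative.

def nqc(top_k_scores, corpus_score):
    n = len(top_k_scores)
    mean = sum(top_k_scores) / n
    std = (sum((s - mean) ** 2 for s in top_k_scores) / n) ** 0.5
    return std / abs(corpus_score)

peaked = [9.1, 8.7, 6.0, 4.2, 3.1]   # scores drop off sharply
flat   = [5.2, 5.1, 5.1, 5.0, 4.9]   # nearly uniform scores

# The peaked ranking yields the higher NQC, i.e. the better predicted query.
print(nqc(peaked, corpus_score=2.0) > nqc(flat, corpus_score=2.0))
```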

    On the Feasibility and Robustness of Pointwise Evaluation of Query Performance Prediction

    Although the retrieval effectiveness of individual queries is mutually independent, the evaluation of query performance prediction (QPP) systems has been carried out by measuring rank correlation over an entire set of queries. Such a listwise approach has a number of disadvantages, notably that it does not support the common requirement of assessing QPP for individual queries. In this paper, we propose a pointwise QPP framework that allows us to evaluate the quality of a QPP system for individual queries by measuring the deviation between each prediction and the corresponding true value, and then aggregating the results over a set of queries. Our experiments demonstrate that this new approach leads to smaller variances in QPP evaluations across a range of different target metrics and retrieval models.
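The pointwise idea above reduces to measuring a per-query deviation between predicted and true effectiveness, then aggregating. A minimal sketch, with mean absolute error as one possible aggregate and invented AP values:

```python
# Hedged sketch of pointwise QPP evaluation: score a QPP system per query by
# the absolute deviation between predicted and true effectiveness, then
# aggregate over the query set (here, mean absolute error). Illustrative data.

def pointwise_qpp_error(predicted, true_values):
    """Per-query absolute deviations and their mean over the query set."""
    deviations = [abs(p - t) for p, t in zip(predicted, true_values)]
    return deviations, sum(deviations) / len(deviations)

predicted_ap = [0.30, 0.55, 0.10, 0.70]
true_ap      = [0.25, 0.60, 0.30, 0.65]

per_query, mae = pointwise_qpp_error(predicted_ap, true_ap)
print([round(d, 2) for d in per_query], round(mae, 4))
```

Unlike a rank correlation such as Kendall's tau, each query contributes its own score here, so a single badly predicted query (the third one above) remains visible after aggregation.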

    Intent-aware search result diversification

    Full text link
    Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches

    Predicting IR Personalization Performance using Pre-retrieval Query Predictors

    Personalization generally improves the performance of queries, but in a few cases it may also harm it. If we are able to predict those situations and therefore disable personalization for them, the overall performance will be higher and users will be more satisfied with personalized systems. We use some state-of-the-art pre-retrieval query performance predictors and propose others that include user profile information for this purpose. We study the correlations among these predictors and the difference between the personalized and the original queries. We also use classification and regression techniques to improve the results, finally reaching a bit more than one third of the maximum ideal performance. We think this is a good starting point within this research line, which certainly needs more effort and improvement. This work has been supported by the Spanish Andalusian "Consejería de Innovación, Ciencia y Empresa" postdoctoral phase of project P09-TIC-4526, the Spanish "Ministerio de Economía y Competitividad" projects TIN2013-42741-P and TIN2016-77902-C3-2-P, and the European Regional Development Fund (ERDF-FEDER).
