24 research outputs found

    PrĂ©dire la difficultĂ© des requĂȘtes : la combinaison de mesures statistiques et sĂ©mantiques

    National audience. The performance of an Information Retrieval System (IRS) is closely tied to the query. Queries on which IR systems fail are called "difficult queries" in the literature. The study presented in this article aims to analyse, adapt, and combine several query difficulty predictors. We considered three predictors: one related to term ambiguity, one based on term frequency, and a measure of the dispersion of the results. The evaluation of the prediction is based on the correlation between the predicted difficulty and the actual performance of the IR systems. We show that combining these predictors gives good results. The evaluation framework is that of the TREC7 and TREC8 ad hoc collections.
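The evaluation scheme described above can be sketched in a few lines: combine several per-query predictor scores linearly, then correlate the combined prediction with actual per-query effectiveness. This is a minimal illustration, not the paper's implementation; all predictor values and AP scores below are made up.

```python
# Hedged sketch: evaluating difficulty predictors by correlating a combined
# predicted difficulty with actual per-query effectiveness (e.g., average
# precision). All values are illustrative, not TREC data.

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def combine(predictors, weights):
    """Linear combination of several per-query predictor score lists."""
    return [sum(w * p[i] for w, p in zip(weights, predictors))
            for i in range(len(predictors[0]))]

# Three toy predictors (ambiguity, term frequency, dispersion) over 5 queries.
ambiguity  = [0.9, 0.2, 0.7, 0.4, 0.8]
term_freq  = [0.6, 0.1, 0.8, 0.3, 0.9]
dispersion = [0.8, 0.3, 0.6, 0.2, 0.7]
ap         = [0.10, 0.55, 0.20, 0.55, 0.15]   # actual effectiveness per query

combined = combine([ambiguity, term_freq, dispersion], [1/3, 1/3, 1/3])
# Higher predicted difficulty should track lower AP, i.e. a strong
# negative correlation indicates a useful (combined) predictor.
print(round(pearson(combined, ap), 3))
```

In practice Kendall's tau is often preferred over Pearson for this evaluation, since only the ranking of queries by difficulty matters.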

    Performance Analysis of Information Retrieval Systems

    International audience. It has been shown that there is no single best information retrieval system configuration that works for every query; rather, performance can vary from one query to another. It would be useful if a meta-system could decide which system should process a new query by learning from previously submitted queries. This paper reports a deep analysis considering more than 80,000 search engine configurations applied to 100 queries and the corresponding performance. The goal of the analysis is to identify which search engine configuration responds best to a certain type of query. We considered two approaches to defining query types: one is based on clustering queries according to their performance (their difficulty), while the other uses various query features (including query difficulty predictors) to cluster queries. We identified two parameters that should be optimized first. An important outcome is that we could not obtain strongly conclusive results; considering the large number of systems and methods we used, this result could lead to the conclusion that current query features do not fit the optimization problem.

    La prĂ©diction efficace de la difficultĂ© des requĂȘtes : une tĂąche impossible?

    National audience. ABSTRACT. Search engines return answers whatever the user query is, but some queries are more difficult than others for the system (the system does not achieve good performance in terms of IR measures). For difficult queries, ad hoc treatments must be applied. Predicting query difficulty is therefore crucial, and various predictors have been proposed. In this paper, we revisit these predictors. First, we check the statistical non-redundancy of the predictors. Then, we show that the correlation between the values of the predictors and system performance gives little hope about the ability of these predictors to be truly effective. Finally, we study the ability of predictors to predict classes of difficulty by relying on a variety of exploratory and learning methods. We show that despite the (low) correlations observed with performance measures, current predictors lead to variable prediction performance and are therefore not robust enough to be used in practical IR applications. KEYWORDS: information retrieval, difficult query, prediction, data analysis.
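The class-based evaluation described above can be illustrated with a toy example: bin queries into "easy"/"hard" by their actual effectiveness, then check how often a predictor recovers those classes via a simple threshold. This is a hedged sketch, not the paper's method; all scores and thresholds are invented for illustration.

```python
# Hedged sketch: predicting difficulty *classes* rather than raw effectiveness.
# Queries are labelled "hard" when actual AP falls below a threshold, and a
# predictor's output is thresholded the same way. All values are illustrative.

def difficulty_classes(ap_scores, threshold=0.3):
    """Ground-truth classes: low actual effectiveness means a hard query."""
    return ["hard" if ap < threshold else "easy" for ap in ap_scores]

def predict_classes(predictor_scores, threshold):
    """Predicted classes: higher predicted difficulty means 'hard'."""
    return ["hard" if s >= threshold else "easy" for s in predictor_scores]

ap        = [0.10, 0.45, 0.22, 0.60, 0.15]
predictor = [0.80, 0.30, 0.40, 0.20, 0.75]   # predicted difficulty per query

truth = difficulty_classes(ap)
pred  = predict_classes(predictor, threshold=0.5)
accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
# The predictor misses the third query, illustrating the variable
# prediction quality the abstract reports.
print(truth, pred, accuracy)
```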

    Combining Word Embedding Interactions and LETOR Feature Evidences for Supervised QPP

    In information retrieval, query performance prediction (QPP) aims to predict whether a search engine is likely to succeed in retrieving documents potentially relevant to a user's query. This problem is usually cast as a regression problem in which a machine should predict the effectiveness (in terms of an information retrieval measure) of the search engine on a given query. The solutions range from simple unsupervised approaches, where a single source of information (e.g., the variance of the retrieval similarity scores in NQC) predicts the search engine's effectiveness for a given query, to more involved ones that rely on supervised machine learning and make use of several sources of information, e.g., learning-to-rank (LETOR) features, word embedding similarities, etc. In this paper, we investigate the combination of two different types of evidence in a single neural network model. Our first source of information corresponds to the semantic interaction between the terms in queries and their top-retrieved documents, while our second source of information corresponds to LETOR features.
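The unsupervised baseline mentioned above, NQC (Normalized Query Commitment), can be sketched very compactly: the standard deviation of the top-k retrieval scores, normalized by a corpus-level score. This is a minimal illustration of the idea; the score lists and the corpus score below are invented.

```python
# Hedged sketch of NQC: the standard deviation of the retrieval scores of the
# top-k documents, normalized by a corpus-level score for the query. A peaked
# score distribution suggests a confident (likely effective) retrieval, while
# a flat one suggests an uncertain query. All scores are illustrative.

def nqc(top_k_scores, corpus_score):
    n = len(top_k_scores)
    mean = sum(top_k_scores) / n
    std = (sum((s - mean) ** 2 for s in top_k_scores) / n) ** 0.5
    return std / abs(corpus_score)

peaked = [9.1, 8.7, 6.0, 4.2, 3.1]   # scores drop off sharply
flat   = [5.2, 5.1, 5.1, 5.0, 4.9]   # nearly uniform scores

# The peaked ranking yields the higher NQC, i.e. the better predicted query.
print(nqc(peaked, corpus_score=2.0) > nqc(flat, corpus_score=2.0))
```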

    On the Feasibility and Robustness of Pointwise Evaluation of Query Performance Prediction

    Although the retrieval effectiveness of individual queries is mutually independent, the evaluation of query performance prediction (QPP) systems has been carried out by measuring rank correlation over an entire set of queries. Such a listwise approach has a number of disadvantages, notably that it does not support the common requirement of assessing QPP for individual queries. In this paper, we propose a pointwise QPP framework that allows us to evaluate the quality of a QPP system for individual queries by measuring the deviation between each prediction and the corresponding true value, and then aggregating the results over a set of queries. Our experiments demonstrate that this new approach leads to smaller variances in QPP evaluations across a range of different target metrics and retrieval models.
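The pointwise idea above reduces to measuring a per-query deviation between predicted and true effectiveness, then aggregating. A minimal sketch, with mean absolute error as one possible aggregate and invented AP values:

```python
# Hedged sketch of pointwise QPP evaluation: score a QPP system per query by
# the absolute deviation between predicted and true effectiveness, then
# aggregate over the query set (here, mean absolute error). Illustrative data.

def pointwise_qpp_error(predicted, true_values):
    """Per-query absolute deviations and their mean over the query set."""
    deviations = [abs(p - t) for p, t in zip(predicted, true_values)]
    return deviations, sum(deviations) / len(deviations)

predicted_ap = [0.30, 0.55, 0.10, 0.70]
true_ap      = [0.25, 0.60, 0.30, 0.65]

per_query, mae = pointwise_qpp_error(predicted_ap, true_ap)
print([round(d, 2) for d in per_query], round(mae, 4))
```

Unlike a rank correlation such as Kendall's tau, each query contributes its own score here, so a single badly predicted query (the third one above) remains visible after aggregation.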

    Intent-aware search result diversification

    Full text link
    Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches

    Predicting IR Personalization Performance using Pre-retrieval Query Predictors

    Personalization generally improves the performance of queries, but in a few cases it may also harm it. If we are able to predict those situations and therefore disable personalization for them, the overall performance will be higher and users will be more satisfied with personalized systems. We use some state-of-the-art pre-retrieval query performance predictors and propose others that include user profile information for this purpose. We study the correlations among these predictors and the difference between the personalized and the original queries. We also use classification and regression techniques to improve the results, finally reaching a bit more than one third of the maximum ideal performance. We think this is a good starting point within this research line, which certainly needs more effort and improvement. This work has been supported by the Spanish Andalusian "Consejería de Innovación, Ciencia y Empresa" postdoctoral phase of project P09-TIC-4526, the Spanish "Ministerio de Economía y Competitividad" projects TIN2013-42741-P and TIN2016-77902-C3-2-P, and the European Regional Development Fund (ERDF-FEDER).
