6 research outputs found
Evaluating Information Retrieval and Access Tasks
This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students—anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one
Filtering News from Document Streams: Evaluation Aspects and Modeled Stream Utility
Events like hurricanes, earthquakes,
or accidents can impact a large number of people. Not only are people in the
immediate vicinity of the event affected, but concerns about their well-being are
shared by the local government and well-wishers across the world.
The latest information about news events
could be of use to government and aid agencies in order to make informed decisions on
providing necessary support, security and relief. The general public
avails of news updates via dedicated news feeds or broadcasts, and lately,
via social media services
like Facebook or Twitter.
Retrieving the latest information about newsworthy events from the world-wide web
is thus of importance to a large section of society.
As new content on a multitude of topics is continuously being published on the web,
specific event related information needs to be filtered from the resulting
stream of documents.
We present in this thesis, a user-centric evaluation measure for
evaluating systems that filter news related information from document streams.
Our proposed evaluation measure, Modeled Stream Utility (MSU), models
users accessing information from a stream of sentences
produced by a news update filtering system.
The user model allows for simulating a large number of users with different
characteristic stream browsing behavior. Through simulation,
MSU estimates the utility of a system for an
average user browsing a stream of sentences.
Our results show that system performance is sensitive to a user population's
stream browsing behavior and that
existing evaluation metrics correspond to very specific types of user behavior.
To evaluate systems that filter sentences from a document stream,
we need a set of judged sentences. This judged set is
a subset of all the sentences returned by all systems, and is
typically constructed by pooling
together the highest quality sentences,
as determined by respective system assigned scores for each sentence.
Sentences in the pool are manually assessed and
the resulting set of judged sentences is then used to compute system performance metrics.
In this thesis, we investigate the effect of including duplicates of
judged sentences, into the judged set, on system performance evaluation. We also develop an
alternative pooling methodology, that given the MSU user model,
selects sentences for pooling based on the probability of a sentences being read by
modeled users.
Our research lays the foundation for interesting future work for utilizing
user-models in different aspects of evaluation of stream filtering systems.
The MSU measure enables incorporation of different
user models. Furthermore, the applicability of MSU could be extended through
calibration based on user
behavior
Définition et évaluation de modèles d'agrégation pour l'estimation de la pertinence multidimensionnelle en recherche d'information
The main research topic of this document revolve around the information retrieval (IR) field. Traditional IR models rank documents by computing single scores separately with respect to one single objective criterion. Recently, an increasing number of IR studies has triggered a resurgence of interest in redefining the algorithmic estimation of relevance, which implies a shift from topical to multidimensional relevance assessment.In our work, we specifically address the multidimensional relevance assessment and evaluation problems. To tackle this challenge, state-of-the-art approaches are often based on linear combination mechanisms. However, However, these methods rely on the unrealistic additivity hypothesis and independence of the relevance dimensions, which makes it unsuitable in many real situations where criteria are correlated.Other techniques from the machine learning area have also been proposed. The latter learn a model from example inputs and generalize it to combine the different criteria. Nonetheless, these methods tend to offer only limited insight on how to consider the importance and the interaction between the criteria. In addition to the parameters sensitivity used within these algorithms, it is quite difficult to understand why a criteria is more preferred over another one.To address this problem, we proposed a model based on a multi-criteria aggregation operator that is able to overcome the problem of additivity. Our model is based on a fuzzy measure that offer semantic interpretations of the correlations and interactions between the criteria. We have adapted this model to the multidimensional relevance estimation in two scenarii: (i) a tweet search task and (ii) two personalized IR settings. The second line of research focuses on the integration of the temporal factor in the aggregation process, in order to consider the changes of document collections over time. To do so, we have proposed a time-aware IR model for combining the temporal relavance criterion with the topical relevance one. Then, we performed a time series analysis to identify the temporal query nature, and we proposed an evaluation framework within a time-aware IR setting.La problématique générale de notre travail s'inscrit dans le domaine scientifique de la recherche d'information (RI). Les modèles de RI classiques sont généralement basés sur une définition de la notion de pertinence qui est liée essentiellement à l'adéquation thématique entre le sujet de la requête et le sujet du document. Le concept de pertinence a été revisité selon différents niveaux intégrant ainsi différents facteurs liés à l'utilisateur et à son environnement dans une situation de RI. Dans ce travail, nous abordons spécifiquement le problème lié à la modélisation de la pertinence multidimensionnelle à travers la définition de nouveaux modèles d'agrégation des critères et leur évaluation dans des tâches de recherche de RI. Pour répondre à cette problématique, les travaux de l'état de l'art se basent principalement sur des combinaisons linéaires simples. Cependant, ces méthodes se reposent sur l'hypothèse non réaliste d'additivité ou d'indépendance des dimensions, ce qui rend le modèle non approprié dans plusieurs situations de recherche réelles dans lesquelles les critères étant corrélés ou présentant des interactions entre eux. D'autres techniques issues du domaine de l'apprentissage automatique ont été aussi proposées, permettant ainsi d'apprendre un modèle par l'exemple et de le généraliser dans l'ordonnancement et l'agrégation des critères. Toutefois, ces méthodes ont tendance à offrir un aperçu limité sur la façon de considérer l'importance et l'interaction entre les critères. En plus de la sensibilité des paramètres utilisés dans ces algorithmes, est très difficile de comprendre pourquoi un critère est préféré par rapport à un autre. Pour répondre à cette première direction de recherche, nous avons proposé un modèle de combinaison de pertinence multicritères basé sur un opérateur d'agrégation qui permet de surmonter le problème d'additivité des fonctions de combinaison classiques. Notre modèle se base sur une mesure qui permet de donner une idée plus claire sur les corrélations et interactions entre les critères. Nous avons ainsi adapté ce modèle pour deux scénarios de combinaison de pertinence multicritères : (i) un cadre de recherche d'information multicritères dans un contexte de recherche de tweets et (ii) deux cadres de recherche d'information personnalisée. Le deuxième axe de recherche s'intéresse à l'intégration du facteur temporel dans le processus d'agrégation afin de tenir compte des changements occurrents sur les collection de documents au cours du temps. Pour ce faire, nous avons proposé donc un modèle d'agrégation sensible au temps pour combinant le facteur temporel avec le facteur de pertinence thématique. Dans cet objectif, nous avons effectué une analyse temporelle pour éliciter l'aspect temporel des requêtes, et nous avons proposé une évaluation de ce modèle dans une tâche de recherche sensible au temps
GIR at the NTCIR-12 Temporalia Task
ABSTRACT The GIR team participated in the NTCIR 12 Temporal Information Access (Temporalia) Task. This report describes our approach to solving the Temporal Intent Disambiguation (TID) problem and discusses the official results. We explore the rich temporal information in the labeled and unlabeled search queries. A semi-supervised linear classifiers is then built up to predict the temporal classes for each search query. Team Name GI