9 research outputs found
When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections
This is the peer-reviewed version of the following article: David E. Losada, Javier Parapar and Alvaro Barreiro (2019) When to Stop Making Relevance Judgments? A Study of Stopping Methods for Building Information Retrieval Test Collections. Journal of the Association for Information Science and Technology, 70 (1), 49-60, which has been published in final form at https://doi.org/10.1002/asi.24077. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
In information retrieval evaluation, pooling is a well-known technique to extract a sample of documents to be assessed for relevance. Given the pooled documents, a number of studies have proposed different prioritization methods to adjudicate documents for judgment. These methods follow different strategies to reduce the assessment effort. However, there is no clear guidance on how many relevance judgments are required for creating a reliable test collection. In this article we investigate and further develop methods to determine when to stop making relevance judgments. We propose a highly diversified set of stopping methods and provide a comprehensive analysis of the usefulness of the resulting test collections. Some of the stopping methods introduced here combine innovative estimates of recall with time series models used in financial trading. Experimental results on several representative collections show that some stopping methods can reduce the assessment effort by up to 95% and still produce a robust test collection. We demonstrate that the reduced set of judgments can be reliably employed to compare search systems using disparate effectiveness metrics such as Average Precision, NDCG, P@100, and Rank Biased Precision. With all these measures, the correlations found between full pool rankings and reduced pool rankings are very high.
This work received financial support from (i) the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R, (ii) Xunta de Galicia (project GPC 2016/035), and (iii) Xunta de Galicia “Consellería de Cultura, Educación e Ordenación Universitaria” and the European Regional Development Fund (ERDF) through the following 2016-2019 accreditations: ED431G/01 (“Centro singular de investigación de Galicia”) and ED431G/08S.
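As a rough illustration of the kind of stopping rule studied in this line of work, the sketch below judges pooled documents in priority order and stops once an estimated recall reaches a target. It is not the authors' method: the helpers judge and estimate_total_relevant, the 0.95 target, and the omission of the time-series component are all illustrative assumptions.

    # Hedged sketch of a recall-based stopping rule for pooled assessment.
    # Not the authors' method; `judge` and `estimate_total_relevant` are
    # hypothetical callables supplied by the evaluation setup.
    def assess_until_recall_target(ranked_pool, judge, estimate_total_relevant, target=0.95):
        """Judge documents in priority order; stop once estimated recall >= target."""
        judgments = {}
        relevant_found = 0
        for doc_id in ranked_pool:
            judgments[doc_id] = judge(doc_id)                      # human relevance judgment (True/False)
            relevant_found += int(judgments[doc_id])
            estimated_total = estimate_total_relevant(judgments)   # e.g., a statistical recall estimate
            if estimated_total > 0 and relevant_found / estimated_total >= target:
                break                                              # enough judgments: stop assessing
        return judgments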
Study of result presentation and interaction for aggregated search
The World Wide Web has always attracted researchers and commercial search engine companies due to the enormous amount of information available on it. Searching the web has become an integral part of today's world, and many people rely on it when looking for information. The amount and diversity of information available on the Web have also increased dramatically. As a result, researchers and search engine companies are making constant efforts to make this information effectively accessible to people.
Not only is there an increase in the amount and diversity of information available online, but users are now often seeking information on broader topics. Users seeking information on broad topics gather it from various information sources (e.g., images, video, news, blogs). For such information requests, not only web results but also results from different document genres and multimedia content become relevant. For instance, users looking for information on "Glasgow" might be interested in web results about Glasgow, a map of Glasgow, images of Glasgow, news about Glasgow, and so on.
Aggregated search aims to provide access to this diverse information in a unified manner by aggregating results from different information sources on a single result page, thereby making the information-gathering process for broad topics easier.
This thesis explores aggregated search from the users' perspective. It first and foremost focuses on understanding and describing the phenomena related to the users' search process in the context of aggregated search. The goal is to contribute to building theories, to understanding constraints, and to providing insights into the interface design space. In building this understanding, the thesis focuses on click behavior, information needs, source relevance, and the dynamics of search intents. The understanding comes partly from conducting user studies and partly from analyzing search engine log data.
While the thematic (or topical) relevance of documents is important, this thesis argues that the "source type" (source orientation) may also be an important dimension of the relevance space worth investigating in aggregated search. Relevance is therefore multi-dimensional (topical and source-oriented) within the context of aggregated search. Results from the study suggest that source orientation was a significant factor in an aggregated search scenario, adding another dimension to the relevance space.
The thesis further presents an effective method that combines rule-based and machine learning techniques to identify the source orientation behind a user query.
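A minimal sketch of such a hybrid query classifier is given below. It only illustrates the general rule-plus-learning idea, not the thesis method: the trigger terms, the scikit-learn pipeline, and the training data are all assumptions.

    # Illustrative hybrid source-orientation classifier: hand-written rules first,
    # a learned text classifier as fallback. Trigger terms and labels are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    RULES = {
        "image": ("photo", "picture", "wallpaper"),
        "news": ("latest", "breaking", "headlines"),
        "video": ("trailer", "clip", "watch"),
    }

    def rule_based_source(query):
        q = query.lower()
        for source, triggers in RULES.items():
            if any(term in q for term in triggers):
                return source
        return None                                   # no rule fired

    # Fallback classifier; assumes labelled training queries are available.
    fallback = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
    # fallback.fit(train_queries, train_source_labels)

    def predict_source(query):
        return rule_based_source(query) or fallback.predict([query])[0]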
Furthermore, after analyzing log data from a search engine company and conducting user study experiments, several design issues that may arise with respect to the aggregated search interface are identified. To address these issues, suitable design guidelines that can be beneficial from the interface perspective are also suggested.
To conclude, the aim of this thesis is to explore emerging aggregated search from the users' perspective, since it is very important for front-end technologies. An additional goal is to provide empirical evidence for the influence of aggregated search on users' search behavior and to identify some of the key challenges of aggregated search. During this work, several aspects of aggregated search will be uncovered. Furthermore, this thesis will provide a foundation for future research in aggregated search and will highlight potential research directions.
Index ordering by query-independent measures
There is an ever-increasing amount of data being produced from various data sources, and this data must be organised effectively if we hope to search through it. Traditional information retrieval approaches search through all available data in a particular collection in order to find the most suitable results; however, for particularly large collections this may be extremely time-consuming.
Our proposed solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process while limiting the loss in retrieval effectiveness (in terms of accuracy of results). We do this by first identifying the most "important" documents within the collection, and then sorting the documents in order of their importance. In this way we can limit the amount of information to search through by eliminating the documents of lesser importance, which should not only make the search more efficient but should also limit any loss in retrieval accuracy.
In this thesis we investigate various query-independent methods that may indicate the importance of a document in a collection. The more accurate a measure is at identifying important documents, the more effectively we can eliminate documents from the retrieval process, improving the query throughput of the system while providing a high level of accuracy in the returned results. The effectiveness of these approaches is evaluated using the datasets provided by the Terabyte track at the Text REtrieval Conference (TREC).
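The sketch below illustrates the general idea of ordering an index by a query-independent score and searching only a fixed budget of top documents. It is a toy example, not the thesis implementation: the importance scores, the postings structure, and the raw term-frequency scoring are assumptions.

    # Toy sketch: order documents by an assumed query-independent importance score
    # and restrict query-time scoring to the most important ones.
    def order_by_importance(doc_ids, importance):
        """Sort document ids by a query-independent score, best first."""
        return sorted(doc_ids, key=lambda d: importance[d], reverse=True)

    def pruned_search(query_terms, postings, ordered_docs, budget):
        """Score only the `budget` most important documents rather than the whole collection."""
        candidates = set(ordered_docs[:budget])
        scores = {}
        for term in query_terms:
            for doc_id, tf in postings.get(term, {}).items():
                if doc_id in candidates:
                    scores[doc_id] = scores.get(doc_id, 0) + tf   # toy scoring: raw term frequency
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)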
Filtering News from Document Streams: Evaluation Aspects and Modeled Stream Utility
Events like hurricanes, earthquakes, or accidents can impact a large number of people. Not only are people in the immediate vicinity of the event affected, but concerns about their well-being are shared by the local government and well-wishers across the world. The latest information about news events could be of use to government and aid agencies in order to make informed decisions on providing necessary support, security and relief. The general public obtains news updates via dedicated news feeds or broadcasts, and lately, via social media services like Facebook or Twitter. Retrieving the latest information about newsworthy events from the World Wide Web is thus of importance to a large section of society. As new content on a multitude of topics is continuously being published on the web, specific event-related information needs to be filtered from the resulting stream of documents.
In this thesis, we present a user-centric evaluation measure for evaluating systems that filter news-related information from document streams. Our proposed evaluation measure, Modeled Stream Utility (MSU), models users accessing information from a stream of sentences produced by a news update filtering system. The user model allows for simulating a large number of users with different characteristic stream browsing behavior. Through simulation, MSU estimates the utility of a system for an average user browsing a stream of sentences. Our results show that system performance is sensitive to a user population's stream browsing behavior and that existing evaluation metrics correspond to very specific types of user behavior.
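The following is a minimal sketch of the simulation idea described above, not the actual MSU definition: it assumes a simple geometric persistence model of how far a user reads down the stream, with the persistence value and the relevance oracle as placeholders.

    # Hedged sketch of simulating a population of stream-browsing users.
    import random

    def simulate_user(stream, is_relevant, persistence=0.8, rng=random):
        """Read sentences in order; after each one, continue with probability
        `persistence`. Utility here is simply the number of relevant sentences read."""
        utility = 0
        for sentence in stream:
            utility += int(is_relevant(sentence))
            if rng.random() > persistence:
                break                                  # user stops browsing the stream
        return utility

    def modeled_stream_utility(stream, is_relevant, n_users=1000, persistence=0.8):
        """Average utility over a simulated user population."""
        return sum(simulate_user(stream, is_relevant, persistence) for _ in range(n_users)) / n_users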
To evaluate systems that filter sentences from a document stream, we need a set of judged sentences. This judged set is a subset of all the sentences returned by all systems, and is typically constructed by pooling together the highest-quality sentences, as determined by each system's assigned score for each sentence. Sentences in the pool are manually assessed, and the resulting set of judged sentences is then used to compute system performance metrics. In this thesis, we investigate the effect that including duplicates of judged sentences in the judged set has on system performance evaluation. We also develop an alternative pooling methodology that, given the MSU user model, selects sentences for pooling based on the probability of a sentence being read by modeled users.
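As a rough stand-in for that pooling methodology (the actual thesis procedure may differ), the sketch below includes each ranked sentence in the pool with a read probability that decays geometrically with rank, mirroring a persistence-style user model.

    # Hedged sketch: pool sentences in proportion to an assumed probability of being read.
    import random

    def pool_by_read_probability(ranked_sentences, persistence=0.8, rng=random):
        """Include each sentence with probability persistence**rank, so sentences
        that modeled users are unlikely to reach are pooled less often."""
        pool = []
        for rank, sentence in enumerate(ranked_sentences):
            if rng.random() < persistence ** rank:
                pool.append(sentence)
        return pool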
Our research lays the foundation for interesting future work on utilizing user models in different aspects of the evaluation of stream filtering systems. The MSU measure enables the incorporation of different user models. Furthermore, the applicability of MSU could be extended through calibration based on user behavior.
Increasing the Efficiency of High-Recall Information Retrieval
The goal of high-recall information retrieval (HRIR) is to find all, or nearly all, relevant documents while maintaining reasonable assessment effort. Achieving high recall is a key problem in applications such as electronic discovery, systematic review, and the construction of test collections for information retrieval tasks. State-of-the-art HRIR systems commonly rely on iterative relevance feedback in which human assessors continually assess machine-learning-selected documents. The relevance of the assessed documents is then fed back to the machine learning model to improve its ability to select the next set of potentially relevant documents for assessment. In many instances, thousands of human assessments might be required to achieve high recall. These assessments represent the main cost of such HRIR applications. Therefore, their effectiveness in achieving high recall is limited by their reliance on human input when assessing the relevance of documents. In this thesis, we test different methods to improve the effectiveness and efficiency of finding relevant documents using a state-of-the-art HRIR system. With regard to effectiveness, we try to build a machine-learned model that retrieves relevant documents more accurately. For efficiency, we try to help human assessors make relevance assessments more easily and quickly via our HRIR system. Furthermore, we try to establish a stopping criterion for the assessment process so as to avoid excessive assessment.
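For concreteness, the sketch below shows the shape of a continuous active learning loop of the kind described above. It is a simplified stand-in, not the thesis implementation: the TF-IDF features, logistic regression model, batch size, and assessment budget are illustrative assumptions, and the seed judgments are assumed to contain both relevant and non-relevant examples.

    # Hedged sketch of a continuous active learning (CAL) loop.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def cal_loop(documents, judge, seed_judgments, batch_size=10, max_assessments=1000):
        """documents: list of texts; judge(i) returns a human judgment (1/0);
        seed_judgments: {doc index: 1/0} containing both classes."""
        X = TfidfVectorizer().fit_transform(documents)
        labels = dict(seed_judgments)
        while len(labels) < min(max_assessments, len(documents)):
            known = list(labels)
            model = LogisticRegression(max_iter=1000).fit(X[known], [labels[i] for i in known])
            scores = model.predict_proba(X)[:, 1]          # P(relevant) for every document
            unjudged = [i for i in range(len(documents)) if i not in labels]
            batch = sorted(unjudged, key=lambda i: scores[i], reverse=True)[:batch_size]
            for i in batch:
                labels[i] = judge(i)                       # assessor judges the document or excerpt
        return labels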
In particular, we hypothesize that the total assessment effort to achieve high recall can be reduced by using shorter document excerpts (e.g., extractive summaries) in place of full documents for the assessment of relevance, together with a high-recall retrieval system based on continuous active learning (CAL). To test this hypothesis, we implemented a high-recall retrieval system based on a state-of-the-art implementation of CAL. This system could display either full documents or short document excerpts for relevance assessment. A search engine was also integrated into our system to provide assessors with the option of interactive search and judging.
We conducted a simulation study and, separately, a 50-person controlled user study to test our hypothesis. The results of the simulation study show that judging even a single extracted sentence for relevance feedback may be adequate for CAL to achieve high recall. The results of the controlled user study confirmed that human assessors were able to find a significantly larger number of relevant documents within limited time when they used the system with paragraph-length document excerpts as opposed to full documents. In addition, we found that allowing participants to compose and execute their own search queries did not improve their ability to find relevant documents and, by some measures, impaired performance. Moreover, integrating sampling methods with active learning can yield accurate estimates of the number of relevant documents, and thus avoid excessive assessment.
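A back-of-the-envelope version of such a sampling-based estimate is sketched below; it is not the estimator used in the thesis, and simply scales the prevalence of relevance observed in a random sample of unjudged documents to the whole unjudged set.

    # Hedged sketch: estimate the number of relevant documents remaining via random sampling.
    import random

    def estimate_remaining_relevant(unjudged_ids, judge, sample_size=100, rng=random):
        if not unjudged_ids:
            return 0.0
        sample = rng.sample(unjudged_ids, min(sample_size, len(unjudged_ids)))
        prevalence = sum(judge(doc_id) for doc_id in sample) / len(sample)
        return prevalence * len(unjudged_ids)              # estimated relevant documents still unjudged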
Predição de relevância em sistemas de recuperação de informação (Relevance prediction in information retrieval systems)
Advisor: Anderson de Rezende Rocha. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: In today's connected world, Information Retrieval (IR) has become one of the most ubiquitous problems, being part of many modern applications. Among all the challenges in designing IR systems, how to evaluate their performance is ever-present. Offline evaluation, however, is mostly limited to benchmarking and comparing different systems, which has pushed a growing interest in predicting, at query time, the performance of an IR system. Query Performance Prediction (QPP) is the name given to the problem of estimating the quality of the results retrieved by an IR system in response to a query. In the past few years, this problem has received much attention, especially from the text retrieval community. Yet, QPP is still limited, being only an indirect way of estimating the performance of IR systems. In this thesis, we investigate formulating the QPP problem as a relevance prediction one: the task of predicting, for a specific top-k, which results of a query are relevant to it, according to some existing relevance reference. Though remarkably challenging, relevance prediction is not only a more natural way of predicting performance but also one with significantly more applications. In this thesis, we present three families of relevance prediction approaches: statistical, learning, and sequence labeling. All methods within those families are evaluated with respect to their effectiveness in several content-based image retrieval experiments, covering several large-scale datasets and retrieval settings. The experiments in this thesis show that it is feasible to perform relevance prediction for k values as large as 30, with minimal information about the underlying IR system, and efficiently enough to be performed at query time. This thesis concludes by offering some potential paths for improving the current results, as well as future research in this particular field.
Doctorate in Computer Science. Funding: 168326/2017-5, CAPES, CNP
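To make the relevance prediction framing concrete, the sketch below treats it as per-result binary classification over the top-k list. It is only an illustration under assumed features (retrieval score, rank, score gap) and an assumed random-forest model; the statistical, learning, and sequence-labeling methods of the thesis are not reproduced here.

    # Hedged sketch: relevance prediction as classification over the top-k results.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def rank_features(scores):
        """Per-result features: retrieval score, rank, and gap to the next score."""
        scores = np.asarray(scores, dtype=float)
        ranks = np.arange(1, len(scores) + 1)
        gaps = np.append(scores[:-1] - scores[1:], 0.0)
        return np.column_stack([scores, ranks, gaps])

    # Assumed training data: features and relevance labels from queries with known references.
    # model = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)

    def predict_relevant_at_k(model, scores, k=30):
        """Predict which of the top-k results are relevant from their ranking features."""
        return model.predict(rank_features(scores[:k]))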
Understanding search
This thesis provides a framework for information retrieval based on a set of models which together illustrate how users of search engines come to express their needs in a particular way. With such insights, we may be able to improve systems' capabilities of understanding users' requests and, through that, eventually the ability to satisfy their needs. Developing the framework necessitates discussion of context, relevance, need development, and the cybernetics of search, all of which are controversial topics. Transaction log data from two enterprise search engines are analysed using a specially developed method which classifies queries according to what aspect of the need they refer to.