Evaluation Measures for Relevance and Credibility in Ranked Lists
Recent discussions on alternative facts, fake news, and post-truth politics
have motivated research on creating technologies that allow people not only to
access information, but also to assess the credibility of the information
presented to them by information retrieval systems. Whereas technology is in
place for filtering information according to relevance and/or credibility, no
single measure currently exists for evaluating the accuracy or precision (and
more generally effectiveness) of both the relevance and the credibility of
retrieved results. One obvious way of doing so is to measure relevance and
credibility effectiveness separately, and then consolidate the two measures
into one. There are at least two problems with such an approach: (I) it is not
certain that the same criteria are applied to the evaluation of both relevance
and credibility (and applying different criteria introduces bias to the
evaluation); (II) many more and richer measures exist for assessing relevance
effectiveness than for assessing credibility effectiveness (hence risking
further bias).
Motivated by the above, we present two novel types of evaluation measures
that are designed to measure the effectiveness of both relevance and
credibility in ranked lists of retrieval results. Experimental evaluation on a
small human-annotated dataset (that we make freely available to the research
community) shows that our measures are expressive and intuitive in their
interpretation.
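The "obvious" separate-then-consolidate approach that the abstract critiques can be sketched concretely. The helper names, the labels, and the linear mix below are illustrative assumptions, not the paper's proposed measures:

```python
def precision_at_k(labels, k):
    """Fraction of the top-k results whose binary label is positive."""
    return sum(labels[:k]) / k

def combined_effectiveness(rel_labels, cred_labels, k=10, alpha=0.5):
    """Naive consolidation: score relevance and credibility separately,
    then take a linear mix. alpha weights relevance vs. credibility.
    This is the baseline the paper argues against, not its new measures."""
    p_rel = precision_at_k(rel_labels, k)
    p_cred = precision_at_k(cred_labels, k)
    return alpha * p_rel + (1 - alpha) * p_cred

# Hypothetical ranked list of 10 results, labelled for each aspect:
rel  = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
cred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
print(combined_effectiveness(rel, cred))  # 0.5
```

The two problems the abstract raises show up directly here: nothing forces the relevance and credibility labels to come from the same assessment criteria, and precision@k is only one of many relevance measures with no credibility counterpart.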
An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric
Many evaluation metrics have been defined to evaluate the effectiveness of
ad-hoc retrieval and search result diversification systems. However, it is
often unclear which evaluation metric should be used to analyze the performance
of retrieval systems given a specific task. Axiomatic analysis is an
informative mechanism to understand the fundamentals of metrics and their
suitability for particular scenarios. In this paper, we define a
constraint-based axiomatic framework to study the suitability of existing
metrics in search result diversification scenarios. The analysis informed the
definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known
Rank-Biased Precision metric -- that takes into account redundancy and the user
effort associated with the inspection of documents in the ranking. Our
experiments over standard diversity evaluation campaigns show that the proposed
metric captures quality criteria reflected by different metrics, being suitable
in the absence of knowledge about particular features of the scenario under
study.
Comment: Original version: 10 pages. Preprint of full paper to appear at
SIGIR'18: The 41st International ACM SIGIR Conference on Research &
Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA.
ACM, New York, NY, US
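RBU's exact definition is in the paper; as background, the well-known Rank-Biased Precision metric it adapts can be sketched using the standard Moffat and Zobel formulation (the persistence value below is an arbitrary choice for illustration):

```python
def rbp(gains, p=0.8):
    """Rank-Biased Precision: expected rate of gain for a user who,
    after inspecting each document, continues to the next rank with
    persistence probability p. `gains` are per-rank relevance values
    in [0, 1], ordered from rank 1 downward."""
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))

# A run whose top 3 results are all fully relevant, with p = 0.5:
print(rbp([1, 1, 1], p=0.5))  # 0.875
```

RBU, per the abstract, extends this user model with redundancy and inspection effort for diversity scenarios; those extensions are not reproduced here.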
Optimizing Ranking Measures for Compact Binary Code Learning
Hashing has proven a valuable tool for large-scale information retrieval.
Despite much success, existing hashing methods optimize over simple objectives
such as the reconstruction error or graph Laplacian related loss functions,
instead of the performance evaluation criteria of interest---multivariate
performance measures such as the AUC and NDCG. Here we present a general
framework (termed StructHash) that allows one to directly optimize multivariate
performance measures. The resulting optimization problem can involve
exponentially or infinitely many variables and constraints, which is more
challenging than standard structured output learning. To solve the StructHash
optimization problem, we use a combination of column generation and
cutting-plane techniques. We demonstrate the generality of StructHash by
applying it to ranking prediction and image retrieval, and show that it
outperforms a few state-of-the-art hashing methods.
Comment: Appearing in Proc. European Conference on Computer Vision 201
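The multivariate measures StructHash optimizes include NDCG. As background, a minimal NDCG computation (the evaluation target, not the StructHash optimizer itself) can be sketched as:

```python
import math

def dcg(gains):
    """Discounted cumulative gain with the standard log2 discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=None):
    """DCG of the given ranking, normalised by the DCG of the ideal
    (descending-gain) ordering of the same items."""
    g = gains[:k] if k else gains
    best = dcg(sorted(gains, reverse=True)[:len(g)])
    return dcg(g) / best if best > 0 else 0.0

# A ranking already in ideal order scores 1.0:
print(ndcg([3, 2, 1]))  # 1.0
```

Because NDCG depends on the whole ranking rather than decomposing into per-example losses, directly optimizing it leads to the structured-output formulation with exponentially many constraints that the abstract describes.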
Unsupervised Graph-based Rank Aggregation for Improved Retrieval
This paper presents a robust and comprehensive graph-based rank aggregation
approach, used to combine results of isolated ranker models in retrieval tasks.
The method follows an unsupervised scheme, which is independent of how the
isolated ranks are formulated. Our approach is able to combine arbitrary
models, defined in terms of different ranking criteria, such as those based on
textual, image or hybrid content representations.
We reformulate the ad-hoc retrieval problem as document retrieval based on
fusion graphs, which we propose as a new unified representation model capable
of merging multiple ranks and expressing inter-relationships of retrieval
results automatically. By doing so, we claim that the retrieval system can
benefit from learning the manifold structure of datasets, thus leading to more
effective results. Another contribution is that our graph-based aggregation
formulation, unlike existing approaches, allows for encapsulating contextual
information encoded from multiple ranks, which can be directly used for
ranking, without further computations and post-processing steps over the
graphs. Based on the graphs, a novel similarity retrieval score is formulated
using an efficient computation of minimum common subgraphs. Finally, another
benefit over existing approaches is the absence of hyperparameters.
A comprehensive experimental evaluation was conducted considering diverse
well-known public datasets, composed of textual, image, and multimodal
documents. Performed experiments demonstrate that our method reaches top
performance, yielding better effectiveness scores than state-of-the-art
baseline methods and promoting large gains over the rankers being fused, thus
demonstrating the successful capability of the proposal in representing queries
based on a unified graph-based model of rank fusions.
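The paper's fusion-graph construction is its own contribution; to make the rank-aggregation setting concrete, a standard unsupervised baseline (reciprocal rank fusion, not the authors' method) can be sketched as:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists with no supervision and no score
    normalisation: each document earns 1 / (k + rank) from every list
    that returns it. Like the paper's setting, this is independent of
    how the isolated ranks were produced (text, image, hybrid)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_rank  = ["d3", "d1", "d2"]   # hypothetical textual ranker
image_rank = ["d1", "d4", "d3"]   # hypothetical image ranker
print(reciprocal_rank_fusion([text_rank, image_rank]))
# ['d1', 'd3', 'd4', 'd2']
```

Unlike this rank-position heuristic, the paper's approach encodes inter-relationships of results in a graph so that the aggregation can exploit the manifold structure of the dataset.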
Design criteria for a PC-based common user interface to remote information systems
A set of design criteria is presented which will allow the implementation of an interface to multiple remote information systems on a microcomputer. The focus of the design description is on providing the user with the functionality required to retrieve, store and manipulate data residing in remote information systems through the utilization of a standardized interface system. The intent is to spare the user from learning the details of retrieval from specific systems while retaining the full capabilities of each system. The system design includes multi-level capabilities to enhance usability by a wide range of users and utilizes microcomputer graphics capabilities where applicable. A data collection subsystem for evaluation purposes is also described.
Special requirements for comparative evaluation of web search engines
ABSTRACT: Performance evaluation of classical information retrieval systems usually aims to assess the ability of these systems to find documents considered relevant to a certain search query based on specific evaluation criteria. This approach, however, is not suitable for adequately evaluating some information retrieval applications such as web search engines. The web's special characteristics make information retrieval tasks, and the evaluation of search engines on the web, face multiple challenges. Different web-specific, user-specific and language-specific requirements should be considered when designing and performing evaluation tests on operational web search engines. This paper discusses the special requirements for comprehensive comparative evaluation of different web search engines and highlights some language-specific considerations for evaluation in the Arabic language.
Symbolic Melodic Similarity: State of the Art and Future Challenges
Fostered by the introduction of the Music Information Retrieval Evaluation eXchange (MIREX) competition, the number of systems which calculate Symbolic Melodic Similarity has recently increased considerably. In order to understand the state of the art, we provide a comparative analysis of existing algorithms. The analysis is based on eight criteria that help characterise the systems, highlighting strengths and weaknesses. We also propose a taxonomy which classifies algorithms based on their approach. Both taxonomy and criteria are fruitfully exploited to provide input for forthcoming research in the area.
Application of aboutness to functional benchmarking in information retrieval
Experimental approaches are widely employed to benchmark the performance of an information retrieval (IR) system. Measurements in terms of recall and precision are computed as performance indicators. Although they are good at assessing the retrieval effectiveness of an IR system, they fail to explore deeper aspects such as its underlying functionality, and cannot explain why the system shows such performance. Recently, inductive (i.e., theoretical) evaluation of IR systems has been proposed to circumvent the controversies of the experimental methods. Several studies have adopted the inductive approach, but they mostly focus on theoretical modeling of IR properties by using some metalogic. In this article, we propose to use inductive evaluation for functional benchmarking of IR models as a complement to the traditional experiment-based performance benchmarking. We define a functional benchmark suite in two stages: the evaluation criteria based on the notion of "aboutness," and the formal evaluation methodology using the criteria. The proposed benchmark has been successfully applied to evaluate various well-known classical and logic-based IR models. The functional benchmarking results allow us to compare and analyze the functionality of the different IR models.
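The recall and precision indicators that this abstract contrasts with functional benchmarking reduce to simple set arithmetic over one query's results. A minimal sketch (document identifiers are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.
    precision = hits / |retrieved|, recall = hits / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d5"})
print(p, r)  # p = 0.5, r ≈ 0.667
```

As the article argues, these numbers say nothing about *why* a model retrieved what it did; that is the gap the aboutness-based functional benchmark aims to fill.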
An Intent Taxonomy of Legal Case Retrieval
Legal case retrieval is a special Information Retrieval~(IR) task focusing on
legal case documents. Depending on the downstream tasks of the retrieved case
documents, users' information needs in legal case retrieval could be
significantly different from those in Web search and traditional ad-hoc
retrieval tasks. While there are several studies that retrieve legal cases
based on text similarity, the underlying search intents of legal retrieval
users, as shown in this paper, are more complicated than that and remain mostly
unexplored. To this end, we present a novel hierarchical intent taxonomy of
legal case retrieval. It consists of five intent types categorized by three
criteria, i.e., search for Particular Case(s), Characterization, Penalty,
Procedure, and Interest. The taxonomy was constructed transparently and
evaluated extensively through interviews, editorial user studies, and query log
analysis. Through a laboratory user study, we reveal significant differences in
user behavior and satisfaction under different search intents in legal case
retrieval. Furthermore, we apply the proposed taxonomy to various downstream
legal retrieval tasks, e.g., result ranking and satisfaction prediction, and
demonstrate its effectiveness. Our work provides important insights into the
understanding of user intents in legal case retrieval and potentially leads to
better retrieval techniques in the legal domain, such as intent-aware ranking
strategies and evaluation methodologies.
Comment: 28 pages, work in progress