Preliminary Experiments using Subjective Logic for the Polyrepresentation of Information Needs

Author: Ingwersen Peter
Larsen Birger
Lioma Christina
Publication date: 05/04/2017

According to the principle of polyrepresentation, retrieval accuracy may improve through the combination of multiple and diverse information object representations about, e.g., the context of the user, the information sought, or the retrieval system. Recently, the principle of polyrepresentation was mathematically expressed using subjective logic, where the potential suitability of each representation for improving retrieval performance was formalised through degrees of belief and uncertainty. No experimental evidence or practical application has so far validated this model. We extend the work of Lioma et al. (2010) by providing a practical application and analysis of the model. We show how to map the abstract notions of belief and uncertainty to real-life evidence drawn from a retrieval dataset. We also show how to estimate two different types of polyrepresentation assuming either (a) independence or (b) dependence between the information objects that are combined. We focus on the polyrepresentation of different types of context relating to user information needs (i.e. work task, user background knowledge, ideal answer) and show that the subjective logic model can predict their optimal combination prior to, and independently of, the retrieval process.

arXiv.org e-Print Archive

CiteSeerX

Optimal Information Retrieval with Complex Utility Functions

Author: Tao Tao
Zhai ChengXiang
Publication date: 01/04/2004

Existing retrieval models all attempt to optimize one single utility function, which is often based on the topical relevance of a document with respect to a query. In real applications, retrieval involves more complex utility functions that may involve preferences on several different dimensions. In this paper, we present a general optimization framework for retrieval with complex utility functions. A query language is designed according to this framework to enable users to submit complex queries. We propose an efficient algorithm for retrieval with complex utility functions based on the a-priori algorithm. As a case study, we apply our algorithm to a complex utility retrieval problem in distributed IR. Experimental results show that our algorithm allows for a flexible tradeoff between multiple retrieval criteria. Finally, we study the efficiency of our algorithm on simulated data.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Evaluation of a Bayesian inference network for ligand-based virtual screening

Author: A Abdo
A Bender
AG Maldonado
AN Jain
AR Leach
Beining Chen
Christoph Mueller
CX Zhai
D Metzler
EJ Gardiner
EM Voorhees
G Salton
GW Bemis
H Eckert
H Turtle
J Bajorath
J Hert
J-F Truchon
JA Grant
JD Holliday
JP Callan
JR Fischer
K Spärck Jones
N Nikolova
P Prathipati
P Willett
Peter Willett
RC Glen
RD Brown
RP Sheridan
S Siegel
SJ Edgar
T Lengauer
T Strohman
TI Oprea
WR Greiff
X Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: Bayesian inference networks enable the computation of the probability that an event will occur. They have been used previously to rank textual documents in order of decreasing relevance to a user-defined query. Here, we modify the approach to enable a Bayesian inference network to be used for chemical similarity searching, where a database is ranked in order of decreasing probability of bioactivity. Results: Bayesian inference networks were implemented using two different types of network and four different types of belief function. Experiments with the MDDR and WOMBAT databases show that a Bayesian inference network can be used to provide effective ligand-based screening, especially when the active molecules being sought have a high degree of structural homogeneity; in such cases, the network substantially outperforms a conventional, Tanimoto-based similarity searching system. However, the network is much less effective when structurally heterogeneous sets of actives are being sought. Conclusion: A Bayesian inference network provides an interesting alternative to existing tools for ligand-based virtual screening.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

White Rose Research Online
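
The virtual screening abstract above compares its network against a conventional Tanimoto-based similarity search. As a rough illustration of that baseline only (not the authors' implementation), the sketch below ranks a toy database by Tanimoto coefficient. The fingerprints are assumed to be sets of on-bit indices, and the molecule names and bit values are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention here
    return len(a & b) / union


def rank_by_similarity(reference: set[int], database: dict[str, set[int]]):
    """Return (name, score) pairs sorted by decreasing similarity to the reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each fingerprint is the set of bit positions that are on.
reference = {1, 4, 7, 9}
database = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}

ranking = rank_by_similarity(reference, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the ranking step itself stays the same.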

Rhetorical relations for information retrieval

Author: Larsen Birger
Lioma Christina
Lu Wei
Publication date: 05/04/2017

Typically, every part of a coherent text has some plausible reason for its presence, some function that it performs in the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this so-called discourse structure has been applied successfully to several natural language processing tasks. This work studies the use of rhetorical relations for Information Retrieval (IR): Is there a correlation between certain rhetorical relations and retrieval performance? Can knowledge about a document's rhetorical relations be useful to IR? We present a language model modification that considers rhetorical relations when estimating the relevance of a document to a query. Empirical evaluation of different versions of our model on TREC settings shows that certain rhetorical relations can benefit retrieval effectiveness notably (> 10% in mean average precision over a state-of-the-art baseline).

arXiv.org e-Print Archive

CiteSeerX

Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

Author: Kraaij Wessel
Nie Jian-Yun
Simard Michel
Publication date: 01/01/2003

Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application which needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag-of-words. The Web provides a vast resource for the automatic construction of parallel corpora which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.

arXiv.org e-Print Archive

CiteSeerX

Leiden University Scholarly Publications

Leveraging Semantic Annotations to Link Wikipedia and News Archives

Author: Berberich K.
Mishra A.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2016

The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal of retrieving a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.

MPG.PuRe

Optimizing Two-Stage Bigram Language Models for IR

Author: Jianfeng Gao
Kuansan Wang
Sara Javanmardi
Publication date: 03/04/2020

Although higher order language models (LMs) have shown the benefit of capturing word dependencies for information retrieval (IR), the tuning of the increased number of free parameters remains a formidable engineering challenge. Consequently, in many real-world retrieval systems, applying higher order LMs is an exception rather than the rule. In this study, we address the parameter tuning problem using a framework based on a linear ranking model in which different component models are incorporated as features. Using unigram and bigram LMs with 2-stage smoothing as examples, we show that our method leads to a bigram LM that significantly outperforms its unigram counterpart and the well-tuned BM25 model.

CiteSeerX
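
The bigram language model abstract above relies on 2-stage smoothing. Assuming this refers to the common two-stage scheme (Dirichlet smoothing of the document model, followed by linear interpolation with a background model), a unigram version can be sketched as below. This is an illustration under assumptions, not the authors' system: the background model is approximated by the collection model, the values of `mu` and `lam` are arbitrary, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def two_stage_prob(word, doc_counts, doc_len, coll_prob, mu=10.0, lam=0.3):
    """p(w|d) = (1 - lam) * (c(w,d) + mu * p(w|C)) / (|d| + mu) + lam * p(w|C)."""
    # Assumption: words unseen in the collection get a tiny floor probability.
    p_c = coll_prob.get(word, 1e-9)
    dirichlet = (doc_counts.get(word, 0) + mu * p_c) / (doc_len + mu)
    return (1.0 - lam) * dirichlet + lam * p_c


def query_log_likelihood(query, doc, coll_prob, mu=10.0, lam=0.3):
    """Sum of log p(w|d) over the query terms (unigram query likelihood)."""
    counts = Counter(doc)
    return sum(
        math.log(two_stage_prob(w, counts, len(doc), coll_prob, mu, lam))
        for w in query
    )


# Hypothetical toy collection.
docs = {
    "d1": "language models for retrieval".split(),
    "d2": "chemical similarity searching".split(),
}
coll_counts = Counter(w for doc in docs.values() for w in doc)
total = sum(coll_counts.values())
coll_prob = {w: c / total for w, c in coll_counts.items()}

query = "language retrieval".split()
scores = {name: query_log_likelihood(query, doc, coll_prob) for name, doc in docs.items()}
print(scores)
```

The two free parameters (`mu` and `lam` here) are the kind of settings whose tuning the abstract describes as difficult once bigram components are added.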