
25 research outputs found

Document structure-driven investigative information retrieval

Author: Ketola T
Roelleke T
Publication date: 01/03/2024

Data-driven investigations are increasingly dealing with non-moderated, non-standard and even manipulated information. Whether the field in question is journalism, law enforcement, or insurance fraud, it is becoming more and more difficult for investigators to verify the outcomes of various black-box systems. To contribute to this need for discovery methods that can be used for verification, we introduce a methodology for document structure-driven investigative information retrieval (InvIR). InvIR is defined as a subtask of exploratory IR, where transparency and reasoning take centre stage. The aim of InvIR is to facilitate the verification and discovery of facts from data and the communication of those facts to others. From a technical perspective, the methodology applies recent work from structured document retrieval (SDR) concerned with formal retrieval constraints and information content-based field weighting (ICFW). Using ICFW, the paper establishes the concept of relevance structures to describe the document structure-based relevance of documents. These contexts are then used to help the user navigate during their discovery process and to rank entities of interest. The proposed methodology is evaluated using a prototype search system called Relevance Structure-based Entity Ranker (RSER) in order to demonstrate its feasibility. This methodology represents an interesting and important research direction in a world where transparency is becoming more vital than ever.

Queen Mary Research Online

A multi-layered Bayesian network model for structured document retrieval

Author: F. Crestani
G. Bordogna
G. Salton
H.R. Turtle
J. Vegas
L.M. de Campos
M. Lalmas
S. Acid
T. Roelleke
Y. Chiaramella
Publication date: 01/01/2003

New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented.

Crossref

University of Strathclyde Institutional Repository

DB&IR Integration: Report on the Dagstuhl Seminar ''Ranked XML Querying''

Author: Amer-Yahia S.
Hiemstra Djoerd
Roelleke T.
Srivastava D.
Weikum G.
Publication venue: Dagstuhl
Publication date: 01/01/2008

University of Twente Research Information

Ranking structured documents using utility theory in the Bayesian network retrieval model

Author: F. Crestani
G. Bordogna
G. Kazai
L.M. de Campos
M. Lalmas
R. Baeza-Yates
R.D. Shachter
S. Acid
S. French
T. Roelleke
Y. Chiaramella
Publication date: 01/01/2003

In this paper, a new method based on Utility and Decision theory is presented to deal with structured documents. The aim of applying these methodologies is to refine a first ranking of structural units, generated by means of an Information Retrieval Model based on Bayesian Networks. Units are rearranged in the new ranking by combining their posterior probabilities, obtained in the first stage, with the expected utility of retrieving them. The experimental work has been carried out using the Shakespeare structured collection, and the results show that this new approach improves effectiveness.

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

A systematic approach to normalization in probabilistic models

Author: Aldo Lipani
Allan Hanbury
Ben He
G Amati
Gerard Salton
K. Church
Mihai Lupu
S Robertson
T Roelleke
Thomas Roelleke
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2018

Open access funding provided by Austrian Science Fund (FWF). This research was partly supported by the Austrian Science Fund (FWF) Project Number P25905-N23 (ADmIRE). This work has been supported by the Self-Optimizer project (FFG 852624) in the EUROSTARS programme, funded by EUREKA, the BMWFW and the European Union.

Crossref

UCL Discovery

Queen Mary Research Online

Towards a Better Understanding of the Relationship between Probabilistic Models in IR

Author: C. Zhai
C.D. Manning
D.W. Hosmer
F. Crestani
J. Lafferty
J.M. Ponte
K. Spärck-Jones
N. Fuhr
R.W.P. Luk
S.E. Robertson
T. Roelleke
V. Lavrenko
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. We instead define a new PR model based on logistic regression, which has a score function similar to that of the query likelihood model. The performance of the two models is strongly correlated, hence providing a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens ample new research directions, which we propose as future work.

Crossref

Ghent University Academic Bibliography

Flexible and efficient IR using array databases

Author: A. Eisenberg
A. Trotman
Arjen P. de Vries
C. Galindo-Legaria
D.A. Grossman
G. Graefe
G.H. Golub
H. Turtle
I.H. Witten
L.A. Barroso
M.F. Porter
Marcin Zukowski
P. Buneman
Peter Boncz
Roberto Cornacchia
S.E. Robertson
Sándor Héman
T. Grabs
T. Roelleke
U.S. Chakravarthy
V.N. Anh
Publication venue: Springer Science and Business Media LLC

Crossref

Opinion-aware retrieval models based on sentiment and intensity of lexical features

Author: Bahrani M
Roelleke T
Publication venue: IOS Press
Publication date: 29/10/2021

Sentiment analysis has received much attention in Information Retrieval (IR) and other domains including data mining, machine learning algorithms and NLP. However, when it comes to big data, incorporating the sentiment of words into IR models becomes even more important, and as yet no widely accepted standard exists for this task. The contribution of this paper is a framework for quantifying term frequency (TF) variants with sentiments. We propose models derived from the strength of lexical features to improve sentiment-based ranking.

Queen Mary Research Online

Explicitly considering relevance within the language modeling framework

Author: Azzopardi L.
Roelleke T.
Publication date: 01/01/2007

Whilst the event of relevance is central to the Binary Independence Retrieval model, Language Modeling focuses on the estimation of the document model. In this paper, we review the different past formulations of the Language Modeling (query likelihood) approach. We find that these previous formulations largely ignore relevance by making implicit or explicit assumptions. The main contribution of this work is an alternative formulation that specifically relates relevance and language modeling in a sound probabilistic framework. This leads to valuable insights into the application of Language Modeling to Information Retrieval, including how the approach handles relevance information and how the approach can be further developed.

Enlighten

ADOR: A New Medical Dataset for Sentiment-based IR

Author: Bahrani M
Roelleke T
Publication date: 01/01/2021

Sentiment analysis has received attention in retrieval applications. Combining opinions such as user feelings with semantics would enhance the performance of these applications, especially when the level of urgency is essential, e.g., in the medical domain. However, no widely known medical benchmark exists for evaluating sentiment-aware IR. In this paper, we create a dataset based on Amazon reviews for medical products and make it publicly available. To assess the compatibility of the benchmark with opinions and concepts, we propose a sentiment-aware extension of TF.IDF and apply it to the dataset. This model is derived from linear combinations of sentiment-based TF.IDF scores with term-based and conceptual TF.IDF scores. The benchmark could help healthcare organizations to effectively detect, rank and filter the most urgent notifications based on patients' health status, narratives and conditions.

Queen Mary Research Online