Streamlined Data Fusion: Unleashing the Power of Linear Combination with Minimal Relevance Judgments
Linear combination is a potent data fusion method in information retrieval
tasks, thanks to its ability to adjust weights for diverse scenarios. However,
achieving optimal weight training has traditionally required manual relevance
judgments on a large percentage of documents, a labor-intensive and expensive
process. In this study, we investigate the feasibility of obtaining
near-optimal weights using a mere 20%-50% of relevant documents. Through
experiments on four TREC datasets, we find that weights trained with multiple
linear regression using this reduced set closely rival those obtained with
TREC's official "qrels." Our findings unlock the potential for more efficient
and affordable data fusion, empowering researchers and practitioners to reap
its full benefits with significantly less effort. Comment: 12 pages, 8 figures
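The idea in the abstract above, a linear combination whose weights come from multiple linear regression on judged documents, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`train_weights`, `fuse_linear`), the no-intercept least-squares formulation, and the toy data are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: component scores are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train_weights(scores: Sequence[Sequence[float]],
                  labels: Sequence[float]) -> List[float]:
    """Least-squares weights (assumed: no intercept) from the normal
    equations X^T X w = X^T y. Each row of `scores` holds the component
    systems' scores for one judged (query, document) pair; `labels` holds
    the matching relevance judgments."""
    k = len(scores[0])
    xtx = [[sum(row[i] * row[j] for row in scores) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(scores, labels)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fuse_linear(doc_scores: Dict[str, Sequence[float]],
                weights: Sequence[float]) -> List[str]:
    """Rank documents by the weighted sum of their component scores."""
    fused = {d: sum(w * s for w, s in zip(weights, ss))
             for d, ss in doc_scores.items()}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy data (hypothetical): two component systems, three judged documents.
weights = train_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.7, 0.3, 1.0])
ranking = fuse_linear({"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.5, 0.5]},
                      weights)
```

Under this sketch, training on a reduced set of judgments simply means passing fewer rows to `train_weights`; how those rows are selected is the subject of the paper and is not modelled here.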
A comparison of score, rank and probability-based fusion methods for video shot retrieval
It is now accepted that the most effective video shot retrieval is based on indexing and retrieving clips using multiple, parallel modalities such as text-matching, image-matching and feature matching and then combining or fusing these parallel retrieval streams in some way. In this paper we investigate a range of fusion methods for combining based on multiple visual features (colour, edge and texture), for combining based on multiple visual examples in the query and for combining multiple modalities (text and visual). Using three TRECVid collections and the TRECVid search task, we specifically compare fusion methods based on normalised score and rank that use either the average, weighted average or maximum of retrieval results from a discrete Jelinek-Mercer smoothed language model. We also compare these results with a simple probability-based combination of the language model results that assumes all features and visual examples are fully independent
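The score-based and rank-based fusion variants named in the abstract above (average, weighted average, maximum) can be illustrated with a short sketch. The min-max normalisation, the rank-to-score mapping, the treatment of missing documents as score 0, and all names (`min_max_normalise`, `rank_normalise`, `fuse_runs`) are assumptions for illustration; the paper's exact formulations may differ.

```python
from typing import Dict, List, Optional, Sequence

Run = Dict[str, float]  # shot id -> retrieval score from one stream


def min_max_normalise(run: Run) -> Run:
    """Rescale scores to [0, 1] so that different streams are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {d: 1.0 for d in run}
    return {d: (s - lo) / (hi - lo) for d, s in run.items()}


def rank_normalise(run: Run) -> Run:
    """Replace each score by a rank-derived value: the top item gets 1.0
    and later items get evenly decreasing values (one assumed mapping)."""
    ordered = sorted(run, key=lambda d: (-run[d], d))
    n = len(ordered)
    return {d: 1.0 - i / n for i, d in enumerate(ordered)}


def fuse_runs(runs: Sequence[Run], method: str = "average",
              weights: Optional[Sequence[float]] = None) -> List[str]:
    """Fuse already-normalised runs; a shot missing from a run scores 0."""
    docs = set()
    for run in runs:
        docs.update(run)
    if weights is None:
        weights = [1.0] * len(runs)
    fused = {}
    for d in docs:
        per_run = [run.get(d, 0.0) for run in runs]
        if method == "average":
            fused[d] = sum(per_run) / len(per_run)
        elif method == "weighted":
            fused[d] = sum(w * s for w, s in zip(weights, per_run)) / sum(weights)
        elif method == "max":
            fused[d] = max(per_run)
        else:
            raise ValueError(f"unknown fusion method: {method}")
    # Ties are broken by shot id so the output is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy streams (hypothetical scores): one text run and one visual run.
text = min_max_normalise({"a": 10.0, "b": 6.0, "c": 0.0})
visual = min_max_normalise({"a": 0.2, "b": 0.5, "c": 1.0, "d": 0.0})
by_average = fuse_runs([text, visual], "average")
by_weighted = fuse_runs([text, visual], "weighted", weights=[1.0, 3.0])
by_max = fuse_runs([text, visual], "max")
```

Swapping `min_max_normalise` for `rank_normalise` before calling `fuse_runs` gives the rank-based counterpart of each method.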
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, current state-of-the-art techniques lack
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning-to-rank methods, as well as rank aggregation
approaches, for combining all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest to the adequacy of the proposed approaches. Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View
Multimedia collections are growing in size and diversity more than ever.
Effective multimedia retrieval systems are thus critical to access these
datasets from the end-user perspective and in a scalable way. We are interested
in repositories of image/text multimedia objects and we study multimodal
information fusion techniques in the context of content based multimedia
information retrieval. We focus on graph based methods which have proven to
provide state-of-the-art performance. We particularly examine two such
methods: cross-media similarities and random walk based scores. From a
theoretical viewpoint, we propose a unifying graph based framework which
encompasses the two aforementioned approaches. Our proposal allows us to
highlight the core features one should consider when using a graph based
technique for the combination of visual and textual information. We compare
cross-media and random walk based results using three different real-world
datasets. From a practical standpoint, our extended empirical analysis allows us
to provide insights and guidelines about the use of graph based methods for
multimodal information fusion in content based multimedia information
retrieval. Comment: An extended version of the paper: Visual and Textual Information
Fusion in Multimedia Retrieval using Semantic Filtering and Graph based
Methods, by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM
Transactions on Information Systems
WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking
We present WISER, a new semantic search engine for expert finding in
academia. Our system is unsupervised and it jointly combines classical language
modeling techniques, based on textual evidence, with the Wikipedia Knowledge
Graph, via entity linking.
WISER indexes each academic author through a novel profiling technique which
models her expertise with a small, labeled and weighted graph drawn from
Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the
author's publications, whereas the weighted edges express the semantic
relatedness among these entities computed via textual and graph-based
relatedness functions. Every node is also labeled with a relevance score which
models the pertinence of the corresponding entity to the author's expertise, and is
computed by means of a proper random-walk calculation over that graph; and with
a latent vector representation which is learned via entity and other kinds of
structural embeddings derived from Wikipedia.
At query time, experts are retrieved by combining classic document-centric
approaches, which exploit the occurrences of query terms in the author's
documents, with a novel set of profile-centric scoring strategies, which
compute the semantic relatedness between the author's expertise and the query
topic via the above graph-based profiles.
The effectiveness of our system is established over a large-scale
experimental test on a standard dataset for this task. We show that WISER
achieves better performance than all the other competitors, thus proving the
effectiveness of modelling an author's profile via our "semantic" graph of
entities. Finally, we comment on the use of WISER for indexing and profiling
the whole research community within the University of Pisa, and its application
to technology transfer in our University
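The abstract above says that each node's relevance score is computed by a random-walk calculation over the author's weighted entity graph, without giving the details. As a rough illustration only, the sketch below runs a PageRank-style random walk with restart on a small weighted graph; the damping factor, the iteration count, the uniform restart distribution, and the name `random_walk_scores` are all assumptions and should not be read as WISER's actual method.

```python
from typing import Dict

# node -> {neighbour: edge weight}; every neighbour must also be a key.
Graph = Dict[str, Dict[str, float]]


def random_walk_scores(graph: Graph, damping: float = 0.85,
                       iterations: int = 50) -> Dict[str, float]:
    """Power iteration for a weighted random walk with uniform restart."""
    nodes = sorted(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            if total == 0:
                # Node without outgoing weight: spread its mass evenly.
                for v in nodes:
                    nxt[v] += damping * scores[u] / n
            else:
                # Move to each neighbour in proportion to the edge weight.
                for v, w in graph[u].items():
                    nxt[v] += damping * scores[u] * w / total
        scores = nxt
    return scores


# Toy profile graph (hypothetical entities and relatedness weights).
profile = {
    "Information retrieval": {"Data fusion": 1.0, "Entity linking": 1.0},
    "Data fusion": {"Information retrieval": 1.0},
    "Entity linking": {"Information retrieval": 1.0},
}
relevance = random_walk_scores(profile)
```

In this toy graph the entity connected to both others ends up with the highest score, which matches the intuition that well-connected entities are more central to a profile.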
Combining relevance information in a synchronous collaborative information retrieval environment
Traditionally, information retrieval (IR) research has focussed on a single user interaction modality, where a user searches to satisfy an information need. Recent advances in both web technologies, such as the sociable web of Web 2.0, and computer hardware, such as tabletop interface devices, have enabled multiple users to collaborate on many computer-related tasks. Due to these advances there is an increasing need to support two or more users searching together at the same time, in order to satisfy a shared information need, which we refer to as Synchronous Collaborative Information Retrieval.
Synchronous Collaborative Information Retrieval (SCIR) represents a significant paradigmatic shift from traditional IR systems. In order to support an effective SCIR search, new techniques are required to coordinate users' activities. In this chapter we explore the effectiveness of a sharing of knowledge policy on a collaborating group. Sharing of knowledge refers to the process of passing relevance information across users: if one user finds items of relevance to the search task, then the group should benefit in the form of improved ranked lists returned to each searcher.
In order to evaluate the proposed techniques we simulate two users searching together through an incremental feedback system. The simulation assumes that users decide on an initial query with which to begin the collaborative search and proceed through the search by providing relevance judgments to the system and receiving a new ranked list. In order to populate these simulations we extract data from the interaction logs of various experimental IR systems from previous Text REtrieval Conference (TREC) workshops
Inexpensive fusion methods for enhancing feature detection
Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage the techniques learned from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once created, co-occurrence between learned features can be captured to further boost performance. At multiple stages throughout these frameworks, various pieces of evidence can be fused together in order to boost performance. These approaches, whilst very successful, are computationally expensive and, depending on the task, require the use of significant computational resources. In this paper we propose two fusion methods that aim to combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. Our approaches, validated experimentally on TRECVid data, are designed to be complementary to existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere