28 research outputs found

Learning to merge search results for efficient Distributed Information Retrieval

Author: Hiemstra Djoerd
Tjin-Kam-Jet Kien-Tsoi T.E.
Publication venue: Radboud University
Publication date: 01/01/2010

Merging search results from different servers is a major problem in Distributed Information Retrieval. We used Regression-SVM and Ranking-SVM to learn a function that merges results based on readily available information, i.e. the ranks, titles, summaries and URLs contained in the results pages. By not downloading additional information, such as the full document, we decrease bandwidth usage. CORI and Round Robin merging were used as our baselines; surprisingly, our results show that the SVM methods do not improve over those baselines.
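As an illustration of the Round Robin baseline named in this abstract, here is a minimal sketch. It assumes each server returns a ranked list of document identifiers and that duplicates are dropped on first sight; the function name `round_robin_merge` and the deduplication step are assumptions for illustration, not details taken from the paper.

```python
from itertools import zip_longest

# Sentinel marking "this server has no more results" (hypothetical helper).
_EXHAUSTED = object()


def round_robin_merge(result_lists):
    """Interleave ranked lists, taking one result per server in each round.

    Assumption: results are hashable document identifiers, and a document
    returned by several servers is kept only at its first position.
    """
    merged = []
    seen = set()
    for round_items in zip_longest(*result_lists, fillvalue=_EXHAUSTED):
        for item in round_items:
            if item is _EXHAUSTED or item in seen:
                continue
            seen.add(item)
            merged.append(item)
    return merged


if __name__ == "__main__":
    servers = [["a", "b", "c"], ["d", "a"], ["e"]]
    print(round_robin_merge(servers))
```

Round Robin uses only the rank positions, which is why it needs no document downloads; the learned mergers described in the abstract additionally draw on titles, summaries and URLs.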

CiteSeerX

Radboud Repository

University of Twente Research Information

Search of spoken documents retrieves well recognized transcripts

Author: Sanderson M.
Shou X.M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2007

This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. This result was described in past literature; however, no study or explanation of the effect has been provided until now. This paper provides such an analysis, showing a relationship between word error rate and query length. The paper expands on past research by increasing the number of recognition systems that are tested as well as showing the effect in an operational speech retrieval system. Potential future lines of enquiry are also described.

White Rose Research Online

Examining repetition in user search behavior

Author: Dumais S.
Sanderson M.
Publication venue: Springer Science and Business Media LLC

This paper describes analyses of the repeated use of search engines. It is shown that users commonly re-issue queries, either to examine search results deeply or simply to query again, often days or weeks later. Hourly and weekly periodicities in behavior are observed for both queries and clicks. Navigational queries were found to be repeated differently from others.

White Rose Research Online

LESIM: A Novel Lexical Similarity Measure Technique for Multimedia Information Retrieval

Author: Avlonitis Markos
Kanavos Andreas
Karacapilidis Nikos
Karydis Ioannis
Sioutas Spyros
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2018

Metadata-based similarity measurement is far from obsolete today, despite research’s focus on content and context. It allows for aggregating information from textual references, measuring similarity when content is not available, traditional keyword search in search engines, merging results in meta-search engines, and many other activities of interest to research and industry. Existing similarity measures take into consideration neither the unique nature of multimedia’s metadata nor the requirements of metadata-based information retrieval of multimedia. This work proposes a hybrid similarity measure, customised for the commonly available author-title multimedia metadata, that is shown through experimentation to be significantly more effective than baseline measures.

AIS Electronic Library (AISeL)

A Comparative Analysis of Retrievability and PageRank Measures

Author: Mall Priyanshu Raj
Roy Dwaipayan
Sinha Aman
Publication date: 17/11/2023

The accessibility of documents within a collection holds a pivotal role in Information Retrieval, signifying the ease of locating specific content in a collection of documents. This accessibility can be achieved via two distinct avenues. The first is through some retrieval model using a keyword or other feature-based search, and the other is where a document can be reached by navigating the links associated with it, if available. Metrics such as PageRank, Hub, and Authority illuminate the pathways through which documents can be discovered within the network of content, while the concept of Retrievability is used to quantify the ease with which a document can be found by a retrieval model. In this paper, we compare these two perspectives, PageRank and retrievability, as they quantify the importance and discoverability of content in a corpus. Through empirical experimentation on benchmark datasets, we demonstrate a subtle similarity between retrievability and PageRank, particularly distinguishable for larger datasets. Comment: Accepted at FIRE 202
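To make the notion of retrievability in this abstract concrete, the sketch below uses one common formulation: a document's score is the number of queries for which it appears within a rank cutoff. The abstract does not state which formulation the paper uses, so the cumulative count, the `cutoff` parameter, and the function name `cumulative_retrievability` are all assumptions for illustration.

```python
def cumulative_retrievability(rankings, cutoff):
    """Count, per document, how many queries retrieve it within `cutoff`.

    `rankings` maps each query to its ranked list of document identifiers
    (best first). Documents never retrieved within the cutoff are absent
    from the result, which is equivalent to a score of zero.
    """
    scores = {}
    for ranked_docs in rankings.values():
        # Only the top `cutoff` positions count as "retrieved".
        for doc in ranked_docs[:cutoff]:
            scores[doc] = scores.get(doc, 0) + 1
    return scores


if __name__ == "__main__":
    rankings = {
        "q1": ["d1", "d2", "d3"],
        "q2": ["d2", "d1", "d4"],
        "q3": ["d2", "d5"],
    }
    print(cumulative_retrievability(rankings, cutoff=2))
```

Scores of this kind depend on the retrieval model and query set, whereas PageRank depends only on the link graph; that difference is what makes the comparison described in the abstract meaningful.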

arXiv.org e-Print Archive

Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling

Author: Albro Stephen
Batmanghelich Kayhan
DeSalvo Giulia
Kothawade Suraj
Rashwan Abdullah
Tavakkol Sasan
Yin Xiaoqi
Yu Ke
Publication date: 27/09/2023

Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold. Comment: UNCV ICCV 202

arXiv.org e-Print Archive

Learning to Choose: automatic Selection of the Information Retrieval Parameters

Author: Bigot Anthony
Dejean Sébastien
Mothe Josiane
Publication venue: HAL CCSD
Publication date: 01/01/2014

In this paper we promote a selective information retrieval process to be applied in the context of repeated queries. The method is based on a training phase in which the meta search system learns the best parameters to use on a per-query basis. The training phase uses a sample of annotated documents for which document relevance is known. When an identical query is later submitted to the system, it automatically knows which parameters it should use to treat the query. This Learning to Choose method is evaluated using simulated data from TREC campaigns. We show that system performance increases substantially in terms of precision (MAP), specifically for the queries that are difficult to answer, when compared to any unique system configuration applied to all the queries.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL-INSA Toulouse