Information retrieval from scientific abstract and citation databases: A query-by-documents approach based on Monte-Carlo sampling

Espuña Camarasa, Antonio; Farreres de la Morena, Xavier; Galvan Cara, Aldwin Lois; Graells Sobré, Moisès; Lechtenberg, Fabian; Somoza Tornos, Ana

Authors: Antonio Espuña Camarasa
Xavier Farreres de la Morena
Aldwin Lois Galvan Cara
Moisès Graells Sobré
Fabian Lechtenberg
Ana Somoza Tornos
Publication date: 1 August 2022
Publisher: Elsevier BV

Abstract

The rapidly growing number of entries in abstract and citation databases makes information retrieval steadily harder. This study presents a novel query-by-documents approach based on Monte-Carlo sampling of relevant keywords. Keywords are extracted from a set of input documents (the seed) using TF-IDF and then sampled repeatedly to construct queries to the database. The number of times each document is returned is counted and serves as a proxy relevance metric. Two case studies based on the Scopus® database demonstrate the method and its key advantages. No expert knowledge or human intervention is needed to construct the final search strings, which reduces human bias. The method's practicality is supported by the re-retrieval of 7 of 8 and 26 of 31 seed documents at high ranks in the two case studies.

Peer reviewed. Postprint (author's final draft).
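The pipeline summarized in the abstract (TF-IDF keyword extraction, random sampling of keywords into queries, counting how often each document is returned) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_search` is a hypothetical stand-in for a Scopus query, the in-memory corpus is invented for illustration, and the parameters `top_k`, `n_queries`, and `terms_per_query` are assumptions rather than values from the paper.

```python
import math
import random
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "and", "by", "for", "in", "of", "the", "using", "with"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def tfidf_keywords(seed_docs, top_k=6):
    """Rank terms by TF-IDF summed over the seed documents; return the top_k."""
    tokenized = [tokenize(doc) for doc in seed_docs]
    n_docs = len(tokenized)
    # Document frequency: number of seed documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            scores[term] += (count / len(tokens)) * idf
    return [term for term, _ in scores.most_common(top_k)]

def toy_search(query_terms, corpus):
    """Hypothetical database query: return ids of documents containing all terms."""
    return [doc_id for doc_id, text in corpus.items()
            if all(term in tokenize(text) for term in query_terms)]

def monte_carlo_rank(keywords, corpus, n_queries=200, terms_per_query=2, rng_seed=0):
    """Sample keyword subsets, run each as a query, and count returned documents."""
    rng = random.Random(rng_seed)
    hits = Counter()
    for _ in range(n_queries):
        query = rng.sample(keywords, terms_per_query)
        hits.update(toy_search(query, corpus))
    # Occurrence count acts as the proxy relevance score, highest first.
    return hits.most_common()

seed_docs = [
    "Monte Carlo sampling of keywords for query construction",
    "Keywords extraction with TFIDF for document retrieval",
    "Query by document retrieval using sampling",
]
corpus = {
    "A": "sampling keywords query retrieval document tfidf",
    "B": "query retrieval benchmark",
    "C": "protein folding simulation",
}

keywords = tfidf_keywords(seed_docs)
ranking = monte_carlo_rank(keywords, corpus)
print(keywords)
print(ranking)
```

In this toy setting, documents that match many sampled keyword combinations accumulate higher counts, while a document sharing no keywords with the seed is never returned. Against a real database, the search function would be replaced by calls to that database's query interface.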

Available Versions

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/370...

Last time updated on 07/10/2022