Improving Neural Ranking Models with Traditional IR Methods

Authors: Gittens Alex; Hassanzadeh Oktie; Ni Jian; Saha Anik; Srinivas Kavitha; Yener Bulent
Publication date: 29/08/2023

Neural ranking methods based on large transformer models have recently gained significant attention in the information retrieval community and have been adopted by major commercial solutions. Nevertheless, they are computationally expensive to create and require a great deal of labeled data for specialized corpora. In this paper, we explore a low-resource alternative, a bag-of-embeddings model for document retrieval, and find that it is competitive with large transformer models fine-tuned on information retrieval tasks. Our results show that a simple combination of TF-IDF, a traditional keyword-matching method, with a shallow embedding model provides a low-cost path to performance competitive with complex neural ranking models on 3 datasets. Furthermore, adding TF-IDF measures improves the performance of large-scale fine-tuned models on these tasks.
Comment: Short paper, 4 pages

arXiv.org e-Print Archive
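The abstract above reports that combining TF-IDF keyword matching with a shallow bag-of-embeddings model is competitive for ranking, but it does not say how the two signals are combined. The Python sketch below assumes one common option: a linear interpolation of the two cosine similarities, controlled by a hypothetical weight `alpha`. The smoothed IDF formula, the toy embedding table, and every function name are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def build_idf(docs):
    # Smoothed IDF (an assumed variant): terms found in every document keep a small positive weight.
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(tokens, idf):
    # Raw term frequency times IDF; terms unseen in the corpus get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

def sparse_cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_embedding(tokens, emb, dim):
    # Bag-of-embeddings: average the vectors of the tokens we have embeddings for.
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def dense_cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, docs, emb, dim, alpha=0.5):
    # Score = alpha * TF-IDF cosine + (1 - alpha) * embedding cosine (hypothetical fusion rule).
    idf = build_idf(docs)
    q_sparse = tfidf(query, idf)
    q_dense = mean_embedding(query, emb, dim)
    scored = []
    for i, doc in enumerate(docs):
        s = sparse_cosine(q_sparse, tfidf(doc, idf))
        d = dense_cosine(q_dense, mean_embedding(doc, emb, dim))
        scored.append((alpha * s + (1 - alpha) * d, i))
    # Stable sort: equal scores keep their original document order.
    scored.sort(key=lambda p: p[0], reverse=True)
    return [i for _, i in scored]

# Toy, made-up data for illustration only.
docs = [["neural", "ranking", "model"],
        ["keyword", "matching", "baseline"],
        ["cooking", "recipes"]]
emb = {"neural": [1.0, 0.0], "ranking": [0.9, 0.1], "model": [0.8, 0.2],
       "retrieval": [0.9, 0.1], "keyword": [0.5, 0.5], "matching": [0.6, 0.4],
       "cooking": [0.0, 1.0], "recipes": [0.1, 0.9]}
print(hybrid_rank(["neural", "retrieval"], docs, emb, dim=2))  # [0, 1, 2]
```

With `alpha=1.0` the score reduces to pure TF-IDF matching, and with `alpha=0.0` it relies only on the averaged embeddings, so the weight trades exact keyword overlap against softer semantic similarity.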

Dense vs. Sparse representations for news stream clustering

Authors: Barron-Cedeno A.; Da San Martino G.; Nakov P.; Staykovski T.
Publication venue: CEUR-WS
Publication date: 01/01/2019

The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art, both for the standard F1 and for a BCubed version thereof, which we argue is more suitable for the task.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
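The abstract above compares sparse TF–IDF vectors, dense vectors, and their combinations for clustering a news stream. As a rough illustration only, the Python sketch below scores each pair of articles with a weighted mix (hypothetical weight `beta`) of sparse and dense cosine similarity, links pairs whose score reaches a hypothetical `threshold`, and returns the connected components. Connected components are a simple stand-in swapped in here for the local topic graph and community detection algorithm the authors use; the article vectors are assumed to be precomputed, and all names are illustrative.

```python
import math

def sparse_cosine(u, v):
    # u, v: dicts mapping term -> TF-IDF weight.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dense_cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def combined_similarity(x, y, beta=0.5):
    # x, y: (sparse_vector, dense_vector) pairs; beta weights the sparse side.
    return beta * sparse_cosine(x[0], y[0]) + (1 - beta) * dense_cosine(x[1], y[1])

def cluster(articles, threshold=0.5, beta=0.5):
    # Union-find over a similarity graph: an edge joins i and j when
    # their combined similarity is at least `threshold`.
    parent = list(range(len(articles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if combined_similarity(articles[i], articles[j], beta) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(articles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Made-up vectors for three articles: two on the same event, one unrelated.
articles = [({"quake": 1.0, "rome": 1.0}, [1.0, 0.0]),
            ({"quake": 1.0, "italy": 1.0}, [0.9, 0.1]),
            ({"election": 1.0, "vote": 1.0}, [0.0, 1.0])]
print(cluster(articles))  # [[0, 1], [2]]
```

Setting `beta=1.0` or `beta=0.0` recovers a purely sparse or purely dense similarity, which is one simple way to compare the two representations and their combination on the same pipeline.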

Qlusty: Quick and dirty generation of event videos from written media coverage

Authors: Ahmed Ali; Barron-Cedeno A.; Da San Martino G.; Dalvi F.; Zhang Yifan
Publication venue: CEUR-WS
Publication date: 01/01/2018

Qlusty automatically generates videos describing the coverage of the same event by different news outlets. Through four modules, it identifies events, de-duplicates notes, ranks according to coverage, and queries for images to generate an overview video. In this manuscript, we present our preliminary models, including quantitative evaluations of the former two modules and a qualitative analysis of the latter two. The results show the potential for achieving our main aim: contributing to breaking the information bubble so common in the current news landscape.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna