15 research outputs found
Improving Neural Ranking Models with Traditional IR Methods
Neural ranking methods based on large transformer models have recently gained
significant attention in the information retrieval community, and have been
adopted by major commercial solutions. Nevertheless, they are computationally
expensive to create, and require a great deal of labeled data for specialized
corpora. In this paper, we explore a low resource alternative which is a
bag-of-embedding model for document retrieval and find that it is competitive
with large transformer models fine tuned on information retrieval tasks. Our
results show that a simple combination of TF-IDF, a traditional keyword
matching method, with a shallow embedding model provides a low cost path to
compete well with the performance of complex neural ranking models on 3
datasets. Furthermore, adding TF-IDF measures improves the performance of
large-scale fine tuned models on these tasks.Comment: Short paper, 4 page
Dense vs. Sparse representations for news stream clustering
The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream, and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF\u2013IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art both for the standard F1 as well as for a BCubed version thereof, which we argue is more suitable for the task
Qlusty: Quick and dirty generation of event videos from written media coverage
Qlusty generates videos describing the coverage of the same event by different news outlets automatically. Throughout four modules it identifies events, de-duplicates notes, ranks according to coverage, and queries for images to generate an overview video. In this manuscript we present our preliminary models, including quantitative evaluations of the former two and a qualitative analysis of the latter two. The results show the potential for achieving our main aim: contributing in breaking the information bubble, so common in the current news landscape