What company does my news article refer to? Tackling multiclass problems with topic modeling

Farrell, Patricio; Kunkel, Julian; Lübbering, Max

Publication date: 1 January 2019

Abstract

Although searching for the company name is a technically trivial way to predict which company a news article refers to, it often leads to incorrect results. In this article, we compare two approaches, bag-of-words with k-nearest neighbors and Latent Dirichlet Allocation with k-nearest neighbors, by assessing their applicability for predicting which S&P 500 company is mentioned in a business news article or press release. Both approaches are evaluated on a corpus of 13k documents consisting of 84% news articles and 16% press releases. While the bag-of-words approach yields accurate predictions, it is highly inefficient because of its very large feature space. The Latent Dirichlet Allocation approach, on the other hand, achieves roughly the same prediction accuracy (0.58 instead of 0.62) while reducing the feature space by a factor of seven.
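
To make the first of the two compared approaches concrete, the following is a minimal sketch of bag-of-words features combined with a k-nearest-neighbor vote. It is not the authors' implementation: the toy corpus, the ticker labels, the cosine similarity measure, the value of k, and all function names are assumptions chosen for illustration. The Latent Dirichlet Allocation variant would replace the term-count vectors with much shorter per-document topic distributions before the same neighbor search; that step is not shown here.

```python
from collections import Counter
import math

# Hypothetical toy corpus of (text, company label) pairs; not the paper's data.
TRAIN = [
    ("apple unveils new iphone and mac lineup", "AAPL"),
    ("iphone sales lift apple quarterly revenue", "AAPL"),
    ("microsoft expands azure cloud and windows services", "MSFT"),
    ("azure growth drives microsoft earnings", "MSFT"),
    ("ford recalls trucks over brake issue", "F"),
    ("ford invests in electric trucks plant", "F"),
]


def bag_of_words(text):
    """Map a document to a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(text, train, k=3):
    """Majority vote over the k training documents most similar to `text`."""
    query = bag_of_words(text)
    scored = sorted(
        ((cosine(query, bag_of_words(doc)), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# The feature space has one dimension per distinct term in the corpus,
# which is why it grows so quickly on a real news collection.
vocabulary = {term for doc, _ in TRAIN for term in bag_of_words(doc)}
prediction = knn_predict("new iphone boosts revenue", TRAIN, k=3)
print(len(vocabulary), prediction)
```

Even on six short documents the vocabulary is several times larger than the number of documents, which illustrates why a topic model that compresses each document into a small, fixed number of topic weights can be attractive for the neighbor search.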

Available Versions

Repositorium für Naturwissenschaften und Technik

oai:oa.tib.eu:123456789/9205

Last time updated on 23/07/2022

Publications Server of the Weierstrass Institute for Applied Analysis and Stochastics

oai:archive.wias-berlin.de:wia...

Last time updated on 04/04/2020