7 research outputs found

    ADIOS LDA: When Grammar Induction Meets Topic Modeling

    We explore the interplay between grammar induction and topic modeling approaches to unsupervised text processing. These two methods complement each other: one allows for the identification of local structures centered around certain key terms, while the other generates a document-wide context of expressed topics. This combined approach allows us to access and identify semantic structures that would be hard to discover using only one of the two methods. Using our approach, we are able to provide a deeper understanding of the topic structure by examining inferred information structures characteristic of given topics, as well as capture differences in word usage that would be hard to detect using standard disambiguation methods. We perform our exploration on an extensive corpus of blog posts centered around the surveillance discussion, where we focus on the debate around the Snowden affair. We show how our approach can be used for (semi-)automated content classification and the extraction of semantic features from large textual corpora.

    Can Large Language Models Support Editors Pick Related News Articles?

    Editors and journalists play an important role on news platforms. Besides creating trustworthy news stories, they also provide valuable expertise on which stories are placed on the front page and hand-pick related articles for platform users to read further. This paper focuses on the specific task of related article selection commonly carried out daily by editors and journalists on news platforms. This is typically a manual process that utilizes an internal search tool to first find a pool of potential candidate articles. Then, from those candidate articles, editors and journalists hand-pick the top related articles for a given news article as a form of expert-selected suggestions for the readers. Although this task can be an important part of the editorial process in news platforms, it may become time-consuming and demanding, often requiring significant human effort. In addressing this challenge, we propose an automatic mechanism to support editors and journalists in this task by incorporating one of the latest Large Language Models (LLMs), i.e., GPT4o-mini, to shortlist a set of related articles and recommend them to be checked by journalists and editors. Our evaluation of the proposed approach, based on a real-world dataset from one of the largest commercial Norwegian news platforms (i.e., TV 2), demonstrates the effectiveness of the approach in supporting editors and journalists in their task of selecting relevant news articles.

    EDEN: A Dataset for Event Detection in Norwegian News

    We present EDEN, the first Norwegian dataset annotated with event information at the sentence level, adapting the widely used ACE event schema to Norwegian. The paper describes the manual annotation of Norwegian text as well as transcribed speech in the news domain, together with inter-annotator agreement and discussions of relevant dataset statistics. We also present preliminary modeling results using a graph-based event parser. The resulting dataset will be made freely available for download and use.

    Bloggers’ Responses to the Snowden Affair: Combining Automated and Manual Methods in the Analysis of News Blogging

    No full text
    The Snowden affair gave rise to a huge public debate about not only the legitimacy of the secret surveillance programs he revealed but also about Snowden himself and about the accuracy of the information he leaked. In this paper we present an analysis of how the affair was discussed in the English-language blogosphere, based on a corpus of 15,000 blog posts written about Snowden and published from June 2013 to June 2014, as a sub-corpus of a larger corpus of 100,000 blog posts on the topic of surveillance, written during the period 2006–2014. Automated tools are used to identify the topics that characterize the blogging about surveillance and the posts about the Snowden affair. Through an in-depth analysis of the blog posts that commented on Snowden’s revelations of the PRISM program for surveillance of social media users, we chart how bloggers responded to Snowden and his role in this disclosure, whether they found the information credible, and the extent to which they expressed criticism of the surveillance practices. The analysis is used as a basis for discussing the role of blogs in civic engagement during the first phase of the Snowden affair.

    Topically-focused Blog Corpora for Multiple Languages

    No full text