6 research outputs found

    ADIOS LDA: When Grammar Induction Meets Topic Modeling

    We explore the interplay between grammar induction and topic modeling approaches to unsupervised text processing. These two methods complement each other: one identifies local structures centered around certain key terms, while the other generates a document-wide context of expressed topics. Combining them allows us to access and identify semantic structures that would be hard to discover using only one of the two methods. Using our approach, we are able to provide a deeper understanding of the topic structure by examining inferred information structures characteristic of given topics, as well as capture differences in word usage that would be hard to capture using standard disambiguation methods. We perform our exploration on an extensive corpus of blog posts centered around the surveillance discussion, where we focus on the debate around the Snowden affair. We show how our approach can be used for (semi-)automated content classification and the extraction of semantic features from large textual corpora.

    Can Large Language Models Support Editors Pick Related News Articles?

    Editors and journalists play an important role on news platforms. Besides creating trustworthy news stories, they also provide valuable expertise on which stories are placed on the front page and hand-pick related articles for platform users to read further. This paper focuses on the specific task of related article selection commonly carried out daily by editors and journalists on news platforms. This is typically a manual process that utilizes an internal search tool to first find a pool of potential candidate articles. Then, from those candidate articles, editors and journalists hand-pick the top related articles for a given news article as a form of expert-selected suggestions for the readers. Although this task can be an important part of the editorial process in news platforms, it may become time-consuming and demanding, often requiring significant human effort. In addressing this challenge, we propose an automatic mechanism to support editors and journalists in this task by incorporating one of the latest Large Language Models (LLMs), i.e., GPT4o-mini, to shortlist a set of related articles and recommend them to be checked by journalists and editors. Our evaluation of the proposed approach, based on a real-world dataset from one of the largest commercial Norwegian news platforms (i.e., TV 2), demonstrates the effectiveness of the approach in supporting editors and journalists in their task of selecting relevant news articles.

    EDEN: A Dataset for Event Detection in Norwegian News

    We present EDEN, the first Norwegian dataset annotated with event information at the sentence level, adapting the widely used ACE event schema to Norwegian. The paper describes the manual annotation of Norwegian text as well as transcribed speech in the news domain, together with inter-annotator agreement and discussions of relevant dataset statistics. We also present preliminary modeling results using a graph-based event parser. The resulting dataset will be made freely available for download and use.

    Topically-focused Blog Corpora for Multiple Languages
