11,541 research outputs found

Aggregated Topic Models for Increasing Social Media Topic Coherence

Authors: Bi Y; Blair Stuart; Mulvenna Maurice
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/01/2020

Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability

Authors: Manolopoulou Ioanna; Musolesi Mirco; O'sullivan Jason; Prior Rosie; Vega-Carrasco Mariflor
Publication date: 04/05/2020

Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping interpretable outcomes. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging, while individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures which may not adequately capture qualitative aspects such as interpretability and stability of topics. In this paper, we introduce a clustering methodology that post-processes posterior LDA draws to summarize the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, as well as associated measures of uncertainty. Furthermore, we establish a more holistic definition for model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data. We demonstrate that the selection of recurrent topics through our clustering methodology not only improves model likelihood but also improves on qualitative aspects of LDA such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain. Comment: 20 pages, 9 figures
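
The idea of grouping topics that recur across posterior LDA draws can be sketched roughly as follows. This is only an illustration, not the authors' method: the abstract does not specify the clustering algorithm, so the sketch assumes a simple greedy grouping by cosine similarity. The function names, the `threshold` of 0.9, and the `min_share` cut-off are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic-word probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(members):
    """Element-wise mean of the topic vectors in one cluster."""
    return [sum(col) / len(members) for col in zip(*members)]


def cluster_topics(draws, threshold=0.9):
    """Greedily group topics from several draws (assumed scheme).

    `draws` is a list of draws; each draw is a list of topic vectors.
    A topic joins the first cluster whose centroid it resembles closely
    enough, otherwise it starts a new cluster.
    """
    clusters = []
    for draw in draws:
        for topic in draw:
            for members in clusters:
                if cosine(topic, centroid(members)) >= threshold:
                    members.append(topic)
                    break
            else:
                clusters.append([topic])
    return clusters


def recurrent_topics(draws, threshold=0.9, min_share=0.5):
    """Keep centroids of clusters seen in at least `min_share` of draws.

    Returns (centroid, recurrence) pairs, where recurrence is the cluster
    size divided by the number of draws, a crude uncertainty indicator.
    """
    result = []
    for members in cluster_topics(draws, threshold):
        recurrence = len(members) / len(draws)
        if recurrence >= min_share:
            result.append((centroid(members), recurrence))
    return result
```

Calling `recurrent_topics(draws)` on a list of draws returns one summary topic per recurring cluster together with the share of draws in which it appeared; topics that show up in only a few draws are dropped.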

arXiv.org e-Print Archive

UCL Discovery

DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

Authors: Bougouin Adrien; Frank Eibe; Glorot Xavier; Ioffe Sergey; Kim Su Nam; Kim Youngsam; Kingma Diederik; Mihalcea Rada; Mikolov Tomáš; Qazvinian Vahed; Wan Xiaojun; Wang Yining
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/05/2019

Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependencies between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches. Comment: Accepted to SIGIR 201
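
The word-graph construction described in this abstract can be illustrated with a minimal sketch. It covers only the graph-building step, not the graph convolutional encoder or the decoder. The function name `build_word_graph`, the `max_distance` parameter, and the use of raw co-occurrence counts as edge weights are assumptions made for illustration; the abstract does not state how the paper weights edges.

```python
from collections import defaultdict


def build_word_graph(tokens, max_distance=2):
    """Build an undirected word graph from token proximity.

    Every distinct word becomes a single node, so repeated appearances of
    the same word are merged. Two words are linked when they occur within
    `max_distance` positions of each other; the edge weight counts how
    often that happens (an assumed weighting scheme).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        upper = min(i + max_distance + 1, len(tokens))
        for j in range(i + 1, upper):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1
            graph[other][word] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {w: dict(neighbours) for w, neighbours in graph.items()}
```

For example, `build_word_graph("graph pointer network encodes word graph".split(), max_distance=1)` links the single node `graph` to both `pointer` and `word`, because its two appearances are merged into one node.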

arXiv.org e-Print Archive

Crossref

Search strategies of Wikipedia readers

Authors: Francesca Tria; Giovanna Chiara Rodi; J Gwizdka; JN Giedd; K Foerde; K Suchecki; MA Just; P Singer; TA Schweizer; Tobias Preis; TP Novikoff; Vittorio Loreto
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2017

The quest for information is one of the most common activities of human beings. Despite the impressive progress of search engines, finding the needed piece of information can still be very hard, as can acquiring specific competences and knowledge by shaping and following the proper learning paths. Indeed, the need to find sensible paths in information networks is one of the biggest challenges of our societies and, to effectively address it, it is important to investigate the strategies adopted by human users to cope with the cognitive bottleneck of finding their way in a growing sea of information. Here we focus on the case of Wikipedia and investigate a recently released dataset about users’ clicks on the English Wikipedia, namely the English Wikipedia Clickstream. We perform a semantically charged analysis to uncover the general patterns followed by information seekers in the multi-dimensional space of Wikipedia topics/categories. We discover the existence of well-defined strategies in which users tend to start from very general, i.e., semantically broad, pages and progressively narrow down the scope of their navigation, while keeping a growing semantic coherence. This is unlike strategies associated with tasks with predefined search goals, namely the case of the Wikispeedia game. In this case users first move from the ‘particular’ to the ‘universal’ before focusing down again on the required target. The clear picture offered here represents a very important stepping stone towards a better design of information networks and recommendation strategies, as well as the construction of radically new learning paths.

Crossref

Directory of Open Access Journals

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio della ricerca- Università di Roma La Sapienza

PORTO Publications Open Repository TOrino

FigShare

Quantifying Priorities in Business Cycle Reports: Analysis of Recurring Textual Patterns around Peaks and Troughs

Author: Foltas Alexander
Publication venue: Humboldt-Universität zu Berlin
Publication date: 01/07/2023

I propose a novel approach to uncover business cycle reports’ priorities and relate them to economic fluctuations. To this end, I leverage quantitative business-cycle forecasts published by leading German economic research institutes since 1970 to estimate the proportions of latent topics in associated business cycle reports. I then employ a supervised approach to aggregate topics with similar themes, thus revealing the proportions of broader macroeconomic subjects. I obtain measures of forecasters’ subject priorities by extracting the subject proportions’ cyclic components. Correlating these priorities with key macroeconomic variables reveals consistent priority patterns throughout economic peaks and troughs. The forecasters prioritize inflation-related matters over recession-related considerations around peaks. This finding suggests that forecasters underestimate growth and overestimate inflation risks during contractive monetary policies, which might explain their failure to predict recessions. Around troughs, forecasters prioritize investment matters, potentially suggesting a better understanding of macroeconomic developments during those periods compared to peaks.

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin