Search CORE

9 research outputs found

Discriminative Topic Modeling with Logistic LDA

Author: Fedoryszak Mateusz
Korshunova Iryna
Theis Lucas
Xiong Hanchen
Publication date: 01/01/2019

Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation that is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images and arbitrary text embeddings, and it integrates well with deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Real-time Event Detection on Social Data Streams

Author: Fedoryszak Mateusz
Frederick Brent
Rajaram Vijay
Zhong Changtao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/07/2019

Social networks are quickly becoming the primary medium for discussing what is happening around real-world events. The information that is generated on social platforms like Twitter can produce rich data streams for immediate insights into ongoing matters and the conversations around them. To tackle the problem of event detection, we model events as a list of clusters of trending entities over time. We describe a real-time system for discovering events that is modular in design and novel in scale and speed: it applies clustering on a large stream with millions of entities per minute and produces a dynamically updated set of events. In order to assess clustering methodologies, we build an evaluation dataset derived from a snapshot of the full Twitter Firehose and propose novel metrics for measuring clustering quality. Through experiments and system profiling, we highlight key results from the offline and online pipelines. Finally, we visualize a high-profile event on Twitter to show the importance of modeling the evolution of events, especially those detected from social data streams.

Comment: Accepted as a full paper at KDD 2019 on April 29, 201

arXiv.org e-Print Archive

Crossref

CERMINE: automatic extraction of structured metadata from scientific literature

Author: A McCallum
C Chang
CH Lee
Dominika Tkaczyk
J Zou
L O’Gorman
LA Goodman
M Luong
Mateusz Fedoryszak
Paweł Szostek
Piotr Jan Dendek
R Kern
T Smith
X Zhang
Łukasz Bolikowski
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Discriminative topic modeling with logistic LDA

Author: Fedoryszak Mateusz
Korshunova Iryna
Theis Lucas
Xiong Hanchen
Publication date: 01/01/2019

Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging for practitioners. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation that is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images or arbitrary text embeddings, and integrated with deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA.

Ghent University Academic Bibliography

CeON/CERMINE: CERMINE 1.13

Author: Aleksander Nowiński
Artur Czeczko
Bartosz Tarnawski
Daniel
Dominika Tkaczyk
Gerald H
Joshua French
Krzysztof Mądry
Martin Körner
Mateusz Fedoryszak
Mateusz Kobos
Mateusz Neumann
Pawel Szostek
Piotr Dendek
The Gitter Badger
Łukasz Bolikowski
Łukasz Pawełczak

Content ExtRactor and MINEr

ZENODO