Search CORE

982 research outputs found

Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data

Author: A Fang
A Guolo
DM Blei
L AlSumait
M Braun
TL Griffiths
WX Zhao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

There is considerable interest among both researchers and the mass public in understanding the topics of discussion on social media as they occur over time. Scholars have thoroughly analysed sampling-based topic modelling approaches for various text corpora including social media; however, another LDA topic modelling implementation—Variational Bayesian (VB)—has not been well studied, despite its known efficiency and its adaptability to the volume and dynamics of social media data. In this paper, we examine the performance of the VB-based topic modelling approach for producing coherent topics, and further, we extend the VB approach by proposing a novel time-sensitive Variational Bayesian implementation, denoted as TVB. Our newly proposed TVB approach incorporates time so as to increase the quality of the generated topics. Using a Twitter dataset covering 8 events, our empirical results show that the coherence of the topics in our TVB model is improved by the integration of time. In particular, through a user study, we find that our TVB approach generates less mixed topics than state-of-the-art topic modelling approaches. Moreover, our proposed TVB approach can more accurately estimate topical trends, making it particularly suitable to assist end-users in tracking emerging topics on social media

Crossref

Enlighten

A framework for evaluating automatic image annotation algorithms

Author: A.W.M. Smeulders
B.S. Manjunath
D.G. Lowe
D.M. Blei
D.M. Blei
G. Carneiro
H. Kwasnicka
J. Jeon
J. Li
L. Fei-Fei
N. Vasconcelos
P. Duygulu
V. Lavrenko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Several Automatic Image Annotation (AIA) algorithms have been introduced recently, which have been found to outperform previous models. However, each one of them has been evaluated using either different descriptors, collections or parts of collections, or "easy" settings. This fact renders their results non-comparable, while we show that collection-specific properties are responsible for the high reported performance measures, and not the actual models. In this paper we introduce a framework for the evaluation of image annotation models, which we use to evaluate two state-of-the-art AIA algorithms. Our findings reveal that a simple Support Vector Machine (SVM) approach using Global MPEG-7 Features outperforms state-of-the-art AIA models across several collection settings. It seems that these models heavily depend on the set of features and the data used, while it is easy to exploit collection-specific properties, such as tag popularity especially in the commonly used Corel 5K dataset and still achieve good performance

CiteSeerX

Crossref

Enlighten

Identifying Editor Roles in Argumentative Writing from Student Revision Histories

Author: DM Blei
L Faigley
RD Roscoe
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/09/2019
Field of study

We present a method for identifying editor roles from students' revision behaviors during argumentative writing. We first develop a method for applying a topic modeling algorithm to identify a set of editor roles from a vocabulary capturing three aspects of student revision behaviors: operation, purpose, and position. We validate the identified roles by showing that modeling the editor roles that students take when revising a paper not only accounts for the variance in revision purposes in our data, but also relates to writing improvement

arXiv.org e-Print Archive

Crossref

Optimal client recommendation for market makers in illiquid financial products

Author: DD Lee
DJC MacKay
DM Blei
DM Blei
EJ Elton
F Pedregosa
G Shani
GE Batista
I Kim
KS Jones
L Bolelli
M Avellaneda
M Hoffman
MI Jordan
S Robertson
Y Amihud
Publication venue
Publication date: 27/04/2017
Field of study

The process of liquidity provision in financial markets can result in prolonged exposure to illiquid instruments for market makers. In this case, where a proprietary position is not desired, pro-actively targeting the right client who is likely to be interested can be an effective means to offset this position, rather than relying on commensurate interest arising through natural demand. In this paper, we consider the inference of a client profile for the purpose of corporate bond recommendation, based on typical recorded information available to the market maker. Given a historical record of corporate bond transactions and bond meta-data, we use a topic-modelling analogy to develop a probabilistic technique for compiling a curated list of client recommendations for a particular bond that needs to be traded, ranked by probability of interest. We show that a model based on Latent Dirichlet Allocation offers promising performance to deliver relevant recommendations for sales traders.Comment: 12 pages, 3 figures, 1 tabl

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Exploratory topic modeling with distributional semantics

Author: A Treisman
DA Keim
DM Blei
J Risch
L Barth
M Bostock
S Fortunato
S Lohmann
S Palmer
Y Bengio
Publication venue
Publication date: 16/07/2015
Field of study

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration.Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015

arXiv.org e-Print Archive

Crossref

What2Cite: Unveiling Topics and Citations Dependencies for Scientific Literature Exploration and Recommendation

Author: DM Blei
DM Blei
L Di Caro
M Schuster
N Nagwani
NJ van Eck
NJ van Eck
SPD Shotton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Crossref

Institutional Research Information System University of Turin

Selective Metal Cation Capture by Soft Anionic Metal-Organic Frameworks via Drastic Single-Crystal-to-Single-Crystal Transformations

Author: B Settles
CW Arnold
DM Blei
DM Blei
DR Rhodes
DR Swanson
GA Miller
JH Miles
L Kanner
SC Deerwester
Publication venue
Publication date: 13/06/2012
Field of study

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) - an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we will make freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.Comment: In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 201

arXiv.org e-Print Archive

Heriot Watt Pure

Deakin Research Online

Crossref

Edinburgh Research Explorer

University of St. Andrews - Pure

Temporal Cross-Media Retrieval with Soft-Smoothing

Author: Andrew Galen
Benevenuto Fabricio
Blei David M.
He Kaiming
Herbrich Ralf
Hu D.
Ngiam Jiquan
Srivastava Nitish
Srivastava Nitish
Uricchio Tiberio
Wang L.
Yan F.
Zhan M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/10/2018
Field of study

Multimedia information have strong temporal correlations that shape the way modalities co-occur over time. In this paper we study the dynamic nature of multimedia and social-media information, where the temporal dimension emerges as a strong source of evidence for learning the temporal correlations across visual and textual modalities. So far, cross-media retrieval models, explored the correlations between different modalities (e.g. text and image) to learn a common subspace, in which semantically similar instances lie in the same neighbourhood. Building on such knowledge, we propose a novel temporal cross-media neural architecture, that departs from standard cross-media methods, by explicitly accounting for the temporal dimension through temporal subspace learning. The model is softly-constrained with temporal and inter-modality constraints that guide the new subspace learning task by favouring temporal correlations between semantically similar and temporally close instances. Experiments on three distinct datasets show that accounting for time turns out to be important for cross-media retrieval. Namely, the proposed method outperforms a set of baselines on the task of temporal cross-media retrieval, demonstrating its effectiveness for performing temporal subspace learning.Comment: To appear in ACM MM 201

arXiv.org e-Print Archive

Crossref

Location Dependent Dirichlet Processes

Author: A Oliva
A Rodríguez
C Bishop
C Williams
CE Rasmussen
D Blei
D Blei
D Dunson
E Sudderth
F Zhu
H Ishwaran
J Duan
J Griffin
J Griffin
J Paisley
J Sethuraman
J Shi
L Ren
N Foti
P Orbanz
R Unnikrishnan
S Kumar
T Ferguson
X Sun
YW Teh
Publication venue
Publication date: 02/07/2017
Field of study

Dirichlet processes (DP) are widely applied in Bayesian nonparametric modeling. However, in their basic form they do not directly integrate dependency information among data arising from space and time. In this paper, we propose location dependent Dirichlet processes (LDDP) which incorporate nonparametric Gaussian processes in the DP modeling framework to model such dependencies. We develop the LDDP in the context of mixture modeling, and develop a mean field variational inference algorithm for this mixture model. The effectiveness of the proposed modeling framework is shown on an image segmentation task

arXiv.org e-Print Archive

Crossref