Steering time-dependent estimation of posteriors with hyperparameter indexing in Bayesian topic models
This paper provides a new approach to topical trend analysis. Our aim is to improve the generalization power of latent Dirichlet allocation (LDA) by using document timestamps. Many previous works model topical trends by making the latent topic distributions time-dependent. We propose a more straightforward approach: preparing a different word multinomial distribution for each time point. Since this approach increases the number of parameters, overfitting becomes a critical issue. Our contribution to this issue is twofold. First, we propose an effective way of defining Dirichlet priors over the word multinomials. Second, we propose a special scheduling of variational Bayesian (VB) inference. Comprehensive experiments with six datasets show that our approach can improve both LDA and Topics over Time, a well-known variant of LDA, in terms of test-data perplexity under VB inference.
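The core idea of a separate word multinomial per time point, regularised by a shared Dirichlet prior, can be sketched as follows. This is a minimal NumPy illustration with assumed toy sizes, not the paper's actual prior construction:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8              # toy vocabulary size (assumption; real vocabularies are far larger)
T = 5              # number of time points
concentration = 50.0

# Shared base distribution over words; tying each epoch's Dirichlet prior to it
# lets sparse epochs borrow strength from corpus-wide word frequencies,
# which is one way to fight the overfitting that extra parameters cause.
base = rng.dirichlet(np.ones(V))

# One word multinomial per time point, drawn from Dirichlet(concentration * base).
phi = np.stack([rng.dirichlet(concentration * base) for _ in range(T)])
```

A larger `concentration` pulls every epoch's word distribution closer to the shared base, trading flexibility for regularisation.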
Nonparametric Bayesian Topic Modelling with Auxiliary Data
The intent of this dissertation in computer science is to study
topic models for text analytics. The first objective of this
dissertation is to incorporate auxiliary information present in
text corpora to improve topic modelling for natural language
processing (NLP) applications. The second objective of this
dissertation is to extend existing topic models to employ
state-of-the-art nonparametric Bayesian techniques for better
modelling of text data. In particular, this dissertation focusses
on:
- incorporating hashtags, mentions, emoticons, and target-opinion
dependency present in tweets, together with an external sentiment
lexicon, to perform opinion mining or sentiment analysis on
products and services;
- leveraging abstracts, titles, authors, keywords, categorical
labels, and the citation network to perform bibliographic
analysis on research publications, using a supervised or
semi-supervised topic model; and
- employing the hierarchical Pitman-Yor process (HPYP) and the
Gaussian process (GP) to jointly model text, hashtags, authors,
and the follower network in tweets for corpora exploration and
summarisation.
In addition, we provide a framework for implementing arbitrary
HPYP topic models to ease the development of our proposed topic
models, made possible by modularising the Pitman-Yor processes.
Through extensive experiments and qualitative assessment, we find
that the topic models fit the data better as we utilise more
auxiliary information and employ Bayesian nonparametric methods.
Trend Analysis in AI Research over Time Using NLP Techniques
The dramatic rise in the number of publications in machine-learning-related studies poses a challenge for companies and new researchers who want to focus their resources effectively. This thesis aims to provide an automatic pipeline for extracting the most relevant trends in the machine learning field. I applied unsupervised topic modeling methods to discover research trends from full NIPS conference papers from 1987 to 2018. Comparing the Latent Dirichlet Allocation (LDA) topic model with a model utilizing semantic word vectors (sHDP) showed that LDA performed better in both quality and coherence. Using LDA, 50 topics were extracted and interpreted to match the key concepts in the conference publications. The results reveal three distinct eras in NIPS history, as well as a steady shift away from the neural information processing roots towards deep learning.
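As a sketch of the kind of model used in this pipeline, here is a minimal collapsed Gibbs sampler for LDA in plain NumPy on a toy corpus of word ids. The thesis itself works with full NIPS papers and 50 topics; the corpus, sizes, and hyperparameters below are assumptions for illustration only:

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word ids."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    # Count tables: doc-topic, topic-word, and per-topic totals.
    ndk = np.zeros((D, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    z = [rng.integers(K, size=len(d)) for d in docs]   # random init
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional for z_{d,i} given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi

# Toy corpus with two word "themes" (ids 0-3 vs. 4-7).
docs = [[0, 1, 2, 3, 0, 1], [4, 5, 6, 7, 4, 5], [0, 2, 1, 3], [5, 7, 6, 4]]
theta, phi = lda_gibbs(docs, K=2, V=8)
```

`theta` gives per-document topic proportions; tracking how those proportions shift across publication years is the basis of the trend analysis described above.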
Composing Deep Learning and Bayesian Nonparametric Methods
Recent progress in Bayesian methods has largely focused on non-conjugate models that make extensive use of black-box functions: continuous functions implemented with neural networks. Using deep neural networks, Bayesian models can reasonably fit big data while at the same time capturing model uncertainty. This thesis targets a more challenging problem: how do we model general random objects, including discrete ones, using random functions? Our conclusion is that many (discrete) random objects are by nature a composition of Poisson processes and random functions. Thus, all discreteness is handled through the Poisson process, while the random functions capture the remaining complexity of the object. Hence the title: composing deep learning and Bayesian nonparametric methods.
This conclusion is not a conjecture. In special cases such as latent feature models, we can prove this claim by working in infinite-dimensional spaces, which is where Bayesian nonparametrics come in. Moreover, we assume some regularity conditions on the random objects, such as exchangeability. The representations then emerge naturally via representation theorems. We will see this twice throughout this thesis.
One may ask: when a random object is too simple, such as a non-negative random vector in the case of latent feature models, how can we exploit exchangeability? The answer is to aggregate infinitely many random objects and map them together onto an infinite-dimensional space, and then assume exchangeability on that space. We demonstrate two examples of latent feature models, by (1) concatenating them into an infinite sequence (Sections 2 and 3) and (2) stacking them into a 2-d array (Section 4).
In addition, we will see that Bayesian nonparametric methods are useful for modelling discrete patterns in time-series data. We showcase two examples: (1) using variance-Gamma processes to model change points (Section 5), and (2) using Chinese restaurant processes to model speech with switching speakers (Section 6).
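The Chinese restaurant process mentioned here induces a random partition: each new "customer" joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter. A minimal sampler, purely illustrative and unrelated to the thesis's actual speaker model:

```python
import random

def crp_partition(n, alpha=1.0, seed=0):
    """Sample a partition of n items from a Chinese restaurant process.
    Customer i joins table t with prob |t| / (i + alpha), or starts a
    new table with prob alpha / (i + alpha)."""
    rng = random.Random(seed)
    tables = []                     # each table is a list of customer indices
    for i in range(n):
        weights = [len(t) for t in tables] + [alpha]   # sums to i + alpha
        r = rng.random() * (i + alpha)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if j == len(tables):
            tables.append([i])      # new table (here: a new speaker cluster)
        else:
            tables[j].append(i)
    return tables

tables = crp_partition(20, alpha=2.0)
```

Because the number of tables is unbounded a priori, the model need not fix the number of speakers in advance, which is the appeal of this prior for switching-speaker data.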
We are also aware that inference can be non-trivial in popular Bayesian nonparametric models. In Section 7, we present a novel online inference method for the popular HDP-HMM model.
Nonlinear Gaussian Filtering: Theory, Algorithms, and Applications
By restricting attention to Gaussian distributions, the optimal Bayesian filtering problem can be transformed into an algebraically simple form that allows for computationally efficient algorithms. Three problem settings are discussed in this thesis: (1) filtering with Gaussians only, (2) Gaussian mixture filtering for strong nonlinearities, and (3) Gaussian process filtering for purely data-driven scenarios. For each setting, efficient algorithms are derived and applied to real-world problems.
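In the Gaussians-only setting, the closed-form updates are the classical Kalman filter. A scalar sketch for an assumed random-walk state observed in noise (toy numbers, not an example from the thesis):

```python
def kalman_1d(y, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r).
    Returns the filtered means and the final posterior variance."""
    m, p = m0, p0
    means = []
    for yt in y:
        p = p + q               # predict: variance grows by the process noise
        k = p / (p + r)         # Kalman gain: trust in the new observation
        m = m + k * (yt - m)    # update: correct the mean by the innovation
        p = (1.0 - k) * p       # update: posterior variance shrinks
        means.append(m)
    return means, p

ys = [1.2, 0.9, 1.1, 1.0, 1.3, 0.8]
means, p_final = kalman_1d(ys, q=0.01, r=0.25)
```

The whole recursion is two means and two variances per step, which is the "algebraically simple form" the abstract refers to; the mixture and GP settings generalise these updates.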
Unsupervised neural and Bayesian models for zero-resource speech processing
Zero-resource speech processing is a growing research area which aims to develop methods
that can discover linguistic structure and representations directly from unlabelled speech
audio. Such unsupervised methods would allow speech technology to be developed
in settings where transcriptions, pronunciation dictionaries, and text for language
modelling are not available. Similar methods are required for cognitive models of
language acquisition in human infants, and for developing robotic applications that are
able to automatically learn language in a novel linguistic environment.
There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units
(phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful
units. The claim of this thesis is that both top-down modelling (using knowledge of
higher-level units to learn, discover and gain insight into their lower-level constituents)
as well as bottom-up modelling (piecing together lower-level features to give rise to
more complex higher-level structures) are advantageous in tackling these two problems.
The thesis is divided into three parts. The first part introduces a new autoencoder-like
deep neural network for unsupervised frame-level representation learning. This
correspondence autoencoder (cAE) uses weak top-down supervision from an unsupervised
term discovery system that identifies noisy word-like terms in unlabelled speech data.
In an intrinsic evaluation of frame-level representations, the cAE outperforms several
state-of-the-art bottom-up and top-down approaches, achieving a relative improvement
of more than 60% over the previous best system. This shows that the cAE is particularly
effective in using top-down knowledge of longer-spanning patterns in the data; at the
same time, we find that the cAE is only able to learn useful representations when it is
initialized using bottom-up pretraining on a large set of unlabelled speech.
The second part of the thesis presents a novel unsupervised segmental Bayesian
model that segments unlabelled speech data and clusters the segments into hypothesized
word groupings. The result is a complete unsupervised tokenization of the input speech
in terms of discovered word types: the system essentially performs unsupervised speech
recognition. In this approach, a potential word segment (of arbitrary length) is embedded
in a fixed-dimensional vector space. The model, implemented as a Gibbs sampler, then
builds a whole-word acoustic model in this embedding space while jointly performing
segmentation. We first evaluate the approach in a small-vocabulary multi-speaker
connected digit recognition task, where we report unsupervised word error rates (WER)
by mapping the unsupervised decoded output to ground truth transcriptions. The model
achieves around 20% WER, outperforming a previous HMM-based system by about 10% absolute.
To achieve this performance, the acoustic word embedding function (which
maps variable-duration segments to single vectors) is refined in a top-down manner by
using terms discovered by the model in an outer loop of segmentation.
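One simple way to map a variable-duration segment to a single fixed-dimensional vector is uniform downsampling of its feature frames. This is a hedged sketch of that general idea, with assumed frame counts and feature dimension, not the refined embedding function the thesis actually learns:

```python
import numpy as np

def downsample_embed(frames, n=10):
    """Map a variable-length sequence of feature frames (T x d) to a
    fixed-dimensional vector by keeping n uniformly spaced frames."""
    T, d = frames.shape
    idx = np.linspace(0, T - 1, n).round().astype(int)
    return frames[idx].reshape(-1)        # shape: (n * d,)

# Two hypothetical segments of different durations, 13-dim features each.
seg_a = np.random.default_rng(0).normal(size=(37, 13))
seg_b = np.random.default_rng(1).normal(size=(52, 13))
emb_a = downsample_embed(seg_a)
emb_b = downsample_embed(seg_b)
```

Because both segments land in the same space, the whole-word acoustic model and the Gibbs sampler can compare and cluster segments of arbitrary length directly.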
The third and final part of the study extends the small-vocabulary system in order to handle larger vocabularies in conversational speech data. To our knowledge, this is the
first full-coverage segmentation and clustering system that is applied to large-vocabulary
multi-speaker data. To improve efficiency, the system incorporates a bottom-up syllable
boundary detection method to eliminate unlikely word boundaries. We compare the
system on English and Xitsonga datasets to several state-of-the-art baselines. We
show that by imposing a consistent top-down segmentation while also using bottom-up
knowledge from detected syllable boundaries, both single-speaker and multi-speaker
versions of our system outperform a purely bottom-up single-speaker syllable-based
approach. We also show that the discovered clusters can be made less speaker- and
gender-specific by using features from the cAE (which incorporates both top-down and
bottom-up learning). The system's discovered clusters are still less pure than those of
two multi-speaker unsupervised term discovery systems, but provide far greater coverage.
In summary, the different models and systems presented in this thesis show that both
top-down and bottom-up modelling can improve representation learning, segmentation
and clustering of unlabelled speech data.
Estimating user interaction probability for non-guaranteed display advertising
Billions of advertisements are displayed to internet users every hour, a market worth approximately $110 billion in 2013. The process of displaying advertisements to internet users is managed
by advertising exchanges, automated systems which match advertisements to users while balancing
conflicting advertiser, publisher, and user objectives. Real-time bidding is a recent development in
the online advertising industry that allows more than one exchange (or demand-side platform) to
bid for the right to deliver an ad to a specific user while that user is loading a webpage, creating
a liquid market for ad impressions. Real-time bidding accounted for around 10% of the German
online advertising market in late 2013, a figure which is growing at an annual rate of around 40%.
In this competitive market, accurately calculating the expected value of displaying an ad to a user
is essential for profitability.
In this thesis, we develop a system that significantly improves the existing method for estimating
the value of displaying an ad to a user in a German advertising exchange and demand-side platform.
The most significant calculation in this system is estimating the probability of a user interacting
with an ad in a given context. We first implement a hierarchical main-effects and latent factor
model which is similar enough to the existing exchange system to allow a simple and robust upgrade
path, while improving performance substantially. We then use regularized generalized linear models
to estimate the probability of an ad interaction occurring following an individual user impression
event. We build a system capable of training thousands of campaign models daily, handling over 300
million events per day, 18 million recurrent users, and thousands of model dimensions. Together,
these systems improve on the log-likelihood of the existing method by over 10%.
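The regularized generalized linear models described above can be sketched as L2-regularised logistic regression fitted by gradient descent. The feature names and data below are invented for illustration; the production system's features, scale, and training procedure are far richer:

```python
import numpy as np

def fit_logreg(X, y, lam=0.1, lr=0.1, iters=500):
    """L2-regularised logistic regression by batch gradient descent:
    models P(interaction | features) = sigmoid(X @ w)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + lam * w   # cross-entropy + L2 gradient
        w -= lr * grad
    return w

# Hypothetical impression features: [bias, ad_position, past_ctr].
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]
true_w = np.array([-2.0, 1.0, 1.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

w = fit_logreg(X, y)
p = 1.0 / (1.0 + np.exp(-X @ w))                  # estimated interaction probabilities
```

At production scale, one such model per campaign with thousands of sparse dimensions would be trained daily, and the fitted probabilities feed directly into the expected-value calculation for each bid.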
We also provide an overview of the microstructure of the German real-time bidding market in
September and November 2013, and indicate potential areas for exploiting
competitors’ behaviour, including building user features from real-time bid responses. Finally,
for personal interest, we experiment with scalable k-nearest neighbour search algorithms, nonlinear
dimension reduction, manifold regularization, graph clustering, and stochastic block model inference
using the large datasets from the linear model.
Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020, Bilbao, Basque Country, Spain
466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop for promoting statistical modelling and applications of statistics, in a broad sense, among researchers, academics, and industrialists. Unfortunately, the global COVID-19 pandemic did not allow the 35th edition of the IWSM to be held in Bilbao in July 2020. Despite the situation, and following the spirit of the workshop and the Statistical Modelling Society, we are delighted to bring you this proceedings book of extended abstracts.