Steering time-dependent estimation of posteriors with hyperparameter indexing in Bayesian topic models
This paper provides a new approach to topical trend analysis. Our aim is to improve the generalization power of latent Dirichlet allocation (LDA) by using document timestamps. Many previous works model topical trends by making the latent topic distributions time-dependent. We propose a more straightforward approach: preparing a different word multinomial distribution for each time point. Since this approach increases the number of parameters, overfitting becomes a critical issue. Our contribution to this issue is twofold. First, we propose an effective way of defining Dirichlet priors over the word multinomials. Second, we propose a special scheduling of variational Bayesian (VB) inference. Comprehensive experiments with six datasets show that our approach can improve both LDA and Topics over Time, a well-known variant of LDA, in terms of test-data perplexity under VB inference.
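The core idea of a separate word multinomial per time point, regularised by a shared Dirichlet prior, can be sketched as follows. This is a minimal NumPy illustration with assumed toy sizes, not the paper's actual prior construction:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8              # toy vocabulary size (assumption; real vocabularies are far larger)
T = 5              # number of time points
concentration = 50.0

# Shared base distribution over words; tying each epoch's Dirichlet prior to it
# lets sparse epochs borrow strength from corpus-wide word frequencies,
# which is one way to fight the overfitting that extra parameters cause.
base = rng.dirichlet(np.ones(V))

# One word multinomial per time point, drawn from Dirichlet(concentration * base).
phi = np.stack([rng.dirichlet(concentration * base) for _ in range(T)])
```

A larger `concentration` pulls every epoch's word distribution closer to the shared base, trading flexibility for regularisation.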
Nonparametric Bayesian Topic Modelling with Auxiliary Data
The intent of this dissertation in computer science is to study
topic models for text analytics. The first objective of this
dissertation is to incorporate auxiliary information present in
text corpora to improve topic modelling for natural language
processing (NLP) applications. The second objective of this
dissertation is to extend existing topic models to employ
state-of-the-art nonparametric Bayesian techniques for better
modelling of text data. In particular, this dissertation focusses
on:
- incorporating hashtags, mentions, emoticons, and target-opinion
dependency present in tweets, together with an external sentiment
lexicon, to perform opinion mining or sentiment analysis on
products and services;
- leveraging abstracts, titles, authors, keywords, categorical
labels, and the citation network to perform bibliographic
analysis on research publications, using a supervised or
semi-supervised topic model; and
- employing the hierarchical Pitman-Yor process (HPYP) and the
Gaussian process (GP) to jointly model text, hashtags, authors,
and the follower network in tweets for corpora exploration and
summarisation.
In addition, we provide a framework for implementing arbitrary
HPYP topic models to ease the development of our proposed topic
models, made possible by modularising the Pitman-Yor processes.
Through extensive experiments and qualitative assessment, we find
that the topic models fit the data better as we utilise more
auxiliary information and employ Bayesian nonparametric methods.
Trend Analysis in AI Research over Time Using NLP Techniques
The dramatic rise in the number of publications in machine-learning-related studies poses a challenge for companies and new researchers who want to focus their resources effectively. This thesis aims to provide an automatic pipeline for extracting the most relevant trends in the machine learning field. I applied unsupervised topic modeling methods to discover research trends from full NIPS conference papers from 1987 to 2018. Comparing the Latent Dirichlet Allocation (LDA) topic model with a model utilizing semantic word vectors (sHDP) showed that LDA performed better in both quality and coherence. Using LDA, 50 topics were extracted and interpreted to match the key concepts in the conference publications. The results reveal three distinct eras in NIPS history, as well as a steady shift away from the neural information processing roots towards deep learning.
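As a sketch of the kind of model used in this pipeline, here is a minimal collapsed Gibbs sampler for LDA in plain NumPy on a toy corpus of word ids. The thesis itself works with full NIPS papers and 50 topics; the corpus, sizes, and hyperparameters below are assumptions for illustration only:

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word ids."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    # Count tables: doc-topic, topic-word, and per-topic totals.
    ndk = np.zeros((D, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    z = [rng.integers(K, size=len(d)) for d in docs]   # random init
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional for z_{d,i} given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi

# Toy corpus with two word "themes" (ids 0-3 vs. 4-7).
docs = [[0, 1, 2, 3, 0, 1], [4, 5, 6, 7, 4, 5], [0, 2, 1, 3], [5, 7, 6, 4]]
theta, phi = lda_gibbs(docs, K=2, V=8)
```

`theta` gives per-document topic proportions; tracking how those proportions shift across publication years is the basis of the trend analysis described above.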
Composing Deep Learning and Bayesian Nonparametric Methods
Recent progress in Bayesian methods has largely focused on non-conjugate models that make extensive use of black-box functions: continuous functions implemented with neural networks. Using deep neural networks, Bayesian models can reasonably fit big data while at the same time capturing model uncertainty. This thesis targets a more challenging problem: how do we model general random objects, including discrete ones, using random functions? Our conclusion is that many (discrete) random objects are by nature a composition of Poisson processes and random functions. Thus, all discreteness is handled through the Poisson process, while the random functions capture the remaining complexity of the object. Hence the title: composing deep learning and Bayesian nonparametric methods.
This conclusion is not a conjecture. In special cases such as latent feature models, we can prove this claim by working in infinite-dimensional spaces, which is where Bayesian nonparametrics come in. Moreover, we assume some regularity conditions on the random objects, such as exchangeability. The representations then emerge naturally via representation theorems. We will see this twice throughout this thesis.
One may ask: when a random object is too simple, such as a non-negative random vector in the case of latent feature models, how can we exploit exchangeability? The answer is to aggregate infinitely many random objects and map them together onto an infinite-dimensional space, and then assume exchangeability on that space. We demonstrate two examples of latent feature models, by (1) concatenating them into an infinite sequence (Sections 2 and 3) and (2) stacking them into a 2-d array (Section 4).
In addition, we will see that Bayesian nonparametric methods are useful for modelling discrete patterns in time-series data. We showcase two examples: (1) using variance-Gamma processes to model change points (Section 5), and (2) using Chinese restaurant processes to model speech with switching speakers (Section 6).
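The Chinese restaurant process mentioned here induces a random partition: each new "customer" joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter. A minimal sampler, purely illustrative and unrelated to the thesis's actual speaker model:

```python
import random

def crp_partition(n, alpha=1.0, seed=0):
    """Sample a partition of n items from a Chinese restaurant process.
    Customer i joins table t with prob |t| / (i + alpha), or starts a
    new table with prob alpha / (i + alpha)."""
    rng = random.Random(seed)
    tables = []                     # each table is a list of customer indices
    for i in range(n):
        weights = [len(t) for t in tables] + [alpha]   # sums to i + alpha
        r = rng.random() * (i + alpha)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if j == len(tables):
            tables.append([i])      # new table (here: a new speaker cluster)
        else:
            tables[j].append(i)
    return tables

tables = crp_partition(20, alpha=2.0)
```

Because the number of tables is unbounded a priori, the model need not fix the number of speakers in advance, which is the appeal of this prior for switching-speaker data.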
We are also aware that inference can be non-trivial in popular Bayesian nonparametric models. In Section 7, we present a novel online inference method for the popular HDP-HMM model.
Nonlinear Gaussian Filtering: Theory, Algorithms, and Applications
By restricting attention to Gaussian distributions, the optimal Bayesian filtering problem can be transformed into an algebraically simple form that allows for computationally efficient algorithms. Three problem settings are discussed in this thesis: (1) filtering with Gaussians only, (2) Gaussian mixture filtering for strong nonlinearities, and (3) Gaussian process filtering for purely data-driven scenarios. For each setting, efficient algorithms are derived and applied to real-world problems.
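In the Gaussians-only setting, the closed-form updates are the classical Kalman filter. A scalar sketch for an assumed random-walk state observed in noise (toy numbers, not an example from the thesis):

```python
def kalman_1d(y, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r).
    Returns the filtered means and the final posterior variance."""
    m, p = m0, p0
    means = []
    for yt in y:
        p = p + q               # predict: variance grows by the process noise
        k = p / (p + r)         # Kalman gain: trust in the new observation
        m = m + k * (yt - m)    # update: correct the mean by the innovation
        p = (1.0 - k) * p       # update: posterior variance shrinks
        means.append(m)
    return means, p

ys = [1.2, 0.9, 1.1, 1.0, 1.3, 0.8]
means, p_final = kalman_1d(ys, q=0.01, r=0.25)
```

The whole recursion is two means and two variances per step, which is the "algebraically simple form" the abstract refers to; the mixture and GP settings generalise these updates.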
Unsupervised neural and Bayesian models for zero-resource speech processing
Zero-resource speech processing is a growing research area which aims to develop methods
that can discover linguistic structure and representations directly from unlabelled speech
audio. Such unsupervised methods would allow speech technology to be developed
in settings where transcriptions, pronunciation dictionaries, and text for language
modelling are not available. Similar methods are required for cognitive models of
language acquisition in human infants, and for developing robotic applications that are
able to automatically learn language in a novel linguistic environment.
There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units
(phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful
units. The claim of this thesis is that both top-down modelling (using knowledge of
higher-level units to learn, discover and gain insight into their lower-level constituents)
as well as bottom-up modelling (piecing together lower-level features to give rise to
more complex higher-level structures) are advantageous in tackling these two problems.
The thesis is divided into three parts. The first part introduces a new autoencoder-like
deep neural network for unsupervised frame-level representation learning. This
correspondence autoencoder (cAE) uses weak top-down supervision from an unsupervised
term discovery system that identifies noisy word-like terms in unlabelled speech data.
In an intrinsic evaluation of frame-level representations, the cAE outperforms several
state-of-the-art bottom-up and top-down approaches, achieving a relative improvement
of more than 60% over the previous best system. This shows that the cAE is particularly
effective in using top-down knowledge of longer-spanning patterns in the data; at the
same time, we find that the cAE is only able to learn useful representations when it is
initialized using bottom-up pretraining on a large set of unlabelled speech.
The second part of the thesis presents a novel unsupervised segmental Bayesian
model that segments unlabelled speech data and clusters the segments into hypothesized
word groupings. The result is a complete unsupervised tokenization of the input speech
in terms of discovered word types: the system essentially performs unsupervised speech
recognition. In this approach, a potential word segment (of arbitrary length) is embedded
in a fixed-dimensional vector space. The model, implemented as a Gibbs sampler, then
builds a whole-word acoustic model in this embedding space while jointly performing
segmentation. We first evaluate the approach in a small-vocabulary multi-speaker
connected digit recognition task, where we report unsupervised word error rates (WER)
by mapping the unsupervised decoded output to ground truth transcriptions. The model
achieves around 20% WER, outperforming a previous HMM-based system by about 10% absolute.
To achieve this performance, the acoustic word embedding function (which
maps variable-duration segments to single vectors) is refined in a top-down manner by
using terms discovered by the model in an outer loop of segmentation.
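One simple way to map a variable-duration segment to a single fixed-dimensional vector is uniform downsampling of its feature frames. This is a hedged sketch of that general idea, with assumed frame counts and feature dimension, not the refined embedding function the thesis actually learns:

```python
import numpy as np

def downsample_embed(frames, n=10):
    """Map a variable-length sequence of feature frames (T x d) to a
    fixed-dimensional vector by keeping n uniformly spaced frames."""
    T, d = frames.shape
    idx = np.linspace(0, T - 1, n).round().astype(int)
    return frames[idx].reshape(-1)        # shape: (n * d,)

# Two hypothetical segments of different durations, 13-dim features each.
seg_a = np.random.default_rng(0).normal(size=(37, 13))
seg_b = np.random.default_rng(1).normal(size=(52, 13))
emb_a = downsample_embed(seg_a)
emb_b = downsample_embed(seg_b)
```

Because both segments land in the same space, the whole-word acoustic model and the Gibbs sampler can compare and cluster segments of arbitrary length directly.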
The third and final part of the study extends the small-vocabulary system in order to handle larger vocabularies in conversational speech data. To our knowledge, this is the
first full-coverage segmentation and clustering system that is applied to large-vocabulary
multi-speaker data. To improve efficiency, the system incorporates a bottom-up syllable
boundary detection method to eliminate unlikely word boundaries. We compare the
system on English and Xitsonga datasets to several state-of-the-art baselines. We
show that by imposing a consistent top-down segmentation while also using bottom-up
knowledge from detected syllable boundaries, both single-speaker and multi-speaker
versions of our system outperform a purely bottom-up single-speaker syllable-based
approach. We also show that the discovered clusters can be made less speaker- and
gender-specific by using features from the cAE (which incorporates both top-down and
bottom-up learning). The system's discovered clusters are still less pure than those of
two multi-speaker unsupervised term discovery systems, but provide far greater coverage.
In summary, the different models and systems presented in this thesis show that both
top-down and bottom-up modelling can improve representation learning, segmentation
and clustering of unlabelled speech data.
Estimating user interaction probability for non-guaranteed display advertising
Billions of advertisements are displayed to internet users every hour, a market worth approximately $110 billion in 2013. The process of displaying advertisements to internet users is managed
by advertising exchanges, automated systems which match advertisements to users while balancing
conflicting advertiser, publisher, and user objectives. Real-time bidding is a recent development in
the online advertising industry that allows more than one exchange (or demand-side platform) to
bid for the right to deliver an ad to a specific user while that user is loading a webpage, creating
a liquid market for ad impressions. Real-time bidding accounted for around 10% of the German
online advertising market in late 2013, a figure which is growing at an annual rate of around 40%.
In this competitive market, accurately calculating the expected value of displaying an ad to a user
is essential for profitability.
In this thesis, we develop a system that significantly improves the existing method for estimating
the value of displaying an ad to a user in a German advertising exchange and demand-side platform.
The most significant calculation in this system is estimating the probability of a user interacting
with an ad in a given context. We first implement a hierarchical main-effects and latent factor
model which is similar enough to the existing exchange system to allow a simple and robust upgrade
path, while improving performance substantially. We then use regularized generalized linear models
to estimate the probability of an ad interaction occurring following an individual user impression
event. We build a system capable of training thousands of campaign models daily, handling over 300
million events per day, 18 million recurrent users, and thousands of model dimensions. Together,
these systems improve on the log-likelihood of the existing method by over 10%.
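The regularized generalized linear models described above can be sketched as L2-regularised logistic regression fitted by gradient descent. The feature names and data below are invented for illustration; the production system's features, scale, and training procedure are far richer:

```python
import numpy as np

def fit_logreg(X, y, lam=0.1, lr=0.1, iters=500):
    """L2-regularised logistic regression by batch gradient descent:
    models P(interaction | features) = sigmoid(X @ w)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + lam * w   # cross-entropy + L2 gradient
        w -= lr * grad
    return w

# Hypothetical impression features: [bias, ad_position, past_ctr].
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]
true_w = np.array([-2.0, 1.0, 1.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

w = fit_logreg(X, y)
p = 1.0 / (1.0 + np.exp(-X @ w))                  # estimated interaction probabilities
```

At production scale, one such model per campaign with thousands of sparse dimensions would be trained daily, and the fitted probabilities feed directly into the expected-value calculation for each bid.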
We also provide an overview of the microstructure of the German real-time bidding market in
September and November 2013, and indicate potential areas for exploiting
competitors’ behaviour, including building user features from real-time bid responses. Finally,
for personal interest, we experiment with scalable k-nearest neighbour search algorithms, nonlinear
dimension reduction, manifold regularization, graph clustering, and stochastic block model inference
using the large datasets from the linear model.
Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020, Bilbao, Basque Country, Spain
466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop for promoting statistical modelling and applications of statistics, in a broad sense, among researchers, academics, and industrialists. Unfortunately, the global COVID-19 pandemic did not allow the 35th edition of the IWSM to be held in Bilbao in July 2020. Despite the situation, and following the spirit of the workshop and the Statistical Modelling Society, we are delighted to bring you this proceedings book of extended abstracts.