Nonparametric Bayesian Topic Modelling with Auxiliary Data
The intent of this dissertation in computer science is to study
topic models for text analytics. The first objective of this
dissertation is to incorporate auxiliary information present in
text corpora to improve topic modelling for natural language
processing (NLP) applications. The second objective of this
dissertation is to extend existing topic models to employ
state-of-the-art nonparametric Bayesian techniques for better
modelling of text data. In particular, this dissertation focusses
on:
- incorporating hashtags, mentions, emoticons, and target-opinion
dependency present in tweets, together with an external sentiment
lexicon, to perform opinion mining or sentiment analysis on
products and services;
- leveraging abstracts, titles, authors, keywords, categorical
labels, and the citation network to perform bibliographic
analysis on research publications, using a supervised or
semi-supervised topic model; and
- employing the hierarchical Pitman-Yor process (HPYP) and the
Gaussian process (GP) to jointly model text, hashtags, authors,
and the follower network in tweets for corpora exploration and
summarisation.
In addition, we provide a framework for implementing arbitrary
HPYP topic models to ease the development of our proposed topic
models, made possible by modularising the Pitman-Yor processes.
Through extensive experiments and qualitative assessment, we find
that topic models fit the data better as we incorporate more
auxiliary information and employ Bayesian nonparametric methods.
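As background for the HPYP machinery the abstract refers to, the sketch below samples from a single Pitman-Yor process via its Chinese restaurant representation. This is an illustrative toy, not the dissertation's modular framework; all names are ours.

```python
import random

def pitman_yor_crp(n, discount=0.5, strength=1.0, seed=0):
    """Seat n customers under a Pitman-Yor Chinese restaurant process.

    Returns the list of table sizes. `discount` (d, 0 <= d < 1) and
    `strength` (theta > -d) are the standard PYP parameters; larger
    discounts give more power-law-like table sizes.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for i in range(n):  # i customers are already seated
        # a new table opens with probability (theta + d*K) / (theta + i)
        new_p = (strength + discount * len(tables)) / (strength + i)
        u = rng.random()
        if not tables or u < new_p:
            tables.append(1)
            continue
        # otherwise pick table k with probability proportional to (n_k - d)
        r = (u - new_p) * (strength + i)  # uniform on [0, i - d*K)
        for k, size in enumerate(tables):
            r -= size - discount
            if r < 0:
                tables[k] += 1
                break
        else:
            tables[-1] += 1  # guard against floating-point spillover
    return tables
```

With a discount of 0.5, the number of occupied tables grows roughly like the square root of the number of customers, which is the power-law behaviour that makes Pitman-Yor priors a natural fit for word frequencies.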
Word Embeddings: A Survey
This work lists and describes the main recent strategies for building
fixed-length, dense and distributed representations for words, based on the
distributional hypothesis. These representations are now commonly called word
embeddings and, in addition to encoding surprisingly good syntactic and
semantic information, have been proven useful as extra features in many
downstream NLP tasks.
Comment: 10 pages, 2 tables, 1 image
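To make the distributional hypothesis concrete, the following toy sketch builds fixed-length dense word vectors by factorising a window-based co-occurrence matrix with truncated SVD, i.e. a count-based embedding in the spirit the survey describes. Function and variable names are illustrative, not taken from the survey.

```python
import numpy as np

def count_svd_embeddings(sentences, window=2, dim=2):
    """Build dense word vectors from a co-occurrence matrix via truncated SVD.

    A toy illustration of the distributional hypothesis: words that appear
    in similar contexts end up with similar vectors.
    """
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            # count every neighbour within `window` positions of w
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    C[idx[w], idx[s[j]]] += 1.0
    # truncated SVD: keep the top `dim` left singular vectors, scaled
    U, S, _ = np.linalg.svd(C, full_matrices=False)
    return {w: U[idx[w], :dim] * S[:dim] for w in vocab}
```

In a corpus where "cat" and "dog" occur in identical contexts, their rows of the co-occurrence matrix coincide, so their embeddings come out identical, which is the distributional hypothesis in miniature.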
Modelling trust in semantic web applications
This paper examines some of the barriers to the adoption of car-sharing (termed carpooling in the US) and develops a framework for trusted recommendations. The framework is built on a semantic modelling approach, arguing for its suitability in resolving adoption barriers while also highlighting the characteristics of trust that can be exploited. The paper identifies potential vocabularies, ontologies and public social networks that can serve as the basis for deriving direct and indirect trust values in an implementation.
MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Besides the text content, documents and their associated words usually come
with rich sets of meta information, such as categories of documents and
semantic/syntactic features of words, like those encoded in word embeddings.
Incorporating such meta information directly into the generative process of
topic models can improve modelling accuracy and topic quality, especially in
the case where the word-occurrence information in the training data is
insufficient. In this paper, we present a topic model, called MetaLDA, which is
able to leverage either document or word meta information, or both of them
jointly. With two data augmentation techniques, we can derive an efficient
Gibbs sampling algorithm, which benefits from the fully local conjugacy of the
model. Moreover, the algorithm is favoured by the sparsity of the meta
information. Extensive experiments on several real world datasets demonstrate
that our model achieves comparable or improved performance in terms of both
perplexity and topic quality, particularly in handling sparse texts. In
addition, compared with other models using meta information, our model runs
significantly faster.
Comment: To appear in ICDM 2017
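One common way to let document meta information shape the generative process, in the spirit of MetaLDA's document side, is a log-linear parameterisation in which each active binary label multiplicatively scales the Dirichlet prior over topics. The sketch below is a simplified illustration with our own variable names; the paper's full model and its augmentation-based Gibbs sampler go beyond this.

```python
import numpy as np

def doc_topic_prior(doc_labels, label_weights):
    """Per-document Dirichlet prior over topics from binary meta information.

    doc_labels:    (D, L) binary matrix, one row of labels per document.
    label_weights: (L, K) positive matrix; entry (l, k) says how strongly
                   label l promotes topic k.
    alpha[d, k] = product over the document's active labels l of
    label_weights[l, k], i.e. a log-linear combination of the labels.
    """
    log_alpha = doc_labels @ np.log(label_weights)  # (D, K)
    return np.exp(log_alpha)
```

In practice one label is typically an always-on bias term, so documents with no informative labels still receive a sensible prior; here a label-free document simply falls back to a flat prior of ones.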
Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model.
Comment: Accepted in NIPS 2018