Search CORE

5,122 research outputs found

Scalable Text and Link Analysis with Mixed-Topic Link Models

Author: Chang J.
Cohn D.
Cohn D.
Getoor L.
Gopalan P.
Gruber A.
Lu Q.
Lu Q.
Yang T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/03/2013
Field of study

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. In order to perform inference on such data sets, and make predictions and recommendations, it is useful to have models that are able to capture the processes which generate the text at each node and the links between them. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation, analyzing a data set with 1.3 million words and 44 thousand links in a few minutes.Comment: 11 pages, 4 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hierarchical relational models for document networks

Author: Blei David M.
Chang Jonathan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010
Field of study

We develop the relational topic model (RTM), a hierarchical model of both network structure and node attributes. We focus on document networks, where the attributes of each document are its words, that is, discrete observations taken from a fixed vocabulary. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be used to summarize a network of documents, predict links between them, and predict words within them. We derive efficient inference and estimation algorithms based on variational methods that take advantage of sparsity and scale with the number of links. We evaluate the predictive performance of the RTM for large networks of scientific abstracts, web documents, and geographically tagged news.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS309 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Crossref

Recommended from our members

Visualising and animating visual information foraging in context

Author: Chen C
Cribbin T
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2001
Field of study

Optimal information foraging provides a potentially useful framework for modelling, analysing, and interpreting search strategies of users through a spatial-semantic interface. Improving the understanding of behavioural patterns of users in such environments has implications for the design and refinement of a range of user interfaces. In this article, we outline the role of optimal information foraging in the study of visual information retrieval and how one may use visualisation and animation techniques to put behavioural patterns in context. Behavioural patterns of information foraging in an information space are visualised and animated to aid further in-depth analysis of search strategies

Brunel University Research Archive

Leveraging Crowdsourcing Data For Deep Active Learning - An Application: Learning Intents in Alexa

Author: Damianou Andreas
Drake Thomas
Maarek Yoelle
Yang Jie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

This paper presents a generic Bayesian framework that enables any deep learning model to actively learn from targeted crowds. Our framework inherits from recent advances in Bayesian deep learning, and extends existing work by considering the targeted crowdsourcing approach, where multiple annotators with unknown expertise contribute an uncontrolled amount (often limited) of annotations. Our framework leverages the low-rank structure in annotations to learn individual annotator expertise, which then helps to infer the true labels from noisy and sparse annotations. It provides a unified Bayesian model to simultaneously infer the true labels and train the deep learning model in order to reach an optimal learning efficacy. Finally, our framework exploits the uncertainty of the deep learning model during prediction as well as the annotators' estimated expertise to minimize the number of required annotations and annotators for optimally training the deep learning model. We evaluate the effectiveness of our framework for intent classification in Alexa (Amazon's personal assistant), using both synthetic and real-world datasets. Experiments show that our framework can accurately learn annotator expertise, infer true labels, and effectively reduce the amount of annotations in model training as compared to state-of-the-art approaches. We further discuss the potential of our proposed framework in bridging machine learning and crowdsourcing towards improved human-in-the-loop systems

arXiv.org e-Print Archive

Crossref

Human-Centric Cyber Social Computing Model for Hot-Event Detection and Propagation

Author: Ali Haider
Jiang Liang
Kazim Muhammad
Liu Lu
Panneerselvam John
Shi Lei-Lei
Wu Yan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Microblogging networks have gained popularity in recent years as a platform enabling expressions of human emotions, through which users can conveniently produce contents on public events, breaking news, and/or products. Subsequently, microblogging networks generate massive amounts of data that carry opinions and mass sentiment on various topics. Herein, microblogging is regarded as a useful platform for detecting and propagating new hot events. It is also a useful channel for identifying high-quality posts, popular topics, key interests, and high-influence users. The existence of noisy data in the traditional social media data streams enforces to focus on human-centric computing. This paper proposes a human-centric social computing (HCSC) model for hot-event detection and propagation in microblogging networks. In the proposed HCSC model, all posts and users are preprocessed through hypertext induced topic search (HITS) for determining high-quality subsets of the users, topics, and posts. Then, a latent Dirichlet allocation (LDA)-based multiprototype user topic detection method is used for identifying users with high influence in the network. Furthermore, an influence maximization is used for final determination of influential users based on the user subsets. Finally, the users mined by influence maximization process are generated as the influential user sets for specific topics. Experimental results prove the superiority of our HCSC model against similar models of hot-event detection and information propagation

De Montfort University Open Research Archive

Leicester Research Archive

On the Predictability of Talk Attendance at Academic Conferences

Author: Atzmueller M.
H
Meriac M.
Scholz Christoph
Wongchokprasitti C.
Publication venue
Publication date: 02/07/2014
Field of study

This paper focuses on the prediction of real-world talk attendances at academic conferences with respect to different influence factors. We study the predictability of talk attendances using real-world tracked face-to-face contacts. Furthermore, we investigate and discuss the predictive power of user interests extracted from the users' previous publications. We apply Hybrid Rooted PageRank, a state-of-the-art unsupervised machine learning method that combines information from different sources. Using this method, we analyze and discuss the predictive power of contact and interest networks separately and in combination. We find that contact and similarity networks achieve comparable results, and that combinations of different networks can only to a limited extend help to improve the prediction quality. For our experiments, we analyze the predictability of talk attendance at the ACM Conference on Hypertext and Hypermedia 2011 collected using the conference management system Conferator

arXiv.org e-Print Archive

CiteSeerX

Crossref