Temporal word embeddings for dynamic user profiling in Twitter

Author: Caputo Annalina
Kerin Breandán
Lawless Séamus
Publication venue: CEUR-WS
Publication date: 01/01/2019

The research described in this paper explored user profiling, a nascent and contentious technology that has attracted growing interest from the research community as its potential for providing personalised digital services is realised. An extensive review of related literature revealed that limited research has been conducted into how temporal aspects of users can be captured using user profiling techniques. This, coupled with the notable lack of research into the use of word embedding techniques to capture temporal variances in language, revealed an opportunity to extend the Random Indexing word embedding technique so that the interests of users could be modelled based on their use of language. To achieve this, the work extended an existing implementation of Temporal Random Indexing to model Twitter users across multiple granularities of time based on their use of language. The product is a novel technique for temporal user profiling, in which a set of vectors describes the evolution of a Twitter user’s interests over time through their use of language. The vectors produced were evaluated against a temporal implementation of another state-of-the-art word embedding technique, the Word2Vec Dynamic Independent Skip-gram model, and Temporal Random Indexing was found to outperform Word2Vec in the generation of temporal user profiles.
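The idea of describing a user with one vector per time period can be illustrated with a much-simplified sketch. It is not the paper's implementation: here a profile is simply the sum of the random index vectors of the words a user wrote in each period, whereas Temporal Random Indexing builds word vectors from co-occurrence first. The names `index_vector` and `profile_by_period`, the dimensionality, and the toy tweets are all assumptions made for illustration.

```python
import random
from collections import defaultdict

DIM = 64      # vector dimensionality (assumed, chosen small for the example)
NONZERO = 4   # number of non-zero entries per index vector (assumed)

def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector, fixed per word and shared by all periods."""
    rng = random.Random(word)  # seeding with the word makes the vector reproducible
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def profile_by_period(tweets):
    """tweets: iterable of (period, text). Returns {period: summed vector}."""
    profiles = defaultdict(lambda: [0] * DIM)
    for period, text in tweets:
        for word in text.lower().split():
            acc = profiles[period]
            for j, x in enumerate(index_vector(word)):
                acc[j] += x
    return dict(profiles)

# Hypothetical tweets from one user, grouped by month.
tweets = [
    ("2019-01", "coffee and code"),
    ("2019-01", "more code today"),
    ("2019-02", "marathon training run"),
]
profiles = profile_by_period(tweets)
```

Because every period reuses the same index vector for a given word, the per-period profiles live in one shared space and can be compared directly, for example with cosine similarity.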

Irish Universities

DCU Online Research Access Service

Biased Embeddings from Wild Data: Measuring, Understanding and Removing

Author: Cristianini Nello
Lansdall-Welfare Thomas
Sutton Adam
Publication date: 16/06/2018

Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered "from the wild" and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on the use of word lists created for social psychology applications; we observe how gender bias in occupations reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems. Comment: Author's original version
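One common form of projection-based debiasing removes the component of each word vector that lies along an estimated bias direction. The sketch below shows that operation only; the abstract does not specify the authors' exact procedure, and the three-dimensional vectors and the helper `remove_direction` are invented for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [a / norm for a in v]

def remove_direction(vec, direction):
    """Return vec with its component along `direction` projected out."""
    d = normalize(direction)
    scale = dot(vec, d)
    return [a - scale * b for a, b in zip(vec, d)]

# Toy embeddings (assumed values, not taken from any trained model).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
nurse = [-0.6, 0.5, 0.3]

# Estimate the bias direction as the difference between a word pair.
bias_direction = [a - b for a, b in zip(he, she)]
debiased = remove_direction(nurse, bias_direction)
```

After the projection, `debiased` is orthogonal to `bias_direction`, so it no longer separates the two reference words along that axis.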

arXiv.org e-Print Archive

Explore Bristol Research

Regularising Factorised Models for Venue Recommendation using Friends and their Comments

Author: Macdonald Craig
Manotumruksa Jarana
Ounis Iadh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Venue recommendation is an important capability of Location-Based Social Networks such as Yelp and Foursquare. Matrix Factorisation (MF) is a collaborative filtering-based approach that can effectively recommend venues that are relevant to the users' preferences, by training upon either implicit or explicit feedback (e.g. check-ins or venue ratings) that these users express about venues. However, MF suffers when users have rated only a few venues. To alleviate this problem, recent literature has leveraged additional sources of evidence, e.g. using users' social friendships to reduce the complexity of - or regularise - the MF model, or identifying similar venues based on their comments. This paper argues for a combined regularisation model, where the venues suggested for a user are influenced by friends with similar tastes (as defined by their comments). We propose an MF regularisation technique that seamlessly incorporates both social network information and textual comments, by exploiting word embeddings to estimate the semantic similarity of friends based on their explicit textual feedback, to regularise the complexity of the factorised model. Experiments on a large existing dataset demonstrate that our proposed regularisation model is promising, and can enhance the prediction accuracy of several state-of-the-art matrix factorisation-based approaches.
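A generic socially regularised MF model can be sketched as follows. This is an assumption-laden illustration rather than the paper's model: each user's latent factors are pulled towards those of their friends, weighted by a similarity score. Here the similarity values are hard-coded stand-ins for scores that would be derived from word embeddings of the friends' comments, and the function `train`, its hyperparameters, and the toy ratings are all hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def squared_error(ratings, P, Q):
    """Sum of squared rating errors over the observed (user, venue, rating) triples."""
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)

def train(ratings, friends, n_users, n_items, k=2, lr=0.02,
          reg=0.01, beta=0.05, epochs=300, seed=0):
    """SGD for MF with L2 and friend-similarity regularisation (a sketch)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    initial = squared_error(ratings, P, Q)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Social term: penalise distance to friends' factors,
                # weighted by the (assumed) comment-based similarity.
                social = sum(s * (pu - P[v][f]) for v, s in friends.get(u, []))
                P[u][f] += lr * (err * qi - reg * pu - beta * social)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q, initial

# Toy data: (user, venue, rating) triples and friend similarities in [0, 1].
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 5.0), (2, 2, 4.0)]
friends = {0: [(1, 0.9)], 1: [(0, 0.9)], 2: [(1, 0.1)]}

P, Q, initial_error = train(ratings, friends, n_users=3, n_items=3)
final_error = squared_error(ratings, P, Q)
```

The weight `beta` controls how strongly friends constrain a user's factors; setting it to zero recovers plain L2-regularised MF.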

Enlighten

Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time

Author: Andrassy Bernt
Gupta Pankaj
Rajaram Subburam
Schütze Hinrich
Publication date: 01/01/2018

Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model, the Recurrent Neural Network-Replicated Softmax Model (RNN-RSM), in which the topics discovered at each time step influence topic discovery in subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that, compared to state-of-the-art topic models, RNN-RSM shows better generalization, topic interpretation, evolution and trends. We also introduce a metric (named SPAN) to quantify the capability of a dynamic topic model to capture word evolution in topics over time. Comment: In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)

arXiv.org e-Print Archive

Crossref