307 research outputs found
Crowdsourced Rumour Identification During Emergencies
When a significant event occurs, many social media users leverage platforms such as Twitter to track that event. Moreover, emergency response agencies are increasingly looking to social media as a source of real-time information about such events. However, false information and rumours are often spread during such events, which can influence public opinion and limit the usefulness of social media for emergency management. In this paper, we present an initial study into rumour identification during emergencies using crowdsourcing. In particular, through an analysis of three tweet datasets relating to emergency events from 2014, we propose a taxonomy of tweets relating to rumours. We then perform a crowdsourced labeling experiment to determine whether crowd assessors can identify rumour-related tweets and where such labeling can fail. Our results show that overall, agreement over the tweet labels produced were high (0.7634 Fleiss Kappa), indicating that crowd-based rumour labeling is possible. However, not all tweets are of equal difficulty to assess. Indeed, we show that tweets containing disputed/controversial information tend to be some of the most difficult to identify
Modelling User Preferences using Word Embeddings for Context-Aware Venue Recommendation
Venue recommendation aims to assist users by making personalised
suggestions of venues to visit, building upon data available from
location-based social networks (LBSNs) such as Foursquare. A
particular challenge for this task is context-aware venue recommendation
(CAVR), which additionally takes the surrounding context of
the user (e.g. the user’s location and the time of day) into account
in order to provide more relevant venue suggestions. To address the
challenges of CAVR, we describe two approaches that exploit word
embedding techniques to infer the vector-space representations of
venues, users’ existing preferences, and users’ contextual preferences.
Our evaluation upon the test collection of the TREC 2015
Contextual Suggestion track demonstrates that we can significantly
enhance the effectiveness of a state-of-the-art venue recommendation
approach, as well as produce context-aware recommendations
that are at least as effective as the top TREC 2015 systems
Efficient & Effective Selective Query Rewriting with Efficiency Predictions
To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which has impacts upon the user satisfaction, if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine
Analysing Compression Techniques for In-Memory Collaborative Filtering
Following the recent trend of in-memory data processing, it is a usual practice to maintain collaborative filtering data in the main memory when generating recommendations in academic and industrial recommender systems.
In this paper, we study the impact of integer compression techniques for in-memory collaborative filtering data in terms of space and time efficiency. Our results provide relevant observations about when and how to compress collaborative filtering data. First, we observe that, depending on the memory constraints, compression techniques may speed up or slow down the performance of state-of-the art collaborative filtering algorithms. Second, after comparing different compression techniques, we find the Frame of Reference (FOR) technique to be the best option in terms of space and time efficiency under different memory constraints
Using Word Embeddings in Twitter Election Classification
Word embeddings and convolutional neural networks (CNN)
have attracted extensive attention in various classification
tasks for Twitter, e.g. sentiment classification. However,
the effect of the configuration used to train and generate
the word embeddings on the classification performance has
not been studied in the existing literature. In this paper,
using a Twitter election classification task that aims to detect
election-related tweets, we investigate the impact of
the background dataset used to train the embedding models,
the context window size and the dimensionality of word
embeddings on the classification performance. By comparing
the classification results of two word embedding models,
which are trained using different background corpora
(e.g. Wikipedia articles and Twitter microposts), we show
that the background data type should align with the Twitter
classification dataset to achieve a better performance. Moreover,
by evaluating the results of word embeddings models
trained using various context window sizes and dimensionalities,
we found that large context window and dimension
sizes are preferable to improve the performance. Our experimental
results also show that using word embeddings and
CNN leads to statistically significant improvements over various
baselines such as random, SVM with TF-IDF and SVM
with word embeddings
Evaluating Bad Query Abandonment in an Iterative SMS-Based FAQ Retrieval System
In this paper, we investigate how many iterations users are willing to tolerate in an iterative Frequently Asked Ques- tion (FAQ) system that provides information on HIV/AIDS. This is part of work in progress that aims to develop an automated Frequently Asked Question system that can be used to provide answers on HIV/AIDS related queries to users in Botswana. Our system engages the user in the question answering process by following an iterative interaction approach in order to avoid giving inappropriate answers to the user. Our findings provide us with an indication of how long users are willing to engage with the system. We sub- sequently use this to develop a novel evaluation metric to use in future developments of the system. As an additional finding, we show that the previous search experience of the users has a significant effect on their future behaviour
Active Learning Strategies for Technology Assisted Sensitivity Review
Government documents must be reviewed to identify and protect any sensitive information, such as personal information, before the documents can be released to the public. However, in the era of digital government documents, such as e-mail, traditional sensitivity review procedures are no longer practical, for example due to the volume of documents to be reviewed. Therefore, there is a need for new technology assisted review protocols to integrate automatic sensitivity classification into the sensitivity review process. Moreover, to effectively assist sensitivity review, such assistive technologies must incorporate reviewer feedback to enable sensitivity classifiers to quickly learn and adapt to the sensitivities within a collection, when the types of sensitivity are not known a priori. In this work, we present a thorough evaluation of active learning strategies for sensitivity review. Moreover, we present an active learning strategy that integrates reviewer feedback, from sensitive text annotations, to identify features of sensitivity that enable us to learn an effective sensitivity classifier (0.7 Balanced Accuracy) using significantly less reviewer effort, according to the sign test (p < 0.01 ). Moreover, this approach results in a 51% reduction in the number of documents required to be reviewed to achieve the same level of classification accuracy, compared to when the approach is deployed without annotation features
Towards Maximising Openness in Digital Sensitivity Review using Reviewing Time Predictions
The adoption of born-digital documents, such as email, by governments, such as in the UK and USA, has resulted in a large backlog of born-digital documents that must be sensitivity reviewed before they can be opened to the public, to ensure that no sensitive information is released, e.g. personal or confidential information. However, it is not practical to review all of the backlog with the available reviewing resources and, therefore, there is a need for automatic techniques to increase the number of documents that can be opened within a fixed reviewing time budget. In this paper, we conduct a user study and use the log data to build models to predict reviewing times for an average sensitivity reviewer. Moreover, we show that using our reviewing time predictions to select the order that documents are reviewed can markedly increase the ratio of reviewed documents that are released to the public, e.g. +30% for collections with high levels of sensitivity, compared to reviewing by shortest document first. This, in turn, increases the total number of documents that are opened to the public within a fixed reviewing time budget, e.g. an extra 200 documents in 100 hours reviewing
Monitoring Electoral Violence through Social Media: A Machine Learning Approach
No abstract available
Regularising Factorised Models for Venue Recommendation using Friends and their Comments
Venue recommendation is an important capability of Location-Based Social Networks such as Yelp and Foursquare. Matrix Factorisation (MF) is a collaborative filtering-based approach that can effectively recommend venues that are relevant to the users' preferences, by training upon either implicit or explicit feedbacks (e.g. check-ins or venue ratings) that these users express about venues. However, MF suffers in that users may only have rated very few venues. To alleviate this problem, recent literature have leveraged additional sources of evidence, e.g. using users' social friendships to reduce the complexity of - or regularise - the MF model, or identifying similar venues based on their comments. This paper argues for a combined regularisation model, where the venues suggested for a user are influenced by friends with similar tastes (as defined by their comments). We propose a MF regularisation technique that seamlessly incorporates both social network information and textual comments, by exploiting word embeddings to estimate a semantic similarity of friends based on their explicit textual feedback, to regularise the complexity of the factorised model. Experiments on a large existing dataset demonstrate that our proposed regularisation model is promising, and can enhance the prediction accuracy of several state-of-the-art matrix factorisation-based approaches
- …