13,093 research outputs found

    Is That Twitter Hashtag Worth Reading

    Online social media such as Twitter, Facebook, wikis and LinkedIn have had a great impact on the way we consume information in our day-to-day lives. It has become increasingly important that we come across appropriate content from social media to avoid information explosion. In the case of Twitter, popular information can be tracked using hashtags. Studying the characteristics of tweets containing hashtags is important for a number of tasks, such as breaking news detection, personalized message recommendation, friend recommendation, and sentiment analysis, among others. In this paper, we analyze Twitter data based on trending hashtags, which are widely used nowadays. We use event-based hashtags to learn users' thoughts on those events and to decide whether other users might find them interesting. We use topic modeling, which reveals the hidden thematic structure of the documents (tweets in this case), in addition to sentiment analysis, to explore and summarize the content of the documents. We propose a technique to find the interestingness of an event-based Twitter hashtag and its associated sentiment. The proposed technique helps Twitter followers read relevant and interesting hashtags. Comment: 10 pages, 6 figures. Presented at the Third International Symposium on Women in Computing and Informatics (WCI-2015)

    Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula

    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy. In recent years, large amounts of data have been made available on microblog platforms such as Twitter; however, it is difficult to filter and extract information and knowledge from such data because of the high volume, including noisy data. On Twitter, the general public are able to report real-world events such as floods in real time, and act as social sensors. Consequently, it is beneficial to have a method that can detect flood events automatically in real time to help governmental authorities, such as crisis management authorities, to detect the event and make decisions during its early stages. This thesis proposes a real-time flood detection system that mines Arabic tweets using machine learning and data mining techniques. The proposed system comprises five main components: data collection, pre-processing, flooding event extract, location inferring, location named entity link, and flooding event visualisation. An effective method of flood detection from Arabic tweets is presented and evaluated using supervised learning techniques. Furthermore, this work presents a location named entity inferring method based on the Learning to Search method. The results show that the proposed method outperformed the existing systems, with significantly higher accuracy in inferring flood locations from tweets written in colloquial Arabic. For the location named entity link, a method has been designed that uses Google API services as a knowledge base to extract accurate geocode coordinates associated with the location named entities mentioned in tweets. The results show that the proposed location link method locates 56.8% of tweets within a distance range of 0–10 km from the actual location. Further analysis has shown that the accuracies in locating tweets in the actual city and region are 78.9% and 84.2% respectively.
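
The distance-based evaluation reported in this abstract (share of tweets located within 10 km of the actual location) can be illustrated with a minimal sketch. The function names `haversine_km` and `share_within`, the sample coordinates, and the use of great-circle distance are assumptions made for illustration; the thesis abstract does not state how distance was measured.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def share_within(pairs, max_km=10.0):
    """Fraction of (predicted, actual) coordinate pairs at most max_km apart."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for predicted, actual in pairs
        if haversine_km(predicted[0], predicted[1], actual[0], actual[1]) <= max_km
    )
    return hits / len(pairs)

# Hypothetical example: one exact match and one prediction in a different city.
sample = [((24.7, 46.7), (24.7, 46.7)), ((24.7, 46.7), (21.5, 39.2))]
print(share_within(sample))
```

A metric of this shape yields a figure comparable to the 56.8% reported above when applied to a labelled set of predicted and true tweet locations.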

    Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample

    In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is important for detecting hate speech in web content. In particular, we propose to use the texts produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by an emotion analysis of three sources of data sets: suspended, active and neutral. The first two contain hateful tweets from suspended accounts and active accounts, respectively, whereas the third contains neutral tweets only. The emotion analysis indicated that tweets from suspended accounts show more disgust, negative, fear and sadness emotions than those from active accounts, although tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers based on the semantic meaning of tweets from suspended and active accounts, respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.

    Deception Detection with Feature-Augmentation by soft Domain Transfer

    In this era of information explosion, deceivers use different domains or mediums of information to exploit users, such as news, emails, and tweets. Although much research has been done to detect deception in all these domains, the shortage of information about a new event requires these domains to be associated with each other to battle deception. To form this association, we propose a feature augmentation method that harnesses the intermediate layer representation of neural models. Our approaches improve over the self-domain baseline models by up to 6.60%. We find tweets to be the most helpful information provider for fake news and phishing email detection, whereas news helps most in tweet rumor detection. Our analysis provides useful insight into domain knowledge transfer, which can help build a stronger deception detection system than those in the existing literature.

    Image Analysis Enhanced Event Detection from Geo-tagged Tweet Streams

    Events detected from social media streams often include early signs of accidents, crimes or disasters. Therefore, they can be used by related parties for timely and efficient response. Although significant progress has been made on event detection from tweet streams, most existing methods have not considered the images posted in tweets, which provide richer information than the text and can potentially be a reliable indicator of whether an event has occurred. In this paper, we design an event detection algorithm that combines textual, statistical and image information, following an unsupervised machine learning approach. Specifically, the algorithm starts with semantic and statistical analyses to obtain a list of tweet clusters, each of which corresponds to an event candidate, and then performs image analysis to separate events from non-events: a convolutional autoencoder is trained for each cluster as an anomaly detector, where part of the images are used as the training data and the remaining images are used as the test instances. Our experiments on multiple datasets verify that when an event occurs, the mean reconstruction errors of the training and test images are much closer than when the candidate is a non-event cluster. Based on this finding, the algorithm rejects a candidate if the difference is larger than a threshold. Experimental results over millions of tweets demonstrate that this image-analysis-enhanced approach can significantly increase precision with minimum impact on recall. Comment: 12 pages, 4 figures
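
The rejection rule described in this abstract can be sketched as follows, assuming the per-image reconstruction errors have already been computed by the cluster's autoencoder. The function name `is_event`, the use of an absolute difference, and the sample error values and threshold are illustrative assumptions; the abstract states only that a candidate is rejected when the difference between the mean errors exceeds a threshold.

```python
from statistics import mean

def is_event(train_errors, test_errors, threshold):
    """Keep a candidate cluster when the mean reconstruction errors on its
    training images and held-out images are close to each other.

    train_errors / test_errors: per-image reconstruction errors (hypothetical
    inputs, produced elsewhere by the cluster's autoencoder).
    """
    gap = abs(mean(test_errors) - mean(train_errors))
    return gap <= threshold

# Hypothetical values: a coherent cluster versus an incoherent one.
print(is_event([0.10, 0.12], [0.11, 0.13], threshold=0.05))
print(is_event([0.10, 0.10], [0.40, 0.50], threshold=0.05))
```

The intuition is that images from a real event resemble one another, so an autoencoder trained on some of them reconstructs the rest about as well.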

    Event Detection on Twitter

    Detecting events using social media has been an active research problem. In this work, we investigate and compare the performance of two methods for event detection on Twitter, using Apache Storm as the stream processing infrastructure. The first event detection method is based on identifying uncommonly common words inside tweet blocks, and the second is based on clustering tweets to detect a cluster as an event. Each method has its own characteristics. The uncommonly common word based method relies on bursts of words and hence is not affected by concurrency problems in a distributed environment. The clustering based method, on the other hand, provides a finer-grained analysis, but it is sensitive to concurrent processing. We investigate the effect of the stream processing and concurrency handling support provided by Apache Storm on event detection by these methods.
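
One way to read "uncommonly common words" is as words that are frequent in the current block of tweets but rare in earlier blocks. The sketch below follows that reading only; the function name `bursty_words`, the whitespace tokenisation, the `min_count` and `ratio` parameters, and the add-one smoothing are all assumptions, not the method evaluated in the paper.

```python
from collections import Counter

def bursty_words(current_block, past_blocks, min_count=3, ratio=3.0):
    """Return words whose count in the current block is well above their
    average count per past block (a simple burst heuristic)."""
    now = Counter(w for tweet in current_block for w in tweet.lower().split())
    past = Counter(
        w for block in past_blocks for tweet in block for w in tweet.lower().split()
    )
    n_past = max(len(past_blocks), 1)
    result = []
    for word, count in now.items():
        baseline = past[word] / n_past  # average occurrences per past block
        # The +1.0 smoothing keeps brand-new words from passing on tiny counts.
        if count >= min_count and count >= ratio * (baseline + 1.0):
            result.append(word)
    return sorted(result)

# Hypothetical tweet blocks.
past = [["good morning all", "coffee time"], ["good night", "morning run"]]
current = ["earthquake downtown", "earthquake felt here", "big earthquake", "good morning"]
print(bursty_words(current, past))
```

Because each block is scored only from word counts, such a heuristic does not depend on the order in which tweets within a block are processed.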

    The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness

    Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. In this paper we adapt existing bio-surveillance algorithms to detect localised spikes in Twitter activity corresponding to real events with a high level of confidence. We then develop a methodology to automatically summarise these events, both by providing the tweets which fully describe the event and by linking to highly relevant news articles. We apply our methods to outbreaks of illness and events strongly affecting sentiment. In both case studies we are able to detect events verifiable by third-party sources and produce high-quality summaries.
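
Spike detection of the kind mentioned in this abstract is often illustrated with a moving-baseline z-score, a common pattern in outbreak surveillance. The sketch below is a generic example of that pattern and is not necessarily the algorithm the authors adapted; the function name `spike_days`, the 7-point window, the threshold of 3, and the floor on the standard deviation are assumptions.

```python
from statistics import mean, stdev

def spike_days(counts, window=7, z_threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by at least
    z_threshold standard deviations. Requires window >= 2."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Floor sigma at 1.0 so a flat baseline does not produce huge z-scores.
        z = (counts[i] - mu) / max(sigma, 1.0)
        if z >= z_threshold:
            spikes.append(i)
    return spikes

# Hypothetical daily tweet counts for one location and keyword.
print(spike_days([10, 12, 11, 9, 10, 11, 12, 40, 11]))
```

Running such a detector separately per location would give the localised behaviour the abstract describes.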