714 research outputs found

Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

Authors: Liakata Maria; Procter Rob; Tsakalidis Adam; Voss Alex; Wang Bo; Zubiaga Arkaitz
Publication date: 01/01/2017

In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as the use of tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English and Spanish speaking countries. Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Queen Mary Research Online

University of St. Andrews - Pure

A Survey of Location Prediction on Twitter

Authors: Han Jialong; Sun Aixin; Zheng Xin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Determine the User Country of a Tweet

Authors: Broek Tijs van den; Ehrenhard Michel; Hiemstra Djoerd; Need Ariana; van der Veen Han
Publication date: 01/01/2015

In the widely used message platform Twitter, about 2% of the tweets contain the geographical location through exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research is looking at the determination of a location for tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user's timezone, the user's language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and United Kingdom. An analysis of errors made by the classifier shows that mistakes were made due to limited information and shared properties between countries such as shared timezone. A feature analysis was performed in order to see the effect of different features. The features timezone and parsed user location were the most informative features. Comment: CTIT Technical Report, University of Twente
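Illustrative note: the abstract above names a Naive Bayes model over categorical features (timezone, language, parsed user location). The following is a minimal sketch of that general technique, not the report's implementation. The class name, the add-one smoothing choice, and the four toy training rows are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        # feature_counts[label][feature_name][value] -> number of training rows
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # feature_values[feature_name] -> set of values seen in training
        self.feature_values = defaultdict(set)

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for name, value in row.items():
                self.feature_counts[label][name][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_scores(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)  # log prior
            for name, value in row.items():
                count = self.feature_counts[label][name][value]
                # The extra +1 reserves probability mass for unseen values.
                n_values = len(self.feature_values[name]) + 1
                score += math.log(
                    (count + self.alpha) / (n_label + self.alpha * n_values)
                )
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical toy data; real features would come from tweet metadata.
rows = [
    {"timezone": "Amsterdam", "language": "nl", "location": "NL"},
    {"timezone": "Amsterdam", "language": "en", "location": "NL"},
    {"timezone": "London", "language": "en", "location": "GB"},
    {"timezone": "London", "language": "en", "location": "unknown"},
]
labels = ["NL", "NL", "GB", "GB"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict({"timezone": "London", "language": "en", "location": "unknown"}))
```

In this toy setup, two countries sharing a timezone would receive similar timezone likelihoods, which is consistent with the shared-property errors the abstract mentions.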

arXiv.org e-Print Archive

Radboud Repository

University of Twente Research Information

Understanding Citizen Reactions and Ebola-Related Information Propagation on Social Media

Authors: Lee Kyumin; Tran Thanh
Publication date: 14/07/2016

In severe outbreaks such as Ebola, bird flu and SARS, people share news, and their thoughts and responses regarding the outbreaks on social media. Understanding how people perceive the severe outbreaks, what their responses are, and what factors affect these responses becomes important. In this paper, we conduct a comprehensive study of understanding and mining the spread of Ebola-related information on social media. In particular, we (i) conduct a large-scale data-driven analysis of geotagged social media messages to understand citizen reactions regarding Ebola; (ii) build information propagation models which measure locality of information; and (iii) analyze spatial, temporal and social properties of Ebola-related information. Our work provides new insights into the Ebola outbreak by understanding citizen reactions and topic-based information propagation, as well as providing a foundation for analysis and response of future public health crises. Comment: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016)

arXiv.org e-Print Archive

Crossref