28 research outputs found

Tracing the German Centennial Flood in the Stream of Tweets: First Lessons Learned

Author: Andrienko G.
Andrienko N.
Bothe S.
Fuchs G.
Stange H.
Publication date: 01/01/2013

Social microblogging services such as Twitter result in massive streams of georeferenced messages and geolocated status updates. This real-time source of information is invaluable for many application areas, in particular for disaster detection and response scenarios. Consequently, a considerable number of works has dealt with issues of their acquisition, analysis and visualization. Most of these works not only assume an appropriate percentage of georeferenced messages that allows for detecting relevant events for a specific region and time frame, but also that these geolocations are reasonably correct in representing places and times of the underlying spatio-temporal situation. In this paper, we review these two key assumptions based on the results of applying a visual analytics approach to a dataset of georeferenced Tweets from Germany over eight months witnessing several large-scale flooding situations throughout the country. Our results confirm the potential of Twitter as a distributed 'social sensor' but at the same time highlight some caveats in interpreting immediate results. To overcome these limits we explore incorporating evidence from other data sources including further social media and mobile phone network metrics to detect, confirm and refine events with respect to location and time. We summarize the lessons learned from our initial analysis by proposing recommendations and outline possible future work directions.

City Research Online

Fraunhofer-ePrints

On the Accuracy of Hyper-local Geotagging of Social Media Content

Author: Flatow David
Kanza Yaron
Naaman Mor
Volkovich Yana
Xie Ke Eddie
Publication date: 01/02/2015

Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotag short social media texts, and offer implications for all applications that use data-driven approaches to locate content. Comment: 10 pages

arXiv.org e-Print Archive

CiteSeerX

Language influences on tweeter geolocation

Author: Mourad A
Sanderson M
Scholer F
Publication venue: Springer (Europe)
Publication date: 01/01/2017

We investigate the influence of language on the accuracy of geolocating Twitter users. Our analysis, using a large corpus of tweets written in thirteen languages, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance has a greater impact on accuracy than geographical coverage. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. Our results suggest both averaging approaches should be used to effectively evaluate geolocation.

RMIT Research Repository
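
The distinction between micro and macro averaging mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name, the language codes, and the toy counts are all hypothetical and chosen only to show how data imbalance pulls the two averages apart.

```python
from collections import defaultdict


def micro_macro_accuracy(records):
    """Return (micro, macro) accuracy for (language, is_correct) pairs.

    Micro averaging pools every prediction, so large languages dominate.
    Macro averaging computes accuracy per language and then takes the
    unweighted mean, so every language counts equally.
    """
    per_lang = defaultdict(lambda: [0, 0])  # language -> [correct, total]
    for lang, is_correct in records:
        per_lang[lang][0] += int(is_correct)
        per_lang[lang][1] += 1

    total_correct = sum(correct for correct, _ in per_lang.values())
    total = sum(count for _, count in per_lang.values())
    micro = total_correct / total
    macro = sum(correct / count for correct, count in per_lang.values()) / len(per_lang)
    return micro, macro


# Hypothetical, deliberately imbalanced toy data (not from the paper):
# 90 users of a large language, 80 located correctly;
# 10 users of a small language, 2 located correctly.
records = (
    [("en", True)] * 80 + [("en", False)] * 10
    + [("fi", True)] * 2 + [("fi", False)] * 8
)

micro, macro = micro_macro_accuracy(records)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.820 macro=0.544
```

On this toy data the micro average looks strong because the large language dominates the pool, while the macro average exposes the weak result on the small language, which is one reason to report both.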

Retrieval and interpretation of textual geolocalized information based on semantic geolocalized relations

Author: Korczynski Wojciech
Publication venue: AGH University of Science and Technology Press
Publication date: 31/12/2015

This paper describes a method for retrieving geolocalized information from natural language text and interpreting it by assigning geographic coordinates. A proof-of-concept implementation is discussed, along with a geolocalized dictionary stored in a PostGIS/PostgreSQL spatial relational database. The research focuses on the strongly inflectional Polish language, hence additional complexity had to be taken into account. The presented method has been evaluated using diverse metrics.

Computer Science Journal (AGH University of Science and Technology, Krakow)

Inferring the Origin Locations of Tweets with Quantitative Confidence

Author: Brown L. D.
Eisenstein J.
J.
Mahmud J.
McClendon S.
McLachlan G.
Neal R. M.
Paradesi S.
Roller S.
Schulz A.
Wing B.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/11/2013

Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small number of training data are required for good estimates (roughly 30,000 tweets) and models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with small geographic footprint provide the most useful location signals. Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new references, various other presentation improvements. Version 3: Various presentation improvements, accepted at ACM CSCW 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

Author: Liakata Maria
Procter Rob
Tsakalidis Adam
Voss Alex
Wang Bo
Zubiaga Arkaitz
Publication date: 01/01/2017

In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as the use of tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English and Spanish speaking countries. Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Queen Mary Research Online

University of St. Andrews - Pure