Search CORE

167 research outputs found

City-level Geolocation of Tweets for Real-time Visual Analytics

Author: Chen Ray
Ebert David S.
Karimzadeh Morteza
Snyder Luke S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/10/2019
Field of study

Real-time tweets can provide useful information on evolving events and situations. Geotagged tweets are especially useful, as they indicate the location of origin and provide geographic context. However, only a small portion of tweets are geotagged, limiting their use for situational awareness. In this paper, we adapt, improve, and evaluate a state-of-the-art deep learning model for city-level geolocation prediction, and integrate it with a visual analytics system tailored for real-time situational awareness. We provide computational evaluations to demonstrate the superiority and utility of our geolocation prediction model within an interactive system.Comment: 4 pages, 2 tables, 1 figure, SIGSPATIAL GeoAI Worksho

arXiv.org e-Print Archive

Crossref

Recommended from our members

Text-based document geolocation and its application to the digital humanities

Author: Wing Benjamin Patai
Publication venue
Publication date: 14/09/2016
Field of study

This dissertation investigates automatic geolocation of documents (i.e. identification of their location, expressed as latitude/longitude coordinates), based on the text of those documents rather than metadata. I assert that such geolocation can be performed using text alone, at a sufficient accuracy for use in real-world applications. Although in some corpora metadata is found in abundance (e.g. home location, time zone, friends, followers, etc. in Twitter), it is lacking in others, such as many corpora of primary-source documents in the digital humanities, an area to which document geolocation has hardly been applied. To this end, I first develop methods for accurate text-based geolocation and then apply them to newly-annotated corpora in the digital humanities. The geolocation methods I develop use both uniform and adaptive (k-d tree) grids over the Earth’s surface, culminating in a hierarchical logistic-regression-based technique that achieves state of the art results on well-known corpora (Twitter user feeds, Wikipedia articles and Flickr image tags). In the second part of the dissertation I develop a new NLP task, text-based geolocation of historical corpora. Because there are no existing corpora to test on, I create and annotate two new corpora of significantly different natures (a 19th-century travel log and a large set of Civil War archives). I show how my methods produce good geolocation accuracy even given the relatively small amount of annotated data available, which can be further improved using domain adaptation. I then use the predictions on the much larger unannotated portion of the Civil War archives to generate and analyze geographic topic models, showing how they can be mined to produce interesting revelations concerning various Civil War-related subjects. Finally, I develop a new geolocation technique for text-only corpora involving co-training between document-geolocation and toponym- resolution models, using a gazetteer to inject additional information into the training process. To evaluate this technique I develop a new metric, the closest toponym error distance, on which I show improvements compared with a baseline geolocator.Linguistic

Texas ScholarWorks

Neural geolocation prediction in Twitter

Author: Srinivasan Pramod
Publication venue
Publication date: 01/05/2017
Field of study

Inferring the location of a user has been a valuable step for many applications that leverage social media, such as marketing, security monitoring and recommendation systems. Motivated by the recent success of Deep Learning techniques for many tasks such as computer vision, speech recognition, and natural language processing, we study the application of neural models to the problem of geolocation prediction and experiment with multiple techniques to analyze neural networks for geolocation inference based solely on text. Experimental results on the dataset suggest that choosing appropriate network architecture can all increase performance on this task and demonstrate a promising extension of neural network based models for geolocation prediction. Our systematic extensive study of four supervised and three unsupervised tweet representations reveal that Convolutional Neural Networks (CNNs) and fastText best encode the the textual and geoloca- tional properties of tweets respectively. fastText emerges as the best model for low resource settings, providing very little degradation with reduction in embedding size

Illinois Digital Environment for Access to Learning and Scholarship Repository

Modeling Global Syntactic Variation in English Using Dialect Classification

Author: Dunn Jonathan
Publication venue
Publication date: 11/04/2019
Field of study

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers

arXiv.org e-Print Archive

UC Research Repository

Extracting News Events from Microblogs

Author: Ramampiaro Heri
Repp Øystein
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018
Field of study

Twitter stream has become a large source of information for many people, but the magnitude of tweets and the noisy nature of its content have made harvesting the knowledge from Twitter a challenging task for researchers for a long time. Aiming at overcoming some of the main challenges of extracting the hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach on detecting current (real) news events delivers a state-of-the-art performance

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

A deep semantic search method for random tweets

Author: INUWA-DUTSE ISA
KORKONTZELOS YANNIS
LIPTROTT MARK
Publication venue: 'Elsevier BV'
Publication date: 01/09/2019
Field of study

Edge Hill University Research Information Repository

Automated curation of brand-related social media images with deep learning

Author: Ayguadé Parra Eduard
Cruz Leonel
Gómez Parada Mauro
Makni Mouna
Poveda Jonatan
Tous Liesa Rubén
Wust Otto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

This paper presents a work consisting in using deep convolutional neural networks (CNNs) to facilitate the curation of brand-related social media images. The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically annotated with multiple CNNs. Some of the CNNs perform generic object recognition tasks while others perform what we call visual brand identity recognition. When appropriate, we also apply object detection, usually to discover images containing logos. We report experiments with 5 real brands in which more than 1 million real images were analyzed. In order to speed-up the training of custom CNNs we applied a transfer learning strategy. We examine the impact of different configurations and derive conclusions aiming to pave the way towards systematic and optimized methodologies for automatic UGC curation.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

지리적 거리 정보를 활용한 가짜 팔로워 구매자 식별 방법

Author: 장보연
Publication venue: 서울대학교 대학원
Publication date: 01/02/2019
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 공과대학 컴퓨터공학부, 2019. 2. 김종권.The reputation of social media such as Twitter, Facebook, and Instagram now regard as one persons power in real-world. The person who has more friends or followers can influence more individuals. So the influence of users is associated with the number of friends or followers. On the demand of increasing social power, an underground market has emerged where a customer can buy fake followers. The one who purchase fake followers acts vigorously in online social network. Thus, it is hard to distinguish customer from celebrity or cyberstar. Nevertheless, there are unique characteristics of legitimate users that customers or fake followers cannot manipulate such as a small-world property. The small-world property is mainly qualified by the shortest-path and clustering coefficient. In the small-world network, most people are linked by short chains. Existing work has largely focused on extracting relationship features such as indegree, outdegree, status, hub, or authority. Even though these research explored the relationship features to classify abnormal users of fake follower markets, research that utilize the small-world property to detect abnormal users is not studied. In this work, we propose a model that adapt the small-world property. Specifically, we study the geographical distance for 1hop-directional links using nodes geographical location to verify whether a social graph has the small-world property or not. Motivated by the difference of distance ratio for 1hop directional links, we propose a method which is designed to generate 1hop link distance ratio and classify a node as a customer or not. Experimental results on real-world Twitter dataset demonstrates that the proposed method achieves higher performance than existing models.Chapter 1 Introduction 1 1.1 Motivations 1 1.2 Fake Follower Markets 3 1.3 Research Objectives 5 1.4 Contributions 6 1.5 Thesis Organization 8 Chapter 2 Related Work 10 2.1 Small World Phenomenon 10 2.2 Online Social Abusing Attack Detection 11 2.2.1 Contents-based Detection 12 2.2.2 Social Network-based Detection 13 2.2.3 Behavior-based Detection 5 Chapter 3 Characteristic of Customers and Fake Followers 16 3.1 Data Preparation 16 3.2 Fake Follower Properties 21 3.3 Customer Properties 26 Chapter 4 Social Relationship and Geographical Distance 29 4.1 Geographical Distance in OSNs 29 4.2 Follower Ratio 34 Chapter 5 Detecting Customers 38 5.1 Key Features for Customer Detection 38 5.2 Performance matrices 40 5.3 Experiments 41 5.4 Comparison with Baseline Method 44 5.5 Comparison with Feature-based Method 47 5.6 Impact of Balanced Dataset 49 5.7 Fake Follower Detection 50 Chapter 6 Future Work 52 6.1 The Absence of Location Information 52 6.2 Hybrid Detection Method with Link Ratio and Profile Information 54 Chapter 7 Conclusion 56 Bibliography 58 국문초록 69Docto

SNU Open Repository and Archive