City-level Geolocation of Tweets for Real-time Visual Analytics
Real-time tweets can provide useful information on evolving events and
situations. Geotagged tweets are especially useful, as they indicate the
location of origin and provide geographic context. However, only a small
portion of tweets are geotagged, limiting their use for situational awareness.
In this paper, we adapt, improve, and evaluate a state-of-the-art deep learning
model for city-level geolocation prediction, and integrate it with a visual
analytics system tailored for real-time situational awareness. We provide
computational evaluations to demonstrate the superiority and utility of our
geolocation prediction model within an interactive system.
Comment: 4 pages, 2 tables, 1 figure, SIGSPATIAL GeoAI Workshop
Text-based document geolocation and its application to the digital humanities
This dissertation investigates automatic geolocation of documents (i.e. identification of their location, expressed as latitude/longitude coordinates), based on the text of those documents rather than metadata. I assert that such geolocation can be performed using text alone, with sufficient accuracy for use in real-world applications. Although in some corpora metadata is found in abundance (e.g. home location, time zone, friends, followers, etc. in Twitter), it is lacking in others, such as many corpora of primary-source documents in the digital humanities, an area to which document geolocation has hardly been applied. To this end, I first develop methods for accurate text-based geolocation and then apply them to newly annotated corpora in the digital humanities. The geolocation methods I develop use both uniform and adaptive (k-d tree) grids over the Earth’s surface, culminating in a hierarchical logistic-regression-based technique that achieves state-of-the-art results on well-known corpora (Twitter user feeds, Wikipedia articles and Flickr image tags). In the second part of the dissertation I develop a new NLP task, text-based geolocation of historical corpora. Because there are no existing corpora to test on, I create and annotate two new corpora of significantly different natures (a 19th-century travel log and a large set of Civil War archives). I show how my methods produce good geolocation accuracy even given the relatively small amount of annotated data available, which can be further improved using domain adaptation. I then use the predictions on the much larger unannotated portion of the Civil War archives to generate and analyze geographic topic models, showing how they can be mined to produce interesting revelations concerning various Civil War-related subjects. 
Finally, I develop a new geolocation technique for text-only corpora involving co-training between document-geolocation and toponym-resolution models, using a gazetteer to inject additional information into the training process. To evaluate this technique I develop a new metric, the closest toponym error distance, on which I show improvements compared with a baseline geolocator.
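The adaptive (k-d tree) grid mentioned in the dissertation abstract above can be illustrated with a minimal sketch. It assumes training documents are reduced to (lat, lon) points and that cells are built by recursive median splits alternating between latitude and longitude until each leaf holds at most a hypothetical `bucket_size` points; dense regions then get small cells and sparse regions large ones. Representing a cell by its centroid and scoring predictions with great-circle (haversine) error distance are common conventions assumed here, not details taken from the abstract, and all function names are illustrative.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def kd_cells(points, bucket_size, depth=0):
    """Split (lat, lon) points into adaptive cells of at most `bucket_size` points.

    Each level sorts by one axis (latitude at even depths, longitude at odd
    depths) and splits at the median index. Hypothetical helper for illustration.
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be >= 1")
    if len(points) <= bucket_size:
        return [list(points)]
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_cells(ordered[:mid], bucket_size, depth + 1)
            + kd_cells(ordered[mid:], bucket_size, depth + 1))


def cell_centroid(cell):
    """Arithmetic mean of a cell's coordinates; adequate for small cells that do
    not straddle the antimeridian."""
    return (mean(p[0] for p in cell), mean(p[1] for p in cell))
```

For example, `kd_cells([(30.27, -97.74), (30.30, -97.70), (40.71, -74.01), (40.73, -73.99), (51.51, -0.13), (48.86, 2.35)], bucket_size=2)` yields four cells of sizes 1, 2, 1 and 2; a classifier would then predict a cell, and `haversine_km` between that cell's centroid and the true location gives the error distance.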
Neural geolocation prediction in Twitter
Inferring the location of a user has been a valuable step for many applications that leverage social media, such as marketing, security monitoring, and recommendation systems. Motivated by the recent success of deep learning techniques in tasks such as computer vision, speech recognition, and natural language processing, we study the application of neural models to the problem of geolocation prediction and experiment with multiple techniques for analyzing neural networks for geolocation inference based solely on text. Experimental results on the dataset suggest that choosing an appropriate network architecture can increase performance on this task, and demonstrate a promising extension of neural-network-based models for geolocation prediction. Our systematic, extensive study of four supervised and three unsupervised tweet representations reveals that Convolutional Neural Networks (CNNs) and fastText best encode the textual and geolocational properties of tweets, respectively. fastText emerges as the best model for low-resource settings, showing very little degradation as the embedding size is reduced.
Modeling Global Syntactic Variation in English Using Dialect Classification
This paper evaluates global-scale dialect identification for 14 national
varieties of English as a means for studying syntactic variation. The paper
makes three main contributions: (i) introducing data-driven language mapping as
a method for selecting the inventory of national varieties to include in the
task; (ii) producing a large and dynamic set of syntactic features using
grammar induction rather than focusing on a few hand-selected features such as
function words; and (iii) comparing models across both web corpora and social
media corpora in order to measure the robustness of syntactic variation across
registers.
Extracting News Events from Microblogs
The Twitter stream has become a major source of information for many people,
but the sheer volume of tweets and the noisy nature of their content have long
made harvesting knowledge from Twitter a challenging task for researchers.
Aiming to overcome some of the main challenges of extracting the hidden
information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step uses a neural network (deep
learning) model to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach to detecting
current (real) news events delivers state-of-the-art performance.
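The second and third steps of the news-event pipeline above (cluster the detected news tweets as they stream in, then rank clusters by size and growth speed) can be sketched as follows. The abstract does not describe its novel clustering algorithm or ranking formula, so this swaps in a generic single-pass (leader-follower) clustering over bag-of-words vectors with a hypothetical cosine `threshold`, and a hypothetical score of cluster size multiplied by an add-one-smoothed growth ratio between the latest `window` and the one before it. The neural news-relevance filter of step one is omitted; the input is assumed to be already-filtered `(timestamp, text)` pairs.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class EventCluster:
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    timestamps: list = field(default_factory=list)      # arrival time of each tweet

    def add(self, vec: Counter, ts: float) -> None:
        self.centroid.update(vec)  # cosine is scale-invariant, so a sum works as a centroid
        self.timestamps.append(ts)


def single_pass_cluster(stream, threshold=0.5):
    """Assign each (timestamp, text) to its most similar cluster, or open a new
    cluster when none reaches `threshold`. Each tweet is processed exactly once."""
    clusters = []
    for ts, text in stream:
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster.centroid)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < threshold:
            best = EventCluster()
            clusters.append(best)
        best.add(vec, ts)
    return clusters


def rank_events(clusters, now, window=3600.0):
    """Order clusters by size times growth (tweets in the latest window divided by
    tweets in the preceding window plus one). Hypothetical scoring rule."""
    def score(cluster):
        recent = sum(1 for t in cluster.timestamps if now - window < t <= now)
        previous = sum(1 for t in cluster.timestamps if now - 2 * window < t <= now - window)
        return len(cluster.timestamps) * recent / (previous + 1)
    return sorted(clusters, key=score, reverse=True)
```

Comparing each tweet against every open cluster keeps the sketch short; a real streaming system would typically bound this with an inverted index or by expiring inactive clusters.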
Automated curation of brand-related social media images with deep learning
This paper presents work on using deep convolutional neural networks (CNNs) to facilitate the curation of brand-related social media images. The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically annotated with multiple CNNs. Some of the CNNs perform generic object recognition tasks, while others perform what we call visual brand identity recognition. When appropriate, we also apply object detection, usually to discover images containing logos. We report experiments with 5 real brands in which more than 1 million real images were analyzed. To speed up the training of custom CNNs, we applied a transfer learning strategy. We examine the impact of different configurations and draw conclusions aimed at paving the way toward systematic and optimized methodologies for automatic UGC curation.
Peer Reviewed. Postprint (author's final draft).
A Method for Identifying Fake Follower Buyers Using Geographical Distance Information
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: 김종권 (Chong-kwon Kim).
A person's reputation on social media such as Twitter, Facebook, and Instagram is now regarded as a measure of their power in the real world. People with more friends or followers can influence more individuals, so a user's influence is associated with their number of friends or followers. Driven by the demand for greater social power, an underground market has emerged in which customers can buy fake followers. Customers who purchase fake followers are highly active on online social networks, which makes them hard to distinguish from celebrities or internet stars. Nevertheless, legitimate users have unique characteristics that customers and fake followers cannot manipulate, such as the small-world property. The small-world property is mainly quantified by the shortest-path length and the clustering coefficient. In a small-world network, most people are linked by short chains. Existing work has largely focused on extracting relationship features such as in-degree, out-degree, status, hub, or authority. Although these studies explored relationship features to classify abnormal users in fake follower markets, the small-world property has not been used to detect abnormal users.
In this work, we propose a model that leverages the small-world property. Specifically, we study the geographical distance of 1-hop directional links, using nodes' geographical locations, to verify whether a social graph has the small-world property. Motivated by the difference in distance ratio between 1-hop directional links, we propose a method that computes a 1-hop link distance ratio and classifies each node as a customer or not. Experimental results on a real-world Twitter dataset demonstrate that the proposed method achieves higher performance than existing models.
Chapter 1 Introduction
1.1 Motivations
1.2 Fake Follower Markets
1.3 Research Objectives
1.4 Contributions
1.5 Thesis Organization
Chapter 2 Related Work
2.1 Small World Phenomenon
2.2 Online Social Abusing Attack Detection
2.2.1 Contents-based Detection
2.2.2 Social Network-based Detection
2.2.3 Behavior-based Detection
Chapter 3 Characteristics of Customers and Fake Followers
3.1 Data Preparation
3.2 Fake Follower Properties
3.3 Customer Properties
Chapter 4 Social Relationship and Geographical Distance
4.1 Geographical Distance in OSNs
4.2 Follower Ratio
Chapter 5 Detecting Customers
5.1 Key Features for Customer Detection
5.2 Performance Metrics
5.3 Experiments
5.4 Comparison with Baseline Method
5.5 Comparison with Feature-based Method
5.6 Impact of Balanced Dataset
5.7 Fake Follower Detection
Chapter 6 Future Work
6.1 The Absence of Location Information
6.2 Hybrid Detection Method with Link Ratio and Profile Information
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
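The thesis abstract above describes computing a 1-hop link distance ratio from users' geographical locations and using it to classify accounts as customers. The abstract does not define the ratio, so this minimal sketch assumes one plausible form: for a node, the fraction of its 1-hop links in one direction (followers for `"in"`, followees for `"out"`) whose other endpoint lies within a hypothetical `local_km` radius. It also assumes, purely for illustration, that purchased followers tend to be geographically scattered, and replaces the thesis's classifier with a hypothetical `min_ratio` threshold. All names and parameters are illustrative, not the author's.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def local_link_ratio(node, edges, location, direction="in", local_km=100.0):
    """Fraction of `node`'s 1-hop links in one direction whose other endpoint lies
    within `local_km` of `node` (hypothetical definition).

    edges: iterable of (follower, followee) pairs.
    direction: "in" uses the node's followers, "out" the accounts it follows.
    Endpoints without a known location are skipped; returns None when no
    locatable link remains.
    """
    if node not in location:
        return None
    if direction == "in":
        peers = [src for src, dst in edges if dst == node]
    elif direction == "out":
        peers = [dst for src, dst in edges if src == node]
    else:
        raise ValueError("direction must be 'in' or 'out'")
    dists = [haversine_km(location[p], location[node]) for p in peers if p in location]
    if not dists:
        return None
    return sum(d <= local_km for d in dists) / len(dists)


def looks_like_customer(node, edges, location, min_ratio=0.5):
    """Hypothetical rule: flag nodes whose followers are mostly geographically remote."""
    ratio = local_link_ratio(node, edges, location, direction="in")
    return ratio is not None and ratio < min_ratio
```

In a toy graph where one Seoul account is followed by two nearby accounts and one in New York, its incoming ratio is 2/3; a Seoul account followed by accounts in New York, London, Sydney and one nearby user has a ratio of 0.25 and would be flagged under the assumed 0.5 cut-off.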
- …