Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
We propose a method for embedding two-dimensional locations in a continuous
vector space using a neural network-based model incorporating mixtures of
Gaussian distributions, presenting two model variants for text-based
geolocation and lexical dialectology. Evaluated over Twitter data, the proposed
model outperforms conventional regression-based geolocation and provides a
better estimate of uncertainty. We also show the effectiveness of the
representation for predicting words from location in lexical dialectology, and
evaluate it using the DARE dataset. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP
2017) September 2017, Copenhagen, Denmark
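As a rough illustration of the mixture-of-Gaussians idea in the abstract above, the following sketch scores a 2D location under a Gaussian mixture by its negative log-likelihood, which is the kind of quantity a mixture density network is typically trained to minimize. The function names, the diagonal covariance, and the example parameters are assumptions for illustration only; the abstract does not specify the authors' parameterization.

```python
import math

def gaussian2d_logpdf(point, mean, sigma):
    """Log-density of a 2D Gaussian with diagonal covariance (an assumed form)."""
    (x, y), (mx, my), (sx, sy) = point, mean, sigma
    zx = (x - mx) / sx
    zy = (y - my) / sy
    return -0.5 * (zx * zx + zy * zy) - math.log(2.0 * math.pi * sx * sy)

def mixture_nll(point, weights, means, sigmas):
    """Negative log-likelihood of a point under a 2D Gaussian mixture.

    weights are mixing proportions (positive, summing to 1); in a mixture
    density network they would be produced by the network from the input text.
    """
    logs = [math.log(w) + gaussian2d_logpdf(point, m, s)
            for w, m, s in zip(weights, means, sigmas)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return -(top + math.log(sum(math.exp(v - top) for v in logs)))

# Hypothetical two-component mixture over (latitude, longitude).
weights = [0.7, 0.3]
means = [(40.7, -74.0), (34.1, -118.2)]
sigmas = [(1.0, 1.0), (2.0, 2.0)]
print(mixture_nll((40.5, -73.8), weights, means, sigmas))
```

A location near a heavily weighted component receives a lower negative log-likelihood than one far from every component, and the spread of the components gives a simple handle on uncertainty.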
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure
A Neural Model for User Geolocation and Lexical Dialectology
We propose a simple yet effective text-based user geolocation model based on
a neural network with one hidden layer, which achieves state-of-the-art
performance over three Twitter benchmark geolocation datasets, in addition to
producing word and phrase embeddings in the hidden layer that we show to be
useful for detecting dialectal terms. As part of our analysis of dialectal
terms, we release DAREDS, a dataset for evaluating dialect term detection
methods.
Geolocation Predicting of Tweets Using BERT-Based Models
This research aims to solve the tweet/user geolocation prediction task
and provide a flexible methodology for the geotagging of textual big data. The
suggested approach implements neural networks for natural language processing
(NLP) to estimate the location as coordinate pairs (longitude, latitude) and
two-dimensional Gaussian Mixture Models (GMMs). The proposed models
were fine-tuned on a Twitter dataset using pretrained Bidirectional Encoder
Representations from Transformers (BERT) as base models. Performance metrics
show a median error of less than 30 km at the worldwide level and less than 15
km at the US level for models trained and evaluated on text
features of tweets' content and metadata context. Comment: 27 pages, 6 tables, 7 figures
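The median error in kilometres quoted above is commonly computed as the median great-circle distance between predicted and true coordinates. The sketch below shows one way to do that with the haversine formula; the function names, the spherical-Earth radius, and the (latitude, longitude) argument order are assumptions for illustration, not the authors' evaluation code.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def median_error_km(predicted, actual):
    """Median distance between paired (lat, lon) predictions and ground truth."""
    return median(haversine_km(*p, *a) for p, a in zip(predicted, actual))

# Hypothetical predictions and ground-truth points as (latitude, longitude).
predicted = [(40.7, -74.0), (51.5, -0.1), (35.7, 139.7)]
actual = [(40.8, -73.9), (48.9, 2.4), (35.6, 139.8)]
print(median_error_km(predicted, actual))
```

Note that the abstract lists coordinates as (longitude, latitude); if data is stored in that order, the pairs need to be swapped before calling these helpers.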
Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review
The task of multimedia geolocation is becoming an increasingly essential
component of the digital forensics toolkit to effectively combat human
trafficking, child sexual exploitation, and other illegal acts. Typically,
metadata-based geolocation information is stripped when multimedia content is
shared via instant messaging and social media. The intricacy of geolocating,
geotagging, or finding geographical clues in this content is often overly
burdensome for investigators. Recent research has shown that contemporary
advancements in artificial intelligence, specifically computer vision and deep
learning, show significant promise towards expediting the multimedia
geolocation task. This systematic literature review thoroughly examines the
state-of-the-art leveraging computer vision techniques for multimedia
geolocation and assesses their potential to expedite human trafficking
investigation. This includes a comprehensive overview of the application of
computer vision-based approaches to multimedia geolocation, identifies their
applicability in combating human trafficking, and highlights the potential
implications of enhanced multimedia geolocation for prosecuting human
trafficking. 123 articles inform this systematic literature review. The
findings suggest numerous potential paths for future impactful research on the
subject.
Neural geolocation prediction in Twitter
Inferring the location of a user has been a valuable step for many applications that leverage social media, such as marketing, security monitoring and recommendation systems. Motivated by the recent success of Deep Learning techniques for many tasks such as computer vision, speech recognition, and natural language processing, we study the application of neural models to the problem of geolocation prediction and experiment with multiple techniques to analyze neural networks for geolocation inference based solely on text. Experimental results on the dataset suggest that choosing an appropriate network architecture can increase performance on this task and demonstrate a promising extension of neural network based models for geolocation prediction. Our systematic, extensive study of four supervised and three unsupervised tweet representations reveals that Convolutional Neural Networks (CNNs) and fastText best encode the textual and geolocational properties of tweets, respectively. fastText emerges as the best model for low-resource settings, showing very little degradation as the embedding size is reduced.
Machine Learning as a Tool for Wildlife Management and Research: The Case of Wild Pig-Related Content on Twitter
Wild pigs (Sus scrofa) are a non-native, invasive species that cause considerable damage and transmit a variety of diseases to livestock, people, and wildlife. We explored Twitter, the most popular social media micro-blogging platform, to demonstrate how social media data can be leveraged to investigate social identity and sentiment toward wild pigs. In doing so, we employed a sophisticated machine learning approach to investigate: (1) the overall sentiment associated with the dataset, (2) online identities via user profile descriptions, and (3) the extent to which sentiment varied by online identity. Results indicated that the largest groups of online identity represented in our dataset were females and people whose occupation was in journalism and media communication. While the majority of our data indicated a negative sentiment toward wild pigs and other related search terms, users who identified with agriculture-related occupations had more favorable sentiment. Overall, this article is an important starting point for further investigation of the use of social media data and social identity in the context of wild pigs and other invasive species.