327 research outputs found

    Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

    We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), September 2017, Copenhagen, Denmark.
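As a rough illustration of the mixture-of-Gaussians idea in the abstract above, the sketch below computes the negative log-likelihood of a two-dimensional point under a Gaussian mixture, which is the loss commonly used to train mixture density networks. The abstract does not state the covariance structure or the parameterization, so the diagonal covariance and the function names (`diag_gaussian_logpdf`, `mixture_nll`) are assumptions made for this example, not the authors' implementation.

```python
import math


def diag_gaussian_logpdf(x, mean, std):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for xi, mi, si in zip(x, mean, std)
    )


def mixture_nll(x, weights, means, stds):
    """Negative log-likelihood of point x under a Gaussian mixture.

    weights must be positive and sum to 1; means and stds hold one
    2D tuple per mixture component.
    """
    logs = [
        math.log(w) + diag_gaussian_logpdf(x, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    # Log-sum-exp trick for numerical stability.
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs)))


if __name__ == "__main__":
    # Hypothetical two-component mixture over (latitude, longitude).
    nll = mixture_nll(
        (40.7, -74.0),
        weights=[0.7, 0.3],
        means=[(40.0, -75.0), (34.0, -118.0)],
        stds=[(1.0, 1.0), (2.0, 2.0)],
    )
    print(f"negative log-likelihood: {nll:.3f}")
```

In a mixture density network the weights, means, and standard deviations would be outputs of the neural network rather than fixed values as here.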

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure.

    A Neural Model for User Geolocation and Lexical Dialectology

    We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state-of-the-art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.

    Geolocation Predicting of Tweets Using BERT-Based Models

    This research aims to solve the tweet/user geolocation prediction task and provide a flexible methodology for the geotagging of textual big data. The suggested approach implements neural networks for natural language processing (NLP) to estimate the location as coordinate pairs (longitude, latitude) and two-dimensional Gaussian Mixture Models (GMMs). The proposed models have been fine-tuned on a Twitter dataset using pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models. Performance metrics show a median error of less than 30 km on the worldwide-level dataset and less than 15 km on the US-level dataset for the models trained and evaluated on text features of tweets' content and metadata context. Comment: 27 pages, 6 tables, 7 figures.
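The median error in kilometres reported above is a distance between predicted and true coordinates. The abstract does not say how that distance is computed; a minimal sketch, assuming great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km, might look like the following. The function names and the sample coordinates are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(predicted, actual):
    """Median haversine distance over paired (lat, lon) tuples."""
    return median(
        haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(predicted, actual)
    )


if __name__ == "__main__":
    # Hypothetical predictions and ground-truth points as (lat, lon).
    predicted = [(40.7, -74.0), (34.1, -118.2), (51.5, -0.1)]
    actual = [(40.8, -73.9), (34.0, -118.3), (48.9, 2.4)]
    print(f"median error: {median_error_km(predicted, actual):.1f} km")
```

The median is less sensitive than the mean to a few very distant mispredictions, which is one reason it is a common summary statistic for geolocation error.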

    Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review

    The task of multimedia geolocation is becoming an increasingly essential component of the digital forensics toolkit to effectively combat human trafficking, child sexual exploitation, and other illegal acts. Typically, metadata-based geolocation information is stripped when multimedia content is shared via instant messaging and social media. The intricacy of geolocating, geotagging, or finding geographical clues in this content is often overly burdensome for investigators. Recent research has shown that contemporary advancements in artificial intelligence, specifically computer vision and deep learning, show significant promise towards expediting the multimedia geolocation task. This systematic literature review thoroughly examines the state of the art in leveraging computer vision techniques for multimedia geolocation and assesses their potential to expedite human trafficking investigation. It gives a comprehensive overview of the application of computer vision-based approaches to multimedia geolocation, identifies their applicability in combating human trafficking, and highlights the potential implications of enhanced multimedia geolocation for prosecuting human trafficking. 123 articles inform this systematic literature review. The findings suggest numerous potential paths for future impactful research on the subject.

    Neural geolocation prediction in Twitter

    Inferring the location of a user has been a valuable step for many applications that leverage social media, such as marketing, security monitoring and recommendation systems. Motivated by the recent success of Deep Learning techniques for many tasks such as computer vision, speech recognition, and natural language processing, we study the application of neural models to the problem of geolocation prediction and experiment with multiple techniques to analyze neural networks for geolocation inference based solely on text. Experimental results on the dataset suggest that choosing an appropriate network architecture can increase performance on this task and demonstrate a promising extension of neural network-based models for geolocation prediction. Our systematic, extensive study of four supervised and three unsupervised tweet representations reveals that Convolutional Neural Networks (CNNs) and fastText best encode the textual and geolocational properties of tweets, respectively. fastText emerges as the best model for low-resource settings, showing very little degradation as the embedding size is reduced.

    Machine Learning as a Tool for Wildlife Management and Research: The Case of Wild Pig-Related Content on Twitter

    Wild pigs (Sus scrofa) are a non-native, invasive species that causes considerable damage and transmits a variety of diseases to livestock, people, and wildlife. We explored Twitter, the most popular social media micro-blogging platform, to demonstrate how social media data can be leveraged to investigate social identity and sentiment toward wild pigs. In doing so, we employed a sophisticated machine learning approach to investigate: (1) the overall sentiment associated with the dataset, (2) online identities via user profile descriptions, and (3) the extent to which sentiment varied by online identity. Results indicated that the largest groups of online identity represented in our dataset were females and people whose occupation was in journalism and media communication. While the majority of our data indicated a negative sentiment toward wild pigs and other related search terms, users who identified with agriculture-related occupations had more favorable sentiment. Overall, this article is an important starting point for further investigation of the use of social media data and social identity in the context of wild pigs and other invasive species.