A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarity
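The abstract above describes linear models that relate Wikipedia access logs to disease incidence and are evaluated at forecast horizons of up to 28 days. A minimal sketch of that idea follows, assuming a single article-view series, ordinary least squares, and a fixed lag. The function name `fit_lagged_linear`, the synthetic data, and the 7-step lag are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def fit_lagged_linear(views, cases, lag):
    """Least-squares fit of cases[t + lag] on views[t].

    Hypothetical helper for illustration only. Returns (coef, r2), where
    coef[0] is the intercept and coef[1:] are the slopes.
    """
    views = np.asarray(views, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if views.ndim == 1:
        views = views[:, None]          # treat a single series as one feature
    n = len(cases) - lag                # number of usable (views, cases) pairs
    X = np.column_stack([np.ones(n), views[:n]])
    y = cases[lag:]                     # incidence observed `lag` steps later
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic example (made-up numbers): cases follow views with a 7-step delay.
rng = np.random.default_rng(0)
demo_views = rng.poisson(100, size=60).astype(float)
demo_cases = np.zeros(60)
demo_cases[7:] = 2.0 * demo_views[:53] + 5.0
demo_coef, demo_r2 = fit_lagged_linear(demo_views, demo_cases, lag=7)
```

Scanning `lag` over a range of values and comparing the resulting r² is one simple way to gauge how far ahead such a model retains predictive value.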
Spatial And Temporal Patterns Of Geo-Tagged Tweets
With over 500 million current registered users and over 500 million tweets per day, Twitter has caught the attention of scientists in various disciplines. As Twitter allows users to send messages with location tags, a massive amount of valuable geo-social knowledge is embedded in tweets, which can provide useful implications for human geography, urban science, location-based services, targeted advertising, and social network studies. This thesis aims to determine the lifestyle patterns of college students by analyzing the spatial and temporal dynamics in their tweets. Geo-tagged tweets were collected over a period of six months for four US Midwestern college cities: 1) West Lafayette, Indiana (Purdue University); 2) Bloomington, Indiana (Indiana University); 3) Ann Arbor, Michigan (University of Michigan); 4) Columbus, Ohio (The Ohio State University). The overall distribution of the tweets was determined for each city, and the spatial patterns of representative individuals were examined as well. Grouping the tweets in time domains, the temporal patterns on an hourly, daily, and monthly basis were analyzed. Utilizing detailed land use data for each city, further insight into the thematic properties of the tweeting locations was obtained, leading to a deeper understanding of the life, mobility and flow patterns of Twitter users. Finally, space-time clusters and anomalies within tweets, which were considered events, were found using space-time statistics. The results generally reflected everyday human activity patterns, including the mobile population in each city as well as the commute behaviors of the representative users. The tweets also consistently revealed the occurrence of anomalies or events. The results of this thesis therefore confirmed the feasibility and promising future of using geo-tagged micro-blogging services such as Twitter in understanding human behavior patterns and in other geo-social studies.
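The thesis abstract above mentions grouping tweets in time domains to study hourly, daily, and monthly patterns. A minimal sketch of such grouping is shown here, assuming each tweet is represented only by a Python `datetime` timestamp. The function name `temporal_profiles` and the sample timestamps are hypothetical and do not come from the thesis.

```python
from collections import Counter
from datetime import datetime


def temporal_profiles(timestamps):
    """Count tweets per hour of day, per weekday, and per calendar month.

    Hypothetical helper for illustration only. Weekdays use datetime.weekday(),
    so 0 is Monday and 6 is Sunday.
    """
    timestamps = list(timestamps)
    hourly = Counter(ts.hour for ts in timestamps)
    daily = Counter(ts.weekday() for ts in timestamps)
    monthly = Counter((ts.year, ts.month) for ts in timestamps)
    return hourly, daily, monthly


# Made-up timestamps standing in for geo-tagged tweets.
sample = [
    datetime(2014, 3, 3, 9, 15),
    datetime(2014, 3, 3, 9, 45),
    datetime(2014, 4, 5, 22, 0),
]
hourly, daily, monthly = temporal_profiles(sample)
```

Keeping the three profiles as separate counters makes it straightforward to compare, for example, weekday and weekend activity for one city without re-reading the tweets.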
ORÁCULO: Detection of Spatiotemporal Hot Spots of Conflict-Related Events Extracted from Online News Sources
Dissertation presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science.
Achieving situational awareness in peace operations requires understanding
where and when conflict-related activity is most intense. However, the irregular nature
of most factions hinders the use of remote sensing, while winning the trust of the host
populations to allow the collection of wide-ranging human intelligence is a slow process.
Thus, our proposed solution, ORÁCULO, is an information system which detects
spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events
extracted from online news sources, allowing immediate situational awareness. To do so,
it combines a closed-domain supervised event extractor with emerging hot spots analysis
of event space-time cubes. The prototype of ORÁCULO was tested on tweets scraped
from the Twitter accounts of local and international news sources covering the Central
African Republic Civil War, and its test results show that it achieved near state-of-the-art
event extraction performance, significant overlap with a reference event dataset, and
strong correlation with the hot spots space-time cube generated from the reference event
dataset, proving the viability of the proposed solution. Future work will focus on
improving the event extraction performance and on testing ORÁCULO in cooperation
with peacekeeping organizations.
Keywords: event extraction, natural language understanding, spatiotemporal analysis,
peace operations, open-source intelligence.
Achieving and maintaining situational awareness in peace operations requires knowing
when and where conflict-related activity is most intense. However, the irregular nature
of most factions hinders the use of remote sensing, and winning the trust of the
populations to allow the collection of intelligence is a slow process. Thus, our proposed
solution, ORÁCULO, is an information system that detects spatiotemporal hot spots of
conflict-related activity by analyzing the patterns of events extracted from online news
sources (including social networks), allowing immediate situational awareness. To that
end, our solution combines a closed-domain event extractor based on supervised learning
with emerging hot spot analysis of event space-time cubes. The prototype of ORÁCULO was
tested on tweets collected from local and international news sources covering the Central
African Republic Civil War. Its test results show near state-of-the-art event extraction
performance, significant overlap with a reference event set, and strong correlation with
the hot spot space-time cube generated from that reference set, demonstrating the
viability of the proposed solution. Given these results, future work will focus on
improving event extraction performance and on testing the ORÁCULO system in cooperation
with organizations that conduct peace operations.
- …