Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, while it is still posing many challenges as a passive data
source, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure proves to be more robust
than polling, our study also suggests that the former can advantageously
complement the latter in opinion prediction.
Local News And Event Detection In Twitter
Twitter, one of the most popular micro-blogging services, allows users to publish
short messages, called tweets, on a wide variety of subjects such as news, events, stories, ideas, and opinions.
The popularity of Twitter, to some extent, arises from its capability
of letting users promptly and conveniently contribute tweets to convey diverse information.
Specifically, with people discussing what is happening outside in the real world by
posting tweets, Twitter captures invaluable information about real-world news and events,
spanning a wide scale from large national or international stories like a presidential election
to small local stories such as a local farmers market. Detecting and extracting small
news and events for a local place is a challenging problem and is the focus of this thesis.
In particular, we explore several directions to extract and detect local news and events
using tweets in Twitter: a) how to identify local influential people on Twitter for potential
news seeders; b) how to recognize unusualness in tweet volume as signals of potential
local events; c) how to overcome the data sparsity of local tweets to detect more and
smaller ongoing local news and events. Additionally, we try to uncover implicit
correlations between location, time, and text in tweets by learning embeddings for them
using a universal representation under the same semantic space.
In the first part, we investigate how to measure the spatial influence of Twitter users
by their interactions and thereby identify the locally influential users, which we found are
usually good news and event seeders in practice. In order to do this, we built a large-scale
directed interaction graph of Twitter users. Such a graph allows us to exploit PageRank
based ranking procedures to select the top locally influential people after incorporating
geographical distance into the transition matrix used for the random walk.
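
The idea of folding geographical distance into the random-walk transition matrix can be sketched as follows. This is a minimal illustration, not the thesis's actual procedure: the exponential distance decay, the `scale_km` parameter, and the function names `haversine_km` and `local_pagerank` are all assumptions made for the example.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def local_pagerank(edges, coords, damping=0.85, scale_km=50.0, iters=100):
    """PageRank in which each interaction edge u -> v is weighted by
    exp(-distance / scale_km), so nearby users pass on more influence.

    edges:  list of (u, v) directed interactions.
    coords: dict mapping every user to a (lat, lon) pair.
    """
    nodes = sorted(coords)
    # Distance-decayed outgoing weights (an assumed weighting scheme).
    out = {u: {} for u in nodes}
    for u, v in edges:
        out[u][v] = math.exp(-haversine_km(coords[u], coords[v]) / scale_km)
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        # Rank held by users with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping + damping * dangling) / len(nodes) for u in nodes}
        for u in nodes:
            total = sum(out[u].values())
            for v, w in out[u].items():
                # Row-normalised transition probability times current rank.
                new[v] += damping * rank[u] * w / total
        rank = new
    return rank
```

With this weighting, a user who interacts with one nearby account and one distant account passes most of their rank to the nearby one, which is the effect a locality-aware ranking aims for.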
In the second part, we study how to recognize the unusualness in tweet volume at
a local place as signals of potential ongoing local events. The intuition is that if there
is suddenly an abnormal change in the number of tweets at a location (e.g., a significant
increase), it may imply a potential local event. We, therefore, present DeLLe, a methodology
for automatically Detecting Latest Local Events from geotagged tweet streams (i.e.,
tweets that contain GPS points). With the help of novel spatiotemporal tweet count prediction
models, DeLLe first finds unusual locations which have aggregated an unexpected
number of tweets in the latest time period and then calculates, for each such unusual location,
a ranking score to identify the ones most likely to have ongoing local events by
addressing the temporal burstiness, spatial burstiness, and topical coherence.
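
The intuition of flagging locations with an unexpected number of tweets can be illustrated with a simple baseline. The sketch below swaps in a plain z-score against past windows in place of DeLLe's spatiotemporal prediction models, which the abstract does not specify; the grid size, the threshold, and the names `cell_of` and `unusual_cells` are assumptions.

```python
from collections import Counter
from statistics import mean, stdev

def cell_of(lat, lon, size_deg=0.01):
    """Map a GPS point to a square grid cell (integer row/column indices)."""
    return (int(lat // size_deg), int(lon // size_deg))

def unusual_cells(history, latest, threshold=3.0):
    """Return {cell: z-score} for cells whose latest count is unusually high.

    history: non-empty list of past time windows, each a list of (lat, lon).
    latest:  list of (lat, lon) tweets in the most recent window.
    """
    past = [Counter(cell_of(*p) for p in window) for window in history]
    now = Counter(cell_of(*p) for p in latest)
    flagged = {}
    for cell, count in now.items():
        series = [c[cell] for c in past]  # Counter yields 0 for unseen cells
        mu = mean(series)
        sigma = stdev(series) if len(series) > 1 else 0.0
        # Floor sigma at 1 so near-constant cells do not produce huge scores.
        z = (count - mu) / max(sigma, 1.0)
        if z >= threshold:
            flagged[cell] = z
    return flagged
```

A full system would then rank the flagged cells, for example by the burstiness and topical coherence criteria named above; that ranking step is not shown here.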
In the third part, we explore how to overcome the data sparsity of local tweets when
trying to discover more and smaller local news or events. Local tweets are those whose
locations fall inside a local place. They are very sparse in Twitter, which hinders the detection
of small local news or events that have only a handful of tweets. A system, called
Firefly, is proposed to enhance the local live tweet stream by tracking the tweets of a
large body of local people, and further perform a locality-aware keyword based clustering
for event detection. The intuition is that local tweets are published by local people,
and tracking their tweets naturally yields a source of local tweets. However, in practice,
only 20% of Twitter users provide information about where they come from. Thus, a social
network-based geotagging procedure is subsequently proposed to estimate locations for
Twitter users whose locations are missing.
Finally, in order to discover correlations between location, time and text in geotagged
tweets, e.g., “find which locations are mostly related to the given topics” and
“find which locations are similar to a given location”, we present LeGo, a methodology
for Learning embeddings of Geotagged tweets with respect to entities such as locations,
time units (hour-of-day and day-of-week) and textual words in tweets. The resulting compact
vector representations of these entities hence make it easy to measure the relatedness
between locations, time and words in tweets. LeGo comprises two working modes: cross-modal
search (LeGo-CM) and location-similarity search (LeGo-LS), to answer these two
types of queries accordingly. In LeGo-CM, we first build a graph of entities extracted
from tweets in which each edge carries the weight of co-occurrences between two entities.
The embeddings of graph nodes are then learned in the same latent space under
the guidance of approximating stationary residing probabilities between nodes which are
computed using personalized random walk procedures. In comparison, we supplement
edges between locations in LeGo-LS to address their underlying spatial proximity and
topic likeliness to support location-similarity search queries.
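
The entity graph described for LeGo-CM, in which each edge carries a co-occurrence weight, can be sketched as below. This is only an illustration of the graph-building step: the tweet fields `location`, `hour`, and `words` and the function name `cooccurrence_graph` are hypothetical, and the embedding learning and personalized random walks are not shown.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(tweets):
    """Count how often each pair of entities appears in the same tweet.

    tweets: iterable of dicts with hypothetical keys "location" (str),
            "hour" (int) and "words" (list of str).
    Returns a Counter keyed by sorted pairs of (type, value) entities.
    """
    edges = Counter()
    for t in tweets:
        # Each tweet contributes one location, one time unit and its words.
        entities = {("loc", t["location"]), ("hour", t["hour"])}
        entities |= {("word", w) for w in t["words"]}
        # Every unordered pair of entities in the tweet co-occurs once.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges
```

For example, two tweets posted from the same place that both mention the same word produce an edge of weight 2 between that location and that word.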
A Twitter narrative of the COVID-19 pandemic in Australia
Social media platforms contain abundant data that can provide comprehensive
knowledge of historical and real-time events. During crisis events, the use of
social media peaks, as people discuss what they have seen, heard, or felt.
Previous studies confirm the usefulness of such socially generated discussions
for the public, first responders, and decision-makers to gain a better
understanding of events as they unfold at the ground level. This study performs
an extensive analysis of COVID-19-related Twitter discussions generated in
Australia between January 2020 and October 2022. We explore the Australian
Twitterverse by employing state-of-the-art approaches from both supervised and
unsupervised domains to perform network analysis, topic modeling, sentiment
analysis, and causality analysis. As the presented results provide a
comprehensive understanding of the Australian Twitterverse during the COVID-19
pandemic, this study aims to explore the discussion dynamics to aid the
development of future automated information systems for epidemic/pandemic
management.
Comment: Accepted to ISCRAM 202
The role of geographic knowledge in sub-city level geolocation algorithms
Geolocation of microblog messages has been widely investigated in the
literature. Many solutions have been proposed that achieve good results at the
city-level. Existing approaches are mainly data-driven (i.e., they rely on a
training phase). However, the development of algorithms for geolocation at
sub-city level is still an open problem also due to the absence of good training
datasets. In this thesis, we investigate the role that external geographic
knowledge can play in geolocation approaches. We show how different geographical
data sources can be combined with a semantic layer to achieve reasonably
accurate sub-city level geolocation. Moreover, we propose a knowledge-based
method, called Sherloc, to accurately geolocate messages at sub-city level, by
exploiting the presence in the message of toponyms possibly referring to the
specific places in the target geographical area. Sherloc exploits the semantics
associated with toponyms contained in gazetteers and embeds them into a
metric space that captures the semantic distance among them. This allows
toponyms to be represented as points and indexed by a spatial access method,
allowing us to identify the semantically closest terms to a microblog message,
that also form a cluster with respect to their spatial locations. In contrast to
state-of-the-art methods, Sherloc requires no prior training and is not limited
to geolocating on a fixed spatial grid, and it experimentally demonstrates the
ability to infer the location at sub-city level with higher accuracy.
Geo-Information Harvesting from Social Media Data
As unconventional sources of geo-information, massive imagery and text
messages from open platforms and social media form a temporally quasi-seamless,
spatially multi-perspective stream, but with unknown and diverse quality. Due
to its complementarity to remote sensing data, geo-information from these
sources offers promising perspectives, but harvesting is not trivial due to its
data characteristics. In this article, we address key aspects in the field,
including data availability, analysis-ready data preparation and data
management, geo-information extraction from social media text messages and
images, and the fusion of social media and remote sensing data. We then
showcase some exemplary geographic applications. In addition, we present the
first extensive discussion of ethical considerations of social media data in
the context of geo-information harvesting and geographic applications. With
this effort, we wish to stimulate curiosity and lay the groundwork for
researchers who intend to explore social media data for geo-applications. We
encourage the community to join forces by sharing their code and data.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
SOCIAL MEDIA FOOTPRINTS OF PUBLIC PERCEPTION ON ENERGY ISSUES IN THE CONTERMINOUS UNITED STATES
Energy has been at the top of the national and global political agenda along with other concomitant challenges, such as poverty, disaster and climate change. Social perception of various energy issues, such as availability, development and consumption, deeply affects our energy future. This type of information is traditionally collected through structured energy surveys. However, these surveys are often subject to formidable costs and intensive labor, as well as a lack of temporal dimensions. Social media can provide a more cost-effective way to collect massive amounts of data on public opinion in a timely manner, which may complement the surveys. The purpose of this study is to use machine learning algorithms and social media conversations to characterize the topics and social perception of different energy issues across spatial and temporal dimensions. Text analysis algorithms, such as sentiment analysis and topic analysis, were employed to offer insights into public attitudes and the prominent issues related to energy. The results show that energy-related public perceptions exhibited spatiotemporal dynamics. The study is expected to help inform decision making, formulate national energy policies, and update entrepreneurial energy development decisions.
Social media mining under the COVID-19 context: Progress, challenges, and opportunities
Social media platforms allow users worldwide to create and share information, forging vast sensing networks that
allow information on certain topics to be collected, stored, mined, and analyzed in a rapid manner. During the
COVID-19 pandemic, extensive social media mining efforts have been undertaken to tackle COVID-19 challenges
from various perspectives. This review summarizes the progress of social media data mining studies in the
COVID-19 contexts and categorizes them into six major domains, including early warning and detection, human
mobility monitoring, communication and information conveying, public attitudes and emotions, infodemic and
misinformation, and hatred and violence. We further document essential features of publicly available COVID-19
related social media data archives that will benefit research communities in conducting replicable and reproducible studies. In addition, we discuss seven challenges in social media analytics associated with their potential
impacts on derived COVID-19 findings, followed by our visions for the possible paths forward in regard to social
media-based COVID-19 investigations. This review serves as a valuable reference that recaps social media mining
efforts in COVID-19 related studies and provides future directions along which the information harnessed from
social media can be used to address public health emergencies.
Location Reference Recognition from Texts: A Survey and Comparison
A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching–based, statistical learning–based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.