TREC Incident Streams: Finding Actionable Information on Social Media
The Text Retrieval Conference (TREC) Incident Streams track is a new initiative that aims to mature social media-based emergency response technology. This initiative advances the state of the art in this area through an evaluation challenge, which attracts researchers and developers from across the globe. The 2018 edition of the track provides a standardized evaluation methodology and an ontology of emergency-relevant social media information types, proposes a scale for information criticality, and releases a dataset containing fifteen test events and approximately 20,000 labeled tweets. Analysis of this dataset reveals a significant amount of actionable information on social media during emergencies (> 10%). While this data is valuable for emergency response efforts, analysis of the 39 state-of-the-art systems demonstrates a performance gap in identifying this data. We therefore find that the current state of the art is insufficient for emergency responders’ requirements, particularly for rare actionable information for which there is little prior training data available.
Incident Streams 2019: Actionable Insights and How to Find Them
The ubiquity of mobile internet-enabled devices combined with widespread social media use during emergencies is posing new challenges for response personnel. In particular, service operators are now expected to monitor these online channels to extract actionable insights and answer questions from the public. A lack of adequate tools makes this monitoring impractical at the scale of many emergencies. The TREC Incident Streams (TREC-IS) track drives research into closing this technology gap by bringing together academia and industry to develop techniques for extracting actionable insights from social media streams during emergencies. This paper covers the second year of TREC-IS, hosted in 2019 with two editions, 2019-A and 2019-B, contributing 12 new events and approximately 20,000 new tweets across 25 information categories, with 15 research groups participating from across the world. This paper provides an overview of these new editions, actionable insights from data labelling, and the automated techniques employed by participant systems that appear most effective.
Hawkes binomial topic model with applications to coupled conflict-Twitter data
We consider the problem of modeling and clustering heterogeneous event data arising from coupled conflict event and social media data sets. In this setting, conflict events trigger responses on social media and, at the same time, signals of grievance detected in social media may serve as leading indicators for subsequent conflict events. For this purpose we introduce the Hawkes Binomial Topic Model (HBTM), where the marks, i.e. Tweets and conflict event descriptions, are represented as bags of words following a Binomial distribution. When viewed as a branching process, the daughter event bag of words is generated by randomly turning on/off parent words through independent Bernoulli random variables. We then use expectation–maximization to estimate the model parameters and branching structure of the process. The inferred branching structure is then used for topic cascade detection, short-term forecasting, and investigating the causal dependence of grievance on social media and conflict events in recent elections in Nigeria and Kenya.
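The daughter-generation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_daughter`, the two probabilities `p_keep` and `p_add`, and the toy vocabulary are assumptions introduced here for illustration, and the abstract does not say how the Bernoulli parameters are set.

```python
import random

def sample_daughter(parent_words, vocabulary, p_keep, p_add, rng):
    """Draw a daughter bag of words from a parent via independent Bernoulli flips.

    Assumption (not from the abstract): a word present in the parent stays "on"
    with probability p_keep, and a word absent from the parent is switched "on"
    with probability p_add.
    """
    parent = set(parent_words)
    daughter = set()
    for word in vocabulary:
        # One independent Bernoulli draw per vocabulary word.
        p_on = p_keep if word in parent else p_add
        if rng.random() < p_on:
            daughter.add(word)
    return daughter

# Hypothetical toy data, for illustration only.
rng = random.Random(0)
vocab = ["protest", "election", "police", "vote"]
parent = ["protest", "election"]
print(sample_daughter(parent, vocab, p_keep=0.8, p_add=0.05, rng=rng))
```

With `p_keep` close to 1 and `p_add` close to 0, daughters closely resemble their parents, which is the property that would let a branching structure be read as a topic cascade.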
Analyzing a fake news authorship network
This project synthesizes a set of 246 fake news websites previously identified in three earlier research projects. From this dataset, we extract a set of all authors who have written for these sites in 2016. This author-centric dataset is itself a contribution that will allow future analysis of the fake news ecosystem. Based on the data we collected, we construct a network of fake news sites, linking them if they shared a common author. Our analysis shows a tight cluster of author-sharing sites, with a small core set of sites sharing dozens of authors.
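The network construction described in this abstract, where two sites are linked if they share an author, can be sketched as follows. This is an illustrative sketch only: the function name `build_site_network`, the input layout (a mapping from site to author names), and the example sites and authors are all hypothetical and do not come from the project's data.

```python
from itertools import combinations

def build_site_network(site_authors):
    """Return {(site_a, site_b): shared_author_count} for sites with a common author."""
    edges = {}
    # Compare every unordered pair of sites exactly once.
    for a, b in combinations(sorted(site_authors), 2):
        shared = set(site_authors[a]) & set(site_authors[b])
        if shared:
            # The edge weight is the number of authors the two sites share.
            edges[(a, b)] = len(shared)
    return edges

# Hypothetical input, for illustration only.
example = {
    "site-a.example": ["author1", "author2"],
    "site-b.example": ["author2", "author3"],
    "site-c.example": ["author4"],
}
print(build_site_network(example))
```

Weighting each edge by the number of shared authors makes it possible to pick out a core of sites that share many authors, as opposed to pairs linked by a single name.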