e-Counterfeit: a mobile-server platform for document counterfeit detection
This paper presents a novel application to detect counterfeit identity
documents forged by a scan-printing operation. Texture analysis approaches are
proposed to extract validation features from the security background that is
usually printed in documents such as IDs or banknotes. The main contribution of this
work is the end-to-end mobile-server architecture, which provides a service for
non-expert users and therefore can be used in several scenarios. The system
also provides a crowdsourcing mode so labeled images can be gathered,
generating databases for incremental training of the algorithms.
Comment: 6 pages, 5 figure
The Role of Conversation Context for Sarcasm Detection in Online Interactions
Computational models for sarcasm detection have often relied on the content
of utterances in isolation. However, a speaker's sarcastic intent is not always
obvious without additional context. Focusing on social media discussions, we
investigate two issues: (1) does modeling of conversation context help in
sarcasm detection and (2) can we understand what part of conversation context
triggered the sarcastic reply. To address the first issue, we investigate
several types of Long Short-Term Memory (LSTM) networks that can model both the
conversation context and the sarcastic response. We show that the conditional
LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence level
attention on context and response outperform the LSTM model that reads only the
response. To address the second issue, we present a qualitative analysis of
attention weights produced by the LSTM models with attention and discuss the
results compared with human performance on the task.
Comment: SIGDial 201
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Social media is often viewed as a sensor into various societal events such as
disease outbreaks, protests, and elections. We describe the use of social media
as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
approach detects a broad range of cyber-attacks (e.g., distributed denial of
service (DDOS) attacks, data breaches, and account hijacking) in an
unsupervised manner using just a limited fixed set of seed event triggers. A
new query expansion strategy based on convolutional kernels and dependency
parses helps model reporting structure and aids in identifying key event
characteristics. Through a large-scale analysis over Twitter, we demonstrate
that our approach consistently identifies and encodes events, outperforming
existing methods.
Comment: 13 single column pages, 5 figures, submitted to KDD 201
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
Controversy and Sentiment in Online News
How do news sources tackle controversial issues? In this work, we take a
data-driven approach to understand how controversy interplays with emotional
expression and biased language in the news. We begin by introducing a new
dataset of controversial and non-controversial terms collected using
crowdsourcing. Then, focusing on 15 major U.S. news outlets, we compare
millions of articles discussing controversial and non-controversial issues over
a span of 7 months. We find that in general, when it comes to controversial
issues, the use of negative affect and biased language is prevalent, while the
use of strong emotion is tempered. We also observe many differences across news
sources. Using these findings, we show that we can indicate to what extent an
issue is controversial, by comparing it with other issues in terms of how they
are portrayed across different media.
Comment: Computation+Journalism Symposium 201
Ca(r)veat Emptor: Crowdsourcing Data to Challenge the Testimony of In-Car Technology
This Article addresses the situation in which a car acts as a witness against its human driver in a court of law. This possibility has become a reality due to technology embedded in modern-day vehicles that captures data prior to a crash event. The authors contend that it is becoming increasingly difficult for drivers to defend themselves in a meaningful way against the testimony of cars and suggest that crowdsourcing data could be a viable option for assessing the trustworthiness of such evidence. The Article further explores whether crowdsourced data could be used as an additional source of information in the fact-finding process and whether such data could provide a counterbalance to the prevailing tendency to fault human drivers rather than their vehicles or the manufacturers of their vehicles. The practical importance of this issue in the age of driving automation is highlighted, and lawyers, judges, and lawmakers are urged to remain open-minded regarding the use of this new strategy.
Efficient Crowd Exploration of Large Networks: The Case of Causal Attribution
Accurately and efficiently crowdsourcing complex, open-ended tasks can be
difficult, as crowd participants tend to favor short, repetitive "microtasks".
We study the crowdsourcing of large networks where the crowd provides the
network topology via microtasks. Crowds can explore many types of social and
information networks, but we focus on the network of causal attributions, an
important network that signifies cause-and-effect relationships. We conduct
experiments on Amazon Mechanical Turk (AMT) testing how workers propose and
validate individual causal relationships and introduce a method for independent
crowd workers to explore large networks. The core of the method, Iterative
Pathway Refinement, is a theoretically-principled mechanism for efficient
exploration via microtasks. We evaluate the method using synthetic networks and
apply it on AMT to extract a large-scale causal attribution network, then
investigate the structure of this network as well as the activity patterns and
efficiency of the workers who constructed this network. Worker interactions
reveal important characteristics of causal perception and the network data they
generate can improve our understanding of causality and causal inference.
Comment: 25 pages, 14 figures, in CSCW'1
Towards Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection
In an effort to assist factcheckers in the process of factchecking, we tackle
the claim detection task, one of the necessary stages prior to determining the
veracity of a claim. It consists of identifying the set of sentences, out of a
long text, deemed capable of being factchecked. This paper is a collaborative
work between Full Fact, an independent factchecking charity, and academic
partners. Leveraging the expertise of professional factcheckers, we develop an
annotation schema and a benchmark for automated claim detection that is more
consistent across time, topics and annotators than previous approaches. Our
annotation schema has been used to crowdsource the annotation of a dataset with
sentences from UK political TV shows. We introduce an approach based on
universal sentence representations to perform the classification, achieving an
F1 score of 0.83, with over 5% relative improvement over the state-of-the-art
methods ClaimBuster and ClaimRank. The system was deployed in production and
received positive user feedback.
Comment: Accepted for ACM Digital Threats: Research and Practice (DTRAP
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution, crime
control, and wildlife monitoring, just to name a few. Unlike prior
sensing paradigms, humans are now the primary actors of the sensing process,
since they become fundamental in retrieving reliable and up-to-date information
about the event being monitored. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.
Comment: To appear in ACM Transactions on Sensor Networks (TOSN