Search CORE

1,778 research outputs found

e-Counterfeit: a mobile-server platform for document counterfeit detection

Author: Berenguel Albert
Cañero Cristina
Lladós Josep
Terrades Oriol Ramos
Publication venue
Publication date: 21/08/2017
Field of study

This paper presents a novel application to detect counterfeit identity documents forged by a scan-printing operation. Texture analysis approaches are proposed to extract validation features from security background that is usually printed in documents as IDs or banknotes. The main contribution of this work is the end-to-end mobile-server architecture, which provides a service for non-expert users and therefore can be used in several scenarios. The system also provides a crowdsourcing mode so labeled images can be gathered, generating databases for incremental training of the algorithms.Comment: 6 pages, 5 figure

arXiv.org e-Print Archive

The Role of Conversation Context for Sarcasm Detection in Online Interactions

Author: Fabbri Alexander Richard
Ghosh Debanjan
Muresan Smaranda
Publication venue
Publication date: 18/07/2017
Field of study

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, speaker's sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection and (2) can we understand what part of conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task.Comment: SIGDial 201

arXiv.org e-Print Archive

Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

Author: Becker Hila
Flora
Ji Heng
Khandpur Rupinder P.
Lee Wenke
Li Frank
Liu Yang
Modi A.
Muthiah Sathappan
Ovelgonne Michael
Rehurek Radim
Sabottke Carl
Soska Kyle
Tanev Hristo
Weller-Fahy David J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2017
Field of study

Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDOS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods.Comment: 13 single column pages, 5 figures, submitted to KDD 201

arXiv.org e-Print Archive

EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

Author: A Bruns
AM Azmi
BS Wasike
D Bodoff
D Elsweiler
Hind Almerekhi
J Benhardus
JL Fleiss
JR Landis
K Darwish
M Efron
M Rowe
M Sanderson
Maram Hasanain
Mucahid Kutlu
Reem Suwaileh
RL Brennan
Tamer Elsayed
W Magdy
Zhang Y
Publication venue
Publication date: 21/08/2017
Field of study

This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR , the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets

arXiv.org e-Print Archive

Controversy and Sentiment in Online News

Author: Castillo Carlos
Diakopoulos Nicholas
Mejova Yelena
Zhang Amy X.
Publication venue
Publication date: 29/09/2014
Field of study

How do news sources tackle controversial issues? In this work, we take a data-driven approach to understand how controversy interplays with emotional expression and biased language in the news. We begin by introducing a new dataset of controversial and non-controversial terms collected using crowdsourcing. Then, focusing on 15 major U.S. news outlets, we compare millions of articles discussing controversial and non-controversial issues over a span of 7 months. We find that in general, when it comes to controversial issues, the use of negative affect and biased language is prevalent, while the use of strong emotion is tempered. We also observe many differences across news sources. Using these findings, we show that we can indicate to what extent an issue is controversial, by comparing it with other issues in terms of how they are portrayed across different media.Comment: Computation+Journalism Symposium 201

arXiv.org e-Print Archive

Crowdsourcing and Cloudsourcing CCTV Surveillance

Author: A C Davies
B Kanefsky
Burkhard Schafer
C Norris
D J Guth
G Davies
H Ickler
H Kai
J H Velastin
L Cohen
L Davis
M Clarke
M L Gill
M-T Tinnefeld
N Fyfe
N Groombridge
N South
P G Kelley
P Korshunov
P Legrand
R Surette
R Surette
T Monahan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Ca(r)veat Emptor: Crowdsourcing Data to Challenge the Testimony of In-Car Technology

Author: Di Xuan
Gless Sabine
Silverman Emily
Publication venue: American Bar Association
Publication date: 01/01/2022
Field of study

This Article addresses the situation in which a car acts as a witness against its human driver in a court of law. This possibility has become a reality due to technology embedded in modern-day vehicles that captures data prior to a crash event. The authors contend that it is becoming increasingly difficult for drivers to defend themselves in a meaningful way against the testimony of cars and suggest that crowdsourcing data could be a viable option for assessing the trustworthiness of such evidence. The Article further explores whether crowdsourced data could be used as an additional source of information in the fact-finding process and if such data could provide a counterbalance to the prevailing tendency to fault human drivers rather than their vehicles or the manufactures of their vehicles. The practical importance of this issue in the age of driving automation is highlighted, and lawyers, judges, and lawmakers are urged to remain open-minded regarding the use of this new strategy

edoc

Efficient Crowd Exploration of Large Networks: The Case of Causal Attribution

Author: Bagrow James P.
Berenberg Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/10/2018
Field of study

Accurately and efficiently crowdsourcing complex, open-ended tasks can be difficult, as crowd participants tend to favor short, repetitive "microtasks". We study the crowdsourcing of large networks where the crowd provides the network topology via microtasks. Crowds can explore many types of social and information networks, but we focus on the network of causal attributions, an important network that signifies cause-and-effect relationships. We conduct experiments on Amazon Mechanical Turk (AMT) testing how workers propose and validate individual causal relationships and introduce a method for independent crowd workers to explore large networks. The core of the method, Iterative Pathway Refinement, is a theoretically-principled mechanism for efficient exploration via microtasks. We evaluate the method using synthetic networks and apply it on AMT to extract a large-scale causal attribution network, then investigate the structure of this network as well as the activity patterns and efficiency of the workers who constructed this network. Worker interactions reveal important characteristics of causal perception and the network data they generate can improve our understanding of causality and causal inference.Comment: 25 pages, 14 figures, in CSCW'1

arXiv.org e-Print Archive

Towards Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection

Author: Babakar Mevan
Konstantinovskiy Lev
Price Oliver
Zubiaga Arkaitz
Publication venue
Publication date: 17/08/2020
Field of study

In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This paper is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics and annotators than previous approaches. Our annotation schema has been used to crowdsource the annotation of a dataset with sentences from UK political TV shows. We introduce an approach based on universal sentence representations to perform the classification, achieving an F1 score of 0.83, with over 5% relative improvement over the state-of-the-art methods ClaimBuster and ClaimRank. The system was deployed in production and received positive user feedback.Comment: Accepted for ACM Digital Threats: Research and Practice (DTRAP

arXiv.org e-Print Archive

Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

Author: Bhattacharjee Shameek
Das Sajal
Ghosh Nirnay
Melodia Tommaso
Restuccia Francesco
Publication venue
Publication date: 06/09/2017
Field of study

Smartphones have become the most pervasive devices in people's lives, and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as accelerometer, gyroscope, microphone, and camera. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution, crime control, and wildlife monitoring, just to name a few. Differently from prior sensing paradigms, humans are now the primary actors of the sensing process, since they become fundamental in retrieving reliable and up-to-date information about the event being monitored. As humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the current state-of-the-art on the topic. We also outline novel research challenges, along with possible directions of future work.Comment: To appear in ACM Transactions on Sensor Networks (TOSN

arXiv.org e-Print Archive

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine