3,394 research outputs found
A Retrospective Analysis of the Fake News Challenge Stance Detection Task
The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance
classification task as a crucial first step towards detecting fake news. To
date, there is no in-depth analysis paper to critically discuss FNC-1's
experimental setup, reproduce the results, and draw conclusions for
next-generation stance classification methods. In this paper, we provide such
an in-depth analysis for the three top-performing systems. We first find that
FNC-1's proposed evaluation metric favors the majority class, which can be
easily classified, and thus overestimates the true discriminative power of the
methods. Therefore, we propose a new F1-based metric yielding a changed system
ranking. Next, we compare the features and architectures used, which leads to a
novel feature-rich stacked LSTM model that performs on par with the best
systems, but is superior in predicting minority classes. To understand the
methods' ability to generalize, we derive a new dataset and perform both
in-domain and cross-domain experiments. Our qualitative and quantitative study
helps interpreting the original FNC-1 scores and understand which features help
improving performance and why. Our new dataset and all source code used during
the reproduction study are publicly available for future research
Automatic Detection of Vague Words and Sentences in Privacy Policies
Website privacy policies represent the single most important source of
information for users to gauge how their personal data are collected, used and
shared by companies. However, privacy policies are often vague and people
struggle to understand the content. Their opaqueness poses a significant
challenge to both users and policy regulators. In this paper, we seek to
identify vague content in privacy policies. We construct the first corpus of
human-annotated vague words and sentences and present empirical studies on
automatic vagueness detection. In particular, we investigate context-aware and
context-agnostic models for predicting vague words, and explore
auxiliary-classifier generative adversarial networks for characterizing
sentence vagueness. Our experimental results demonstrate the effectiveness of
proposed approaches. Finally, we provide suggestions for resolving vagueness
and improving the usability of privacy policies.Comment: 10 page
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces
We combine multi-task learning and semi-supervised learning by inducing a
joint embedding space between disparate label spaces and learning transfer
functions between label embeddings, enabling us to jointly leverage unlabelled
data and auxiliary, annotated datasets. We evaluate our approach on a variety
of sequence classification tasks with disparate label spaces. We outperform
strong single and multi-task baselines and achieve a new state-of-the-art for
topic-based sentiment analysis.Comment: To appear at NAACL 2018 (long
Unpaired Image Captioning via Scene Graph Alignments
Most of current image captioning models heavily rely on paired image-caption
datasets. However, getting large scale image-caption paired data is
labor-intensive and time-consuming. In this paper, we present a scene
graph-based approach for unpaired image captioning. Our framework comprises an
image scene graph generator, a sentence scene graph generator, a scene graph
encoder, and a sentence decoder. Specifically, we first train the scene graph
encoder and the sentence decoder on the text modality. To align the scene
graphs between images and sentences, we propose an unsupervised feature
alignment method that maps the scene graph features from the image to the
sentence modality. Experimental results show that our proposed model can
generate quite promising results without using any image-caption training
pairs, outperforming existing methods by a wide margin.Comment: Accepted in ICCV 201
- …