Using Deep Networks and Transfer Learning to Address Disinformation
We apply an ensemble pipeline composed of a character-level convolutional
neural network (CNN) and a long short-term memory (LSTM) as a general tool for
addressing a range of disinformation problems. We also demonstrate the ability
to use this architecture to transfer knowledge from labeled data in one domain
to related (supervised and unsupervised) tasks. Character-level neural networks
and transfer learning are particularly valuable tools in the disinformation
space because of the messy nature of social media, lack of labeled data, and
the multi-channel tactics of influence campaigns. We demonstrate their
effectiveness in several tasks relevant for detecting disinformation: spam
emails, review bombing, political sentiment, and conversation clustering.
Comment: AI for Social Good Workshop at the International Conference on Machine Learning, Long Beach, United States (2019)
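A character-level pipeline like the one described starts by encoding raw, noisy social-media text into fixed-length integer sequences that a char-CNN/LSTM can consume. The sketch below illustrates that preprocessing step only; the alphabet, sequence length, and padding scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of character-level encoding for noisy social-media text.
# Alphabet, max_len, and padding are assumed values for illustration.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?@#'\""
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 reserved for padding/unknown

def encode_chars(text: str, max_len: int = 64) -> list:
    """Map each character to an integer id; unknown chars -> 0; pad to max_len."""
    ids = [CHAR_TO_ID.get(c, 0) for c in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Messy input with symbols and leetspeak survives without any tokenizer rules.
seq = encode_chars("FREE $$$ click n0w!!")
```

Because the model sees raw characters rather than a fixed vocabulary, misspellings and obfuscations ("n0w", "$$$") still produce usable input, which is part of why character-level models suit the messy disinformation domain.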
Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
A character-level convolutional neural network (CNN) motivated by
applications in "automated machine learning" (AutoML) is proposed to
semantically classify columns in tabular data. Simulated data containing a set
of base classes is first used to learn an initial set of weights. Hand-labeled
data from the CKAN repository is then used in a transfer-learning paradigm to
adapt the initial weights to a more sophisticated representation of the problem
(e.g., including more classes). In doing so, realistic data imperfections are
learned and the set of classes handled can be expanded from the base set with
reduced labeled data and computing power requirements. Results show the
effectiveness and flexibility of this approach in three diverse domains:
semantic classification of tabular data, age prediction from social media
posts, and email spam classification. In addition to providing further evidence
of the effectiveness of transfer learning in natural language processing (NLP),
our experiments suggest that analyzing the semantic structure of language at
the character level without additional metadata---i.e., network structure,
headers, etc.---can produce competitive accuracy for type classification, spam
classification, and social media age prediction. We present our open-source
toolkit SIMON, an acronym for Semantic Inference for the Modeling of
ONtologies, which implements this approach in a user-friendly, scalable, and parallelizable fashion.
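The transfer-learning step described above, reusing weights learned on simulated base classes while expanding to a larger class set, can be sketched as follows. The shapes, the zero/near-zero initialization of new-class columns, and the stand-in random "pretrained" weights are all illustrative assumptions, not SIMON's actual parameters.

```python
import numpy as np

# Hedged sketch of head expansion for transfer learning: keep the shared
# encoder weights, copy the base-class output columns, and initialize the
# extra class columns near zero before fine-tuning on hand-labeled data.
rng = np.random.default_rng(0)
feature_dim, n_base, n_new = 16, 4, 6  # e.g., 4 simulated column types -> 6 total

# Stage 1: weights from pretraining on simulated data (random stand-ins here).
W_encoder = rng.normal(size=(feature_dim, feature_dim))  # shared encoder, reused as-is
W_head_base = rng.normal(size=(feature_dim, n_base))     # base-class output head

# Stage 2: expand the head to n_new classes without discarding learned weights.
W_head_new = np.zeros((feature_dim, n_new))
W_head_new[:, :n_base] = W_head_base                      # copy base-class columns
W_head_new[:, n_base:] = rng.normal(scale=0.01,
                                    size=(feature_dim, n_new - n_base))
```

Only the small new head (and optionally the top of the encoder) needs training on hand-labeled CKAN data, which is what reduces the labeled-data and compute requirements the abstract mentions.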
Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Contextualized embeddings use unsupervised language model pretraining to
compute word representations depending on their context. This is intuitively
useful for generalization, especially in Named-Entity Recognition where it is
crucial to detect mentions never seen during training. However, standard
English benchmarks overestimate the importance of lexical over contextual
features because of an unrealistic lexical overlap between train and test
mentions. In this paper, we perform an empirical analysis of the generalization
capabilities of state-of-the-art contextualized embeddings by separating
mentions by novelty and with out-of-domain evaluation. We show that they are
particularly beneficial for unseen mentions detection, especially
out-of-domain. For models trained on CoNLL03, language model contextualization
yields a maximal relative micro-F1 increase of +1.2% in-domain, versus +13%
out-of-domain on the WNUT dataset.
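The evaluation protocol described, separating test mentions by novelty relative to the training set, amounts to a simple partition on surface forms. The sketch below uses invented mention strings, not the actual CoNLL03 or WNUT data.

```python
# Hedged sketch of the novelty split: a test mention is "seen" if its exact
# surface form occurred in training, "unseen" otherwise. Mentions are made up.

train_mentions = {"barack obama", "london", "google"}
test_mentions = ["london", "openai", "barack obama", "nairobi"]

seen = [m for m in test_mentions if m in train_mentions]
unseen = [m for m in test_mentions if m not in train_mentions]
```

Scoring the two partitions separately exposes whether a model relies on lexical memorization (strong on `seen`) or on contextual generalization (strong on `unseen`), which standard benchmarks conflate due to train/test lexical overlap.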
Label-Agnostic Sequence Labeling by Copying Nearest Neighbors
Retrieve-and-edit based approaches to structured prediction, where structures
associated with retrieved neighbors are edited to form new structures, have
recently attracted increased interest. However, much recent work merely
conditions on retrieved structures (e.g., in a sequence-to-sequence framework),
rather than explicitly manipulating them. We show we can perform accurate
sequence labeling by explicitly (and only) copying labels from retrieved
neighbors. Moreover, because this copying is label-agnostic, we can achieve
impressive performance in zero-shot sequence-labeling tasks. We additionally
consider a dynamic programming approach to sequence labeling in the presence of
retrieved neighbors, which allows for controlling the number of distinct
(copied) segments used to form a prediction, and leads to both more
interpretable and more accurate predictions.
Comment: ACL 2019
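The core copying idea can be sketched in a few lines: each input token takes the label of its most similar token among the retrieved, labeled neighbor sequences. Exact string match stands in for the learned token-similarity model here, and the mini-corpus is invented for illustration.

```python
# Hedged sketch of label copying from retrieved neighbors. Because the
# procedure only copies whatever labels the neighbors carry, it is
# label-agnostic: swapping in a new tag set requires no retraining.

neighbors = [
    (["Paris", "is", "lovely"], ["B-LOC", "O", "O"]),
    (["John", "visited", "Paris"], ["B-PER", "O", "B-LOC"]),
]

def copy_labels(tokens):
    out = []
    for tok in tokens:
        label = "O"  # default when no neighbor token matches
        for n_toks, n_labels in neighbors:
            if tok in n_toks:
                label = n_labels[n_toks.index(tok)]  # copy the neighbor's label
                break
        out.append(label)
    return out

pred = copy_labels(["John", "is", "in", "Paris"])
```

The paper's dynamic-programming variant would additionally constrain how many distinct copied segments compose the output; this greedy per-token version omits that step.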
Named Entity Recognition without Labelled Data: A Weak Supervision Approach
Named Entity Recognition (NER) performance often degrades rapidly when
applied to target domains that differ from the texts observed during training.
When in-domain labelled data is available, transfer learning techniques can be
used to adapt existing NER models to the target domain. But what should one do
when there is no hand-labelled data for the target domain? This paper presents
a simple but powerful approach to learn NER models in the absence of labelled
data through weak supervision. The approach relies on a broad spectrum of
labelling functions to automatically annotate texts from the target domain.
These annotations are then merged together using a hidden Markov model which
captures the varying accuracies and confusions of the labelling functions. A
sequence labelling model can finally be trained on the basis of this unified
annotation. We evaluate the approach on two English datasets (CoNLL 2003 and
news articles from Reuters and Bloomberg) and demonstrate an improvement of
about 7 percentage points in entity-level scores compared to an
out-of-domain neural NER model.
Comment: Accepted to ACL 2020 (long paper)
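The pipeline described, many labelling functions annotating the same tokens, with their outputs merged, can be sketched as below. The paper merges annotations with a hidden Markov model that learns each function's accuracy and confusions; a plain majority vote over non-"O" votes stands in here as a deliberately simplified aggregator, and the labelling functions and tokens are invented examples.

```python
from collections import Counter

# Hedged sketch of weak supervision for NER: heuristic labelling functions
# vote on each token, and the votes are merged into one annotation that a
# sequence labeller could then be trained on. Majority vote replaces the
# paper's HMM-based aggregation for brevity.

def lf_gazetteer(tok):    # known place names
    return "LOC" if tok in {"Paris", "Tokyo"} else "O"

def lf_capitalized(tok):  # crude cue: capitalized word -> some entity
    return "ENT" if tok[:1].isupper() else "O"

def lf_title(tok):        # honorifics suggest a person context
    return "PER" if tok in {"Mr.", "Dr."} else "O"

LABELLING_FUNCTIONS = (lf_gazetteer, lf_capitalized, lf_title)

def merge_votes(tok):
    votes = [lf(tok) for lf in LABELLING_FUNCTIONS]
    non_o = [v for v in votes if v != "O"]
    return Counter(non_o).most_common(1)[0][0] if non_o else "O"

labels = [merge_votes(t) for t in ["Dr.", "Smith", "visited", "Paris"]]
```

The HMM in the paper improves on this by weighting reliable functions more heavily and modelling label transitions, so conflicting or sparse votes are resolved probabilistically rather than by raw counts.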