Using Deep Networks and Transfer Learning to Address Disinformation
We apply an ensemble pipeline composed of a character-level convolutional
neural network (CNN) and a long short-term memory (LSTM) as a general tool for
addressing a range of disinformation problems. We also demonstrate the ability
to use this architecture to transfer knowledge from labeled data in one domain
to related (supervised and unsupervised) tasks. Character-level neural networks
and transfer learning are particularly valuable tools in the disinformation
space because of the messy nature of social media, lack of labeled data, and
the multi-channel tactics of influence campaigns. We demonstrate their
effectiveness in several tasks relevant for detecting disinformation: spam
emails, review bombing, political sentiment, and conversation clustering.
Comment: AI for Social Good Workshop at the International Conference on Machine Learning, Long Beach, United States (2019)
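A character-level pipeline like the one described starts by encoding raw, noisy social-media text into fixed-length integer sequences that a char-CNN/LSTM can consume. The sketch below illustrates that preprocessing step only; the alphabet, sequence length, and padding scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of character-level encoding for noisy social-media text.
# Alphabet, max_len, and padding are assumed values for illustration.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?@#'\""
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 reserved for padding/unknown

def encode_chars(text: str, max_len: int = 64) -> list:
    """Map each character to an integer id; unknown chars -> 0; pad to max_len."""
    ids = [CHAR_TO_ID.get(c, 0) for c in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Messy input with symbols and leetspeak survives without any tokenizer rules.
seq = encode_chars("FREE $$$ click n0w!!")
```

Because the model sees raw characters rather than a fixed vocabulary, misspellings and obfuscations ("n0w", "$$$") still produce usable input, which is part of why character-level models suit the messy disinformation domain.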
Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
A character-level convolutional neural network (CNN) motivated by
applications in "automated machine learning" (AutoML) is proposed to
semantically classify columns in tabular data. Simulated data containing a set
of base classes is first used to learn an initial set of weights. Hand-labeled
data from the CKAN repository is then used in a transfer-learning paradigm to
adapt the initial weights to a more sophisticated representation of the problem
(e.g., including more classes). In doing so, realistic data imperfections are
learned and the set of classes handled can be expanded from the base set with
reduced labeled data and computing power requirements. Results show the
effectiveness and flexibility of this approach in three diverse domains:
semantic classification of tabular data, age prediction from social media
posts, and email spam classification. In addition to providing further evidence
of the effectiveness of transfer learning in natural language processing (NLP),
our experiments suggest that analyzing the semantic structure of language at
the character level without additional metadata---i.e., network structure,
headers, etc.---can produce competitive accuracy for type classification, spam
classification, and social media age prediction. We present our open-source
toolkit SIMON, an acronym for Semantic Inference for the Modeling of
ONtologies, which implements this approach in a user-friendly, scalable, and parallelizable fashion.
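The transfer-learning step described above, reusing weights learned on simulated base classes while expanding to a larger class set, can be sketched as follows. The shapes, the zero/near-zero initialization of new-class columns, and the stand-in random "pretrained" weights are all illustrative assumptions, not SIMON's actual parameters.

```python
import numpy as np

# Hedged sketch of head expansion for transfer learning: keep the shared
# encoder weights, copy the base-class output columns, and initialize the
# extra class columns near zero before fine-tuning on hand-labeled data.
rng = np.random.default_rng(0)
feature_dim, n_base, n_new = 16, 4, 6  # e.g., 4 simulated column types -> 6 total

# Stage 1: weights from pretraining on simulated data (random stand-ins here).
W_encoder = rng.normal(size=(feature_dim, feature_dim))  # shared encoder, reused as-is
W_head_base = rng.normal(size=(feature_dim, n_base))     # base-class output head

# Stage 2: expand the head to n_new classes without discarding learned weights.
W_head_new = np.zeros((feature_dim, n_new))
W_head_new[:, :n_base] = W_head_base                      # copy base-class columns
W_head_new[:, n_base:] = rng.normal(scale=0.01,
                                    size=(feature_dim, n_new - n_base))
```

Only the small new head (and optionally the top of the encoder) needs training on hand-labeled CKAN data, which is what reduces the labeled-data and compute requirements the abstract mentions.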
Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Contextualized embeddings use unsupervised language model pretraining to
compute word representations depending on their context. This is intuitively
useful for generalization, especially in Named-Entity Recognition where it is
crucial to detect mentions never seen during training. However, standard
English benchmarks overestimate the importance of lexical over contextual
features because of an unrealistic lexical overlap between train and test
mentions. In this paper, we perform an empirical analysis of the generalization
capabilities of state-of-the-art contextualized embeddings by separating
mentions by novelty and with out-of-domain evaluation. We show that they are
particularly beneficial for unseen mentions detection, especially
out-of-domain. For models trained on CoNLL03, language model contextualization
yields a maximal relative micro-F1 increase of +1.2% in-domain, versus +13%
out-of-domain on the WNUT dataset.
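The evaluation protocol described, separating test mentions by novelty relative to the training set, amounts to a simple partition on surface forms. The sketch below uses invented mention strings, not the actual CoNLL03 or WNUT data.

```python
# Hedged sketch of the novelty split: a test mention is "seen" if its exact
# surface form occurred in training, "unseen" otherwise. Mentions are made up.

train_mentions = {"barack obama", "london", "google"}
test_mentions = ["london", "openai", "barack obama", "nairobi"]

seen = [m for m in test_mentions if m in train_mentions]
unseen = [m for m in test_mentions if m not in train_mentions]
```

Scoring the two partitions separately exposes whether a model relies on lexical memorization (strong on `seen`) or on contextual generalization (strong on `unseen`), which standard benchmarks conflate due to train/test lexical overlap.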
Label-Agnostic Sequence Labeling by Copying Nearest Neighbors
Retrieve-and-edit based approaches to structured prediction, where structures
associated with retrieved neighbors are edited to form new structures, have
recently attracted increased interest. However, much recent work merely
conditions on retrieved structures (e.g., in a sequence-to-sequence framework),
rather than explicitly manipulating them. We show we can perform accurate
sequence labeling by explicitly (and only) copying labels from retrieved
neighbors. Moreover, because this copying is label-agnostic, we can achieve
impressive performance in zero-shot sequence-labeling tasks. We additionally
consider a dynamic programming approach to sequence labeling in the presence of
retrieved neighbors, which allows for controlling the number of distinct
(copied) segments used to form a prediction, and leads to both more
interpretable and more accurate predictions.
Comment: ACL 2019
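The core copying idea can be sketched in a few lines: each input token takes the label of its most similar token among the retrieved, labeled neighbor sequences. Exact string match stands in for the learned token-similarity model here, and the mini-corpus is invented for illustration.

```python
# Hedged sketch of label copying from retrieved neighbors. Because the
# procedure only copies whatever labels the neighbors carry, it is
# label-agnostic: swapping in a new tag set requires no retraining.

neighbors = [
    (["Paris", "is", "lovely"], ["B-LOC", "O", "O"]),
    (["John", "visited", "Paris"], ["B-PER", "O", "B-LOC"]),
]

def copy_labels(tokens):
    out = []
    for tok in tokens:
        label = "O"  # default when no neighbor token matches
        for n_toks, n_labels in neighbors:
            if tok in n_toks:
                label = n_labels[n_toks.index(tok)]  # copy the neighbor's label
                break
        out.append(label)
    return out

pred = copy_labels(["John", "is", "in", "Paris"])
```

The paper's dynamic-programming variant would additionally constrain how many distinct copied segments compose the output; this greedy per-token version omits that step.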
Named Entity Recognition without Labelled Data: A Weak Supervision Approach
Named Entity Recognition (NER) performance often degrades rapidly when
applied to target domains that differ from the texts observed during training.
When in-domain labelled data is available, transfer learning techniques can be
used to adapt existing NER models to the target domain. But what should one do
when there is no hand-labelled data for the target domain? This paper presents
a simple but powerful approach to learn NER models in the absence of labelled
data through weak supervision. The approach relies on a broad spectrum of
labelling functions to automatically annotate texts from the target domain.
These annotations are then merged together using a hidden Markov model which
captures the varying accuracies and confusions of the labelling functions. A
sequence labelling model can finally be trained on the basis of this unified
annotation. We evaluate the approach on two English datasets (CoNLL 2003 and
news articles from Reuters and Bloomberg) and demonstrate an improvement of
about 7 percentage points in entity-level scores compared to an
out-of-domain neural NER model.
Comment: Accepted to ACL 2020 (long paper)
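The pipeline described, many labelling functions annotating the same tokens, with their outputs merged, can be sketched as below. The paper merges annotations with a hidden Markov model that learns each function's accuracy and confusions; a plain majority vote over non-"O" votes stands in here as a deliberately simplified aggregator, and the labelling functions and tokens are invented examples.

```python
from collections import Counter

# Hedged sketch of weak supervision for NER: heuristic labelling functions
# vote on each token, and the votes are merged into one annotation that a
# sequence labeller could then be trained on. Majority vote replaces the
# paper's HMM-based aggregation for brevity.

def lf_gazetteer(tok):    # known place names
    return "LOC" if tok in {"Paris", "Tokyo"} else "O"

def lf_capitalized(tok):  # crude cue: capitalized word -> some entity
    return "ENT" if tok[:1].isupper() else "O"

def lf_title(tok):        # honorifics suggest a person context
    return "PER" if tok in {"Mr.", "Dr."} else "O"

LABELLING_FUNCTIONS = (lf_gazetteer, lf_capitalized, lf_title)

def merge_votes(tok):
    votes = [lf(tok) for lf in LABELLING_FUNCTIONS]
    non_o = [v for v in votes if v != "O"]
    return Counter(non_o).most_common(1)[0][0] if non_o else "O"

labels = [merge_votes(t) for t in ["Dr.", "Smith", "visited", "Paris"]]
```

The HMM in the paper improves on this by weighting reliable functions more heavily and modelling label transitions, so conflicting or sparse votes are resolved probabilistically rather than by raw counts.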