Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
A lot of the recent success in natural language processing (NLP) has been
driven by distributed vector representations of words trained on large amounts
of text in an unsupervised manner. These representations are typically used as
general purpose features for words across a range of NLP problems. However,
extending this success to learning representations of sequences of words, such
as sentences, remains an open problem. Recent work has explored unsupervised as
well as supervised learning techniques with different training objectives to
learn general purpose fixed-length sentence representations. In this work, we
present a simple, effective multi-task learning framework for sentence
representations that combines the inductive biases of diverse training
objectives in a single model. We train this model on several data sources with
multiple training objectives on over 100 million sentences. Extensive
experiments demonstrate that sharing a single recurrent sentence encoder across
weakly related tasks leads to consistent improvements over previous methods. We
present substantial improvements in the context of transfer learning and
low-resource settings using our learned general-purpose representations.
Comment: Accepted at ICLR 201
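The central design choice above, one sentence encoder shared across several weakly related training objectives, can be sketched in a few lines. This is an illustrative toy, not the paper's model: a bag-of-embeddings average stands in for the recurrent encoder, and the per-task "heads" are single dot products with made-up dimensions.

```python
import random

random.seed(0)

VOCAB = 50   # illustrative vocabulary size
DIM = 8      # illustrative sentence-vector dimension

# Shared token embeddings: the one component reused by every task.
embeddings = [[random.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(VOCAB)]

def encode(token_ids):
    """Stand-in for the shared recurrent encoder: average the token
    vectors into a single fixed-length sentence representation."""
    vecs = [embeddings[t] for t in token_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def task_head(sentence_vec, weights):
    """Each task keeps only a small private scoring layer on top."""
    return sum(w * x for w, x in zip(weights, sentence_vec))

# Two hypothetical tasks share `encode`; only their heads differ.
nli_head = [random.gauss(0.0, 1.0) for _ in range(DIM)]
sentiment_head = [random.gauss(0.0, 1.0) for _ in range(DIM)]

vec = encode([3, 17, 42])
scores = (task_head(vec, nli_head), task_head(vec, sentiment_head))
print(len(vec), len(scores))
```

During multi-task training, gradients from every objective flow into the same `embeddings`/`encode` parameters, which is how the diverse inductive biases end up combined in one representation.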
Cross-Lingual Adaptation for Type Inference
Deep learning-based techniques have been widely applied to program
analysis tasks in fields such as type inference, fault localization, and code
summarization. To date, deep learning-based software engineering systems have
relied heavily on supervised learning approaches, which require laborious
manual effort to collect and label a prohibitively large amount of data. However, most
Turing-complete imperative languages share similar control- and data-flow
structures, which make it possible to transfer knowledge learned from one
language to another. In this paper, we propose cross-lingual adaptation of
program analysis, which allows us to leverage prior knowledge learned from the
labeled dataset of one language and transfer it to the others. Specifically, we
implemented a cross-lingual adaptation framework, PLATO, to transfer a deep
learning-based type inference procedure across weakly typed languages, e.g.,
Python to JavaScript and vice versa. PLATO incorporates a novel joint graph
kernelized attention based on abstract syntax tree and control flow graph, and
applies anchor word augmentation across different languages. Moreover, by
leveraging data from strongly typed languages, PLATO improves the perplexity of
the backbone cross-programming-language model and the performance of downstream
cross-lingual transfer for type inference. Experimental results show that
our framework improves transferability over the baseline method by a large
margin.
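The anchor word augmentation mentioned above can be illustrated with a toy normalization pass. The mapping table below is hypothetical (it is not PLATO's actual anchor set): semantically equivalent keywords in the two languages are rewritten to shared placeholders, so token sequences from either language align in a common vocabulary.

```python
# Hypothetical anchor table, for illustration only.
ANCHORS = {
    "python": {"def": "<FUNC>", "None": "<NULL>", "True": "<TRUE>"},
    "javascript": {"function": "<FUNC>", "null": "<NULL>", "true": "<TRUE>"},
}

def augment(tokens, lang):
    """Replace language-specific anchor words with shared placeholders."""
    table = ANCHORS[lang]
    return [table.get(t, t) for t in tokens]

py = augment(["def", "f", "(", "x", ")", ":", "return", "None"], "python")
js = augment(["function", "f", "(", "x", ")", "{", "return", "null"], "javascript")
print(py)
print(js)
```

After augmentation, both snippets start with `<FUNC>` and end with `<NULL>`, which is what lets a model trained on labeled Python data see familiar anchors when applied to JavaScript, and vice versa.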
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces
We combine multi-task learning and semi-supervised learning by inducing a
joint embedding space between disparate label spaces and learning transfer
functions between label embeddings, enabling us to jointly leverage unlabelled
data and auxiliary, annotated datasets. We evaluate our approach on a variety
of sequence classification tasks with disparate label spaces. We outperform
strong single and multi-task baselines and achieve a new state-of-the-art for
topic-based sentiment analysis.
Comment: To appear at NAACL 2018 (long
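The joint label embedding idea above can be sketched concretely: place every label from every task in one shared vector space, then map a label from one task into another task's label set by nearest-neighbor search. The embedding values and task names below are invented for illustration; in the paper the embeddings are learned jointly with the model rather than fixed by hand.

```python
import math

# Toy hand-set label embeddings in a shared 3-d space (hypothetical values).
LABELS = {
    # task 1: sentiment
    "positive": [0.9, 0.1, 0.0],
    "negative": [-0.9, 0.1, 0.0],
    # task 2: stance
    "favor": [0.8, 0.2, 0.1],
    "against": [-0.8, 0.2, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def transfer(label, target_labels):
    """Map a label into another task's label set via the shared space."""
    return max(target_labels, key=lambda t: cosine(LABELS[label], LABELS[t]))

print(transfer("positive", ["favor", "against"]))
print(transfer("negative", ["favor", "against"]))
```

Because related labels from different tasks sit close together in the shared space, annotated data for one label space can supervise predictions over another, which is how the unlabelled and auxiliary annotated datasets get leveraged jointly.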
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
We present the results and main findings of SemEval-2020 Task 12 on
Multilingual Offensive Language Identification in Social Media (OffensEval
2020). The task involves three subtasks corresponding to the hierarchical
taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The
task featured five languages: English, Arabic, Danish, Greek, and Turkish for
Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020
was one of the most popular tasks at SemEval-2020, attracting a large number of
participants across all subtasks and also across all languages. A total of 528
teams signed up to participate in the task, 145 teams submitted systems during
the evaluation period, and 70 submitted system description papers.
Comment: Proceedings of the International Workshop on Semantic Evaluation
(SemEval-2020
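The hierarchical OLID taxonomy behind the three subtasks forms a cascade: Subtask B applies only to content labelled offensive (OFF) in Subtask A, and Subtask C applies only to content labelled as a targeted insult (TIN) in Subtask B. A minimal sketch of that routing, with dummy rule-based stand-ins for real classifiers (only the label hierarchy comes from the OLID schema; the function names are illustrative):

```python
def classify_hierarchical(text, clf_a, clf_b, clf_c):
    """Route a text through the OLID cascade of subtasks."""
    labels = {"A": clf_a(text)}          # Subtask A: OFF vs NOT
    if labels["A"] == "OFF":
        labels["B"] = clf_b(text)        # Subtask B: TIN vs UNT
        if labels["B"] == "TIN":
            labels["C"] = clf_c(text)    # Subtask C: IND vs GRP vs OTH
    return labels

# Dummy stand-ins for trained models:
out = classify_hierarchical(
    "some offensive example",
    clf_a=lambda t: "OFF",
    clf_b=lambda t: "TIN",
    clf_c=lambda t: "GRP",
)
print(out)
```

A text labelled NOT at Subtask A simply exits the cascade with only the `"A"` entry, which mirrors why English alone carried Subtasks B and C while all five languages featured Subtask A.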
- …