13,619 research outputs found
MetaICL: Learning to Learn In Context
We introduce MetaICL (Meta-training for In-Context Learning), a new
meta-training framework for few-shot learning where a pretrained language model
is tuned to do in-context learning on a large set of training tasks. This
meta-training enables the model to more effectively learn a new task in context
at test time, by simply conditioning on a few training examples with no
parameter updates or task-specific templates. We experiment on a large, diverse
collection of tasks consisting of 142 NLP datasets including classification,
question answering, natural language inference, paraphrase detection and more,
across seven different meta-training/target splits. MetaICL outperforms a range
of baselines including in-context learning without meta-training and multi-task
learning followed by zero-shot transfer. We find that the gains are
particularly significant for target tasks that have domain shifts from the
meta-training tasks, and that using a diverse set of the meta-training tasks is
key to improvements. We also show that MetaICL approaches (and sometimes beats)
the performance of models fully finetuned on the target task, and outperforms
much bigger models with nearly 8x parameters. Finally, we show that MetaICL is
complementary to human-written instructions, and the best performance can be
achieved by combining both approaches.Comment: 19 pages, 2 figures. Published as a conference paper at NAACL 2022
(long). Code available at https://github.com/facebookresearch/MetaIC
Improved Audio Scene Classification based on Label-Tree Embeddings and Convolutional Neural Networks
We present in this article an efficient approach for audio scene classification. We aim at learning representations for scene examples by exploring the structure of their class labels. A category taxonomy is automatically learned by collectively optimizing a tree-structured clustering of the given labels into multiple meta-classes. A scene recording is then transformed into a label tree embedding image. Elements of the image represent the likelihoods that the scene instance belongs to the meta-classes. We investigate classification with label tree embedding features learned from different low-level features as well as their fusion. We show that combination of multiple features is essential to obtain good performance. While averaging label-tree embedding images over time yields good performance, we argue that average pooling possesses an intrinsic shortcoming. We alternatively propose an improved classification scheme to bypass this limitation. We aim at automatically learning common templates that are useful for the classification task from these images using simple but tailored convolutional neural networks. The trained networks are then employed as a feature extractor that matches the learned templates across a label tree embedding image and produce the maximum matching scores as features for classification. Since audio scenes exhibit rich content, template learning and matching on low-level features would be inefficient. With label tree embedding features, we have quantized and reduced the low-level features into the likelihoods of the meta-classes on which the template learning and matching are efficient. We study both training convolutional neural networks on stacked label tree embedding images and multi-stream networks. Experimental results on the DCASE2016 and LITIS Rouen datasets demonstrate the efficiency of the proposed methods
A fine-grained approach to scene text script identification
This paper focuses on the problem of script identification in unconstrained
scenarios. Script identification is an important prerequisite to recognition,
and an indispensable condition for automatic text understanding systems
designed for multi-language environments. Although widely studied for document
images and handwritten documents, it remains an almost unexplored territory for
scene text images.
We detail a novel method for script identification in natural images that
combines convolutional features and the Naive-Bayes Nearest Neighbor
classifier. The proposed framework efficiently exploits the discriminative
power of small stroke-parts, in a fine-grained classification framework.
In addition, we propose a new public benchmark dataset for the evaluation of
joint text detection and script identification in natural scenes. Experiments
done in this new dataset demonstrate that the proposed method yields state of
the art results, while it generalizes well to different datasets and variable
number of scripts. The evidence provided shows that multi-lingual scene text
recognition in the wild is a viable proposition. Source code of the proposed
method is made available online
- …