Learning from Very Few Samples: A Survey
Few sample learning (FSL) is significant and challenging in the field of machine learning. The ability to learn and generalize successfully from very few samples is a noticeable demarcation separating artificial intelligence from human intelligence, since humans can readily establish cognition of novelty from just a single or a handful of examples, whereas machine learning algorithms typically require hundreds or thousands of supervised samples to guarantee generalization. Despite a long history dating back to the early 2000s and widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 300+ FSL papers spanning the 2000s to 2019 and provide a timely and comprehensive survey of FSL. In this survey, we review the evolution history as well as the current progress of FSL, categorize FSL approaches in principle into generative model based and discriminative model based kinds, and place particular emphasis on meta-learning based FSL approaches. We also summarize several recently emerging extension topics of FSL and review the latest advances on these topics. Furthermore, we highlight important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends, in the hope of providing guidance and insights for follow-up research.
Comment: 30 pages
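As a concrete illustration of the meta-learning based FSL family this survey emphasizes, consider a prototypical-network-style episode: queries are classified by distance to the mean embedding of each class's few support samples. The following is a minimal sketch, not code from the survey; the embedding network, episode sizes, and toy data are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def prototypical_episode(support_x, support_y, query_x, embed):
    """One N-way K-shot episode: classify queries by distance to class prototypes.

    support_x: (N*K, ...) support samples; support_y: (N*K,) labels in [0, N).
    query_x:   (Q, ...) query samples; embed: any feature extractor.
    """
    z_support = embed(support_x)                      # (N*K, D)
    z_query = embed(query_x)                          # (Q, D)
    n_way = int(support_y.max().item()) + 1
    # Class prototype = mean embedding of its K support samples.
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                                 # (N, D)
    # Negative squared Euclidean distance serves as the logits.
    logits = -torch.cdist(z_query, prototypes) ** 2   # (Q, N)
    return F.log_softmax(logits, dim=-1)

# Toy usage with a linear embedding (purely illustrative): 5-way 3-shot.
embed = torch.nn.Linear(16, 8)
support_x = torch.randn(5 * 3, 16)
support_y = torch.arange(5).repeat_interleave(3)
query_x = torch.randn(10, 16)
log_probs = prototypical_episode(support_x, support_y, query_x, embed)
```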
Finding any Waldo: zero-shot invariant and efficient visual search
Searching for a target object in a cluttered scene constitutes a fundamental
challenge in daily vision. Visual search must be selective enough to
discriminate the target from distractors, invariant to changes in the
appearance of the target, efficient to avoid exhaustive exploration of the
image, and must generalize to locate novel target objects with zero-shot
training. Previous work has focused on searching for perfect matches of a
target after extensive category-specific training. Here we show for the first
time that humans can efficiently and invariantly search for natural objects in
complex scenes. To gain insight into the mechanisms that guide visual search,
we propose a biologically inspired computational model that can locate targets
without exhaustive sampling and generalize to novel objects. The model provides
an approximation to the mechanisms integrating bottom-up and top-down signals
during search in natural scenes.
Comment: Number of figures: 6; number of supplementary figures: 1
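One common way to integrate bottom-up and top-down signals, in the spirit of the model described here, is target-modulated attention: the target's feature vector is cross-correlated with the scene's feature map to score every location. The sketch below is our own simplified rendering under that assumption, not the authors' released model.

```python
import torch
import torch.nn.functional as F

def target_modulated_attention(scene_feats, target_feat):
    """Cross-correlate a target feature vector with a scene feature map.

    scene_feats: (C, H, W) bottom-up features of the search image.
    target_feat: (C,) top-down feature vector of the sought target.
    Returns an (H, W) attention map; its argmax is a candidate fixation.
    """
    # Use the target vector as a 1x1 convolution kernel over the scene.
    attn = F.conv2d(scene_feats.unsqueeze(0),
                    target_feat.view(1, -1, 1, 1)).squeeze()
    return torch.softmax(attn.flatten(), dim=0).view_as(attn)

# Toy usage: the next fixation is the most target-like location.
scene, target = torch.randn(64, 20, 20), torch.randn(64)
attn = target_modulated_attention(scene, target)
idx = attn.argmax().item()
fix_y, fix_x = divmod(idx, attn.shape[1])
```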
Less than Few: Self-Shot Video Instance Segmentation
The goal of this paper is to bypass the need for labelled examples in
few-shot video understanding at run time. While proven effective, in many
practical video settings even labelling a few examples appears unrealistic.
This is especially true as the level of detail in spatio-temporal video
understanding, and with it the complexity of annotations, continues to increase.
Rather than performing few-shot learning with a human oracle to provide a few
densely labelled support videos, we propose to automatically learn to find
appropriate support videos given a query. We call this self-shot learning and
we outline a simple self-supervised learning method to generate an embedding
space well-suited for unsupervised retrieval of relevant samples. To showcase
this novel setting, we tackle, for the first time, video instance segmentation
in a self-shot (and few-shot) setting, where the goal is to segment instances
at the pixel-level across the spatial and temporal domains. We provide strong
baseline performances that utilize a novel transformer-based model and show
that self-shot learning can even surpass few-shot learning, and that the two can
be positively combined for further performance gains. Experiments on new
benchmarks show that our approach achieves strong performance, is competitive
with oracle support in some settings, scales to large unlabelled video
collections, and can be combined in a semi-supervised setting.
Comment: 25 pages, 5 figures, 13 tables
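The retrieval step at the heart of self-shot learning can be sketched as nearest-neighbour lookup in the learned embedding space: given a query video's embedding, the most similar unlabelled videos serve as supports. This is a minimal sketch under our own assumptions (embedding dimension, pool size, and k are illustrative), not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def retrieve_supports(query_emb, pool_embs, k=5):
    """Pick the k unlabelled videos closest to the query as self-shot supports.

    query_emb: (D,) embedding of the query video.
    pool_embs: (M, D) embeddings of the unlabelled video collection.
    Returns the indices of the k retrieved support videos.
    """
    sims = F.cosine_similarity(pool_embs, query_emb.unsqueeze(0), dim=1)  # (M,)
    return sims.topk(k).indices

# Toy usage over a random pool of 1000 video embeddings.
pool = F.normalize(torch.randn(1000, 256), dim=1)
query = F.normalize(torch.randn(256), dim=0)
support_ids = retrieve_supports(query, pool, k=5)
```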
CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation
We introduce CORE, a dataset for few-shot relation classification (RC)
focused on company relations and business entities. CORE includes 4,708
instances of 12 relation types with corresponding textual evidence extracted
from company Wikipedia pages. Company names and business entities pose a
challenge for few-shot RC models due to the rich and diverse information
associated with them. For example, a company name may represent the legal
entity, products, people, or business divisions depending on the context.
Therefore, deriving the relation type between entities is highly dependent on
textual context. To evaluate the performance of state-of-the-art RC models on
the CORE dataset, we conduct experiments in the few-shot domain adaptation
setting. Our results reveal substantial performance gaps, confirming that
models trained on different domains struggle to adapt to CORE. Interestingly,
we find that models trained on CORE showcase improved out-of-domain
performance, which highlights the importance of high-quality data for robust
domain adaptation. Specifically, the information richness embedded in business
entities allows models to focus on contextual nuances, reducing their reliance
on superficial clues such as relation-specific verbs. In addition to the
dataset, we provide relevant code snippets to facilitate reproducibility and
encourage further research in the field.
Comment: Accepted to EMNLP 2023 main conference
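The few-shot evaluation protocol described here amounts to sampling N-way K-shot episodes from relation-labelled instances. Below is a minimal sketch of that episode construction; the field names, episode sizes, and dummy data are our assumptions for illustration, not the released CORE code.

```python
import random
from collections import defaultdict

def sample_episode(instances, n_way=5, k_shot=1, n_query=5, rng=random):
    """Sample one N-way K-shot episode from (text, relation) instances."""
    by_relation = defaultdict(list)
    for text, relation in instances:
        by_relation[relation].append(text)
    # Keep only relations with enough instances for support + query sets.
    eligible = [r for r, xs in by_relation.items() if len(xs) >= k_shot + n_query]
    relations = rng.sample(eligible, n_way)
    support, query = [], []
    for label, rel in enumerate(relations):
        texts = rng.sample(by_relation[rel], k_shot + n_query)
        support += [(t, label) for t in texts[:k_shot]]
        query += [(t, label) for t in texts[k_shot:]]
    return support, query

# Toy usage with dummy sentences over 12 relation types, as in CORE.
data = [(f"sentence {i}", f"rel_{i % 12}") for i in range(600)]
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=5)
```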
Few-Shot Transformation of Common Actions into Time and Space
This paper introduces the task of few-shot common action localization in time
and space. Given a few trimmed support videos containing the same but unknown
action, we strive for spatio-temporal localization of that action in a long
untrimmed query video. We do not require any class labels, interval bounds, or
bounding boxes. To address this challenging task, we introduce a novel few-shot
transformer architecture with a dedicated encoder-decoder structure optimized
for joint commonality learning and localization prediction, without the need
for proposals. Experiments on our reorganizations of the AVA and UCF101-24
datasets show the effectiveness of our approach for few-shot common action
localization, even when the support videos are noisy. Although our method is not
specifically designed for common localization in time only, we also compare
favorably against the few-shot and one-shot state-of-the-art in this setting.
Lastly, we demonstrate that the few-shot transformer is easily extended to
common action localization per pixel.
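The proposal-free encoder-decoder design described here can be sketched DETR-style: support and query video tokens are jointly encoded, and a small set of learned queries is decoded directly into spatio-temporal predictions. The dimensions, token counts, and prediction heads below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FewShotLocalizer(nn.Module):
    """DETR-style sketch: joint support+query encoding, proposal-free decoding."""

    def __init__(self, dim=256, n_queries=10):
        super().__init__()
        self.transformer = nn.Transformer(d_model=dim, batch_first=True)
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        # Per learned query: (t_start, t_end, x, y, w, h) plus a confidence score.
        self.loc_head = nn.Linear(dim, 6)
        self.score_head = nn.Linear(dim, 1)

    def forward(self, support_feats, query_feats):
        # support_feats: (B, S, dim) tokens pooled from the trimmed support clips;
        # query_feats:   (B, T, dim) tokens of the long untrimmed query video.
        src = torch.cat([support_feats, query_feats], dim=1)
        tgt = self.queries.unsqueeze(0).expand(src.size(0), -1, -1)
        hs = self.transformer(src, tgt)               # (B, n_queries, dim)
        return self.loc_head(hs).sigmoid(), self.score_head(hs)

# Toy usage: 16 support tokens and a 32-token query video.
model = FewShotLocalizer()
locs, scores = model(torch.randn(1, 16, 256), torch.randn(1, 32, 256))
```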