1,387 research outputs found
Fake it till you make it: Learning transferable representations from synthetic ImageNet clones
Recent image generation models such as Stable Diffusion have exhibited an
impressive ability to generate fairly realistic images starting from a simple
text prompt. Could such models render real images obsolete for training image
prediction models? In this paper, we answer part of this provocative question
by investigating the need for real images when training models for ImageNet
classification. Provided only with the class names that have been used to build
the dataset, we explore the ability of Stable Diffusion to generate synthetic
clones of ImageNet and measure how useful these are for training classification
models from scratch. We show that with minimal and class-agnostic prompt
engineering, ImageNet clones close a large part of the gap between models
trained on synthetic images and models trained on real images, across the
several standard classification benchmarks we consider in this study.
More importantly, we show that models trained on synthetic images exhibit
strong generalization properties and perform on par with models trained on real
data for transfer. Project page: https://europe.naverlabs.com/imagenet-sd/
Comment: Accepted to CVPR 202
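The class-agnostic prompt engineering described in the abstract could, for instance, amount to filling one fixed text template with each class name; the template below is a hypothetical illustration, not the paper's exact prompt.

```python
def build_prompts(class_names, template="a photo of a {}"):
    # One text prompt per ImageNet class name; the same template is
    # reused for every class, i.e. the prompting is class-agnostic.
    return [template.format(name) for name in class_names]

# Each prompt would then be fed to a text-to-image model such as
# Stable Diffusion to synthesize training images for that class.
prompts = build_prompts(["goldfish", "tabby cat"])
```

Because the template never references class-specific details, the same procedure scales to all 1,000 ImageNet classes without per-class tuning.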
Fine-Grained Image Analysis with Deep Learning: A Survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem
in computer vision and pattern recognition, and underpins a diverse set of
real-world applications. The task of FGIA targets analyzing visual objects from
subordinate categories, e.g., species of birds or models of cars. The small
inter-class and large intra-class variation inherent to fine-grained image
analysis makes it a challenging problem. Capitalizing on advances in deep
learning, in recent years we have witnessed remarkable progress in deep
learning powered FGIA. In this paper we present a systematic survey of these
advances, where we attempt to re-define and broaden the field of FGIA by
consolidating two fundamental fine-grained research areas -- fine-grained image
recognition and fine-grained image retrieval. In addition, we also review other
key issues of FGIA, such as publicly available benchmark datasets and related
domain-specific applications. We conclude by highlighting several research
directions and open problems which need further exploration from the community.
Comment: Accepted by IEEE TPAM
Learning from Very Few Samples: A Survey
Few sample learning (FSL) is significant and challenging in the field of
machine learning. The ability to learn and generalize successfully from very
few samples is a notable demarcation between artificial intelligence and human
intelligence, since humans can readily build an understanding of novel concepts
from just a single example or a handful of examples, whereas machine learning
algorithms typically require hundreds or thousands of supervised samples to
guarantee generalization. Despite a long history dating back to the early 2000s
and widespread attention in recent years with the boom of deep learning
technologies, few surveys or reviews of FSL have been available until now. In
this context, we extensively review 300+ papers
of FSL spanning from the 2000s to 2019 and provide a timely and comprehensive
survey for FSL. In this survey, we review the evolution history as well as the
current progress on FSL, categorize FSL approaches into the generative model
based and discriminative model based kinds in principle, and emphasize
particularly on the meta learning based FSL approaches. We also summarize
several recently emerging extensional topics of FSL and review the latest
advances on these topics. Furthermore, we highlight the important FSL
applications covering many research hotspots in computer vision, natural
language processing, audio and speech, reinforcement learning and robotics, and
data analysis. Finally, we conclude the survey with a discussion of promising
trends, in the hope of providing guidance and insights to follow-up research.
Comment: 30 page
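As one concrete instance of the metric-based meta-learning approaches the survey emphasizes, a nearest-prototype classifier assigns a query to the class whose mean support embedding is closest. This is a minimal illustrative sketch, not any specific surveyed method.

```python
import numpy as np

def prototype_classify(support, labels, query):
    # support: (n, d) embeddings of the few labeled samples;
    # labels: list of n class labels; query: (d,) embedding to classify.
    # Each class is represented by the mean ("prototype") of its support
    # embeddings, and the query takes the label of the nearest prototype.
    labels = np.array(labels)
    protos = {c: support[labels == c].mean(axis=0) for c in set(labels.tolist())}
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))
```

With only a couple of labeled samples per class, this already yields a usable decision rule, which is why prototype-style methods are a common baseline in few-sample learning.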
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Visual question answering requires a system to provide an accurate natural
language answer given an image and a natural language question. However, it is
widely recognized that previous generic VQA methods often exhibit a tendency to
memorize biases present in the training data rather than learning proper
behaviors, such as grounding images before predicting answers. Therefore, these
methods usually achieve high in-distribution but poor out-of-distribution
performance. In recent years, various datasets and debiasing methods have been
proposed to evaluate and enhance the VQA robustness, respectively. This paper
provides the first comprehensive survey focused on this emerging research direction.
Specifically, we first provide an overview of the development process of
datasets from in-distribution and out-of-distribution perspectives. Then, we
examine the evaluation metrics employed by these datasets. Third, we propose
a typology that presents the development process, similarities and differences,
robustness comparison, and technical features of existing debiasing methods.
Furthermore, we analyze and discuss the robustness of representative
vision-and-language pre-training models on VQA. Finally, through a thorough
review of the available literature and experimental analysis, we discuss the
key areas for future research from various viewpoints.
Comment: IEEE TPAMI (Under Review
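The in-distribution versus out-of-distribution gap that motivates this survey can be quantified simply as the drop in answer accuracy between the two evaluation splits; the predictions and splits below are toy, hypothetical data.

```python
def accuracy(preds, golds):
    # Fraction of questions whose predicted answer exactly matches the gold answer.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Toy predictions on a hypothetical in-distribution (ID) split and a
# hypothetical out-of-distribution (OOD) split of the same model.
id_acc = accuracy(["yes", "no", "red"], ["yes", "no", "red"])
ood_acc = accuracy(["yes", "no", "red"], ["yes", "blue", "green"])
robustness_gap = id_acc - ood_acc  # large gap suggests bias memorization
```

A model that grounds its answers in the image should show a small gap; one that memorizes training-set priors will score high on ID but collapse on OOD.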
Human-machine knowledge hybrid augmentation method for surface defect detection based on few-data learning
Visual-based defect detection is a crucial but challenging task in industrial
quality control. Most mainstream methods rely on large amounts of existing or
related domain data as auxiliary information. However, in actual industrial
production, there are often multi-batch, low-volume manufacturing scenarios
with rapidly changing task demands, making it difficult to obtain sufficient
and diverse defect data. This paper proposes a parallel solution that uses a
human-machine knowledge hybrid augmentation method to help the model extract
unknown important features. Specifically, by incorporating experts' knowledge
of abnormality to create data with rich features, positions, sizes, and
backgrounds, we can quickly accumulate an amount of data from scratch and
provide it to the model as prior knowledge for few-data learning. The proposed
method was evaluated on the magnetic tile dataset and achieved F1-scores of
60.73%, 70.82%, 77.09%, and 82.81% when using 2, 5, 10, and 15 training images,
respectively. Compared to the traditional augmentation method's best F1-score
of 64.59%, the proposed method's best result is an 18.22 percentage-point
increase, demonstrating its feasibility and effectiveness in few-data
industrial defect detection.
Comment: 24 pages, 15 figure
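The expert-knowledge-driven data creation described above can be sketched as compositing synthetic defect patches onto clean surface images at controlled positions and sizes; everything below is an illustrative toy, not the paper's actual pipeline.

```python
import numpy as np

def paste_defect(background, defect, top, left):
    # Overlay an expert-designed defect patch onto a clean surface image
    # at a chosen position, giving explicit control over defect position
    # and size in each synthesized training sample.
    out = background.copy()
    h, w = defect.shape[:2]
    out[top:top + h, left:left + w] = defect
    return out

clean = np.zeros((32, 32))                   # toy clean magnetic-tile surface
patch = np.ones((4, 4))                      # toy synthetic defect patch
sample = paste_defect(clean, patch, 10, 12)  # one synthetic training image
```

Varying the patch, its position, and its scale across many such calls quickly accumulates a diverse defect dataset from scratch, which is the prior knowledge the few-data model is trained on.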