Learning from Limited Labeled Data for Visual Recognition
Recent advances in computer vision are in part due to the widespread use of deep neural networks. However, training deep networks requires enormous amounts of labeled data, which can be a bottleneck. In this thesis, we propose several approaches to mitigate this in the context of modern deep networks and computer vision tasks.
While transfer learning is an effective strategy for natural image tasks where large labeled datasets such as ImageNet are available, it is less effective for distant domains such as medical images and 3D shapes. Chapter 2 focuses on transfer learning from natural image representations to other modalities. In many cases, cross-modal data can be generated using computer graphics techniques. By forcing predictions to agree across modalities, we show that the models become more robust to image degradation, such as low resolution, grayscale, or line drawings in place of high-resolution color images. Similarly, we show that 3D shape classifiers learned from multi-view images can be transferred to models that operate on voxel or point-cloud representations.
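The cross-modal agreement idea can be illustrated with a small sketch (a hypothetical NumPy formulation, not the thesis's exact loss): one common way to force agreement is to penalize the KL divergence between the class predictions produced from a clean image and from its degraded counterpart.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def agreement_loss(logits_color, logits_degraded):
    """KL(p_color || p_degraded): penalizes the degraded-input branch
    for disagreeing with the color-image branch's predictions."""
    p = softmax(logits_color)
    q = softmax(logits_degraded)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1).mean())

# Toy example: logits from two modalities for a batch of 2 examples, 3 classes.
logits_color = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.2]])
logits_gray  = np.array([[1.8, 0.6, -0.9], [0.0, 1.4, 0.3]])
print(agreement_loss(logits_color, logits_gray))  # small value: predictions agree
```

In practice this consistency term would be added to the usual supervised loss on the modality where labels exist.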
Another line of work has focused on techniques for few-shot learning. In particular, meta-learning approaches explicitly aim to learn generalizable representations by emphasizing transferability to novel tasks. In Chapter 3, we analyze how to improve these techniques by exploiting unlabeled data from related tasks. We show that combining unsupervised objectives with meta-learning objectives can boost performance on novel tasks. However, we find that small amounts of domain-specific data can be more beneficial than large amounts of generic data.
While transfer learning, unsupervised learning, and few-shot learning have been studied in isolation, in practice one often finds that transfer learning from large labeled datasets is more effective than the alternatives. This is partly due to a lack of evaluation on benchmarks that contain challenges such as class imbalance and domain mismatch. In Chapter 4, we explore the role of expert models in the context of semi-supervised learning on a realistic benchmark. Unlike existing semi-supervised benchmarks, our dataset is designed to expose challenges encountered in realistic settings, such as fine-grained similarity between classes, significant class imbalance, and domain mismatch between the labeled and unlabeled data. We show that current semi-supervised methods are negatively affected by out-of-class data, and that their performance pales compared to a transfer learning baseline. Lastly, we leverage coarse labels from a large collection of images to improve semi-supervised learning. In Chapter 5, we show that incorporating hierarchical labels from the taxonomy improves state-of-the-art semi-supervised methods.
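The coarse-label idea can be sketched as follows (a hypothetical NumPy illustration, not the thesis's exact formulation): fine-grained class probabilities are marginalized up a taxonomy, so an image carrying only a coarse label can still supervise the fine-grained classifier.

```python
import numpy as np

def coarse_log_likelihood(fine_probs, fine_to_coarse, coarse_label):
    """Marginalize fine-class probabilities up the taxonomy and score
    the coarse label. fine_to_coarse[i] = coarse class of fine class i."""
    n_coarse = fine_to_coarse.max() + 1
    coarse_probs = np.zeros(n_coarse)
    for fine, coarse in enumerate(fine_to_coarse):
        coarse_probs[coarse] += fine_probs[fine]
    return np.log(coarse_probs[coarse_label] + 1e-12)

# Toy taxonomy: fine classes 0,1 -> coarse 0 ("dog"); fine 2,3 -> coarse 1 ("cat").
fine_to_coarse = np.array([0, 0, 1, 1])
fine_probs = np.array([0.5, 0.3, 0.1, 0.1])  # model is unsure which dog breed
print(coarse_log_likelihood(fine_probs, fine_to_coarse, coarse_label=0))  # log(0.8) ≈ -0.223
```

Maximizing this marginal likelihood rewards the model for concentrating mass on the correct coarse branch even when the exact fine class is unknown.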
A Survey of Self-supervised Learning from Multiple Perspectives: Algorithms, Applications and Future Trends
Deep supervised learning algorithms generally require large numbers of labeled examples to achieve satisfactory performance. However, collecting and labeling many examples can be costly and time-consuming. As a subset of unsupervised learning, self-supervised learning (SSL) aims to learn useful features from unlabeled examples without any human-annotated labels. SSL has recently attracted much attention, and many related algorithms have been developed. However, there are few comprehensive studies that explain the connections and evolution of different SSL variants. In this paper, we review various SSL methods from the perspectives of algorithms, applications, three main trends, and open questions. First, the motivations of most SSL algorithms are introduced in detail, and their commonalities and differences are compared. Second, typical applications of SSL in domains such as image processing and computer vision (CV), as well as natural language processing (NLP), are discussed. Finally, the three main trends of SSL and the open research questions are discussed. A collection of useful materials is available at https://github.com/guijiejie/SSL.
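As a concrete illustration of the pretext-task idea underlying many SSL algorithms (this example is not from the survey itself), the classic rotation-prediction task derives free labels directly from the data:

```python
import numpy as np

def make_rotation_task(images):
    """RotNet-style self-supervised pretext task: rotate each image by
    0/90/180/270 degrees; the rotation index serves as a free label."""
    inputs, labels = [], []
    for img in images:
        for k in range(4):                     # k quarter-turns
            inputs.append(np.rot90(img, k=k))
            labels.append(k)
    return np.stack(inputs), np.array(labels)

# Toy batch of two 4x4 "images".
images = np.random.rand(2, 4, 4)
x, y = make_rotation_task(images)
print(x.shape, y)  # (8, 4, 4) [0 1 2 3 0 1 2 3]
```

A network trained to predict `y` from `x` must learn features sensitive to object orientation and layout, without any human annotation.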
To Compress or Not to Compress -- Self-Supervised Learning and Information Theory: A Review
Deep neural networks have demonstrated remarkable performance in supervised learning tasks but require large amounts of labeled data. Self-supervised learning offers an alternative paradigm, enabling the model to learn from data without explicit labels. Information theory has been instrumental in understanding and optimizing deep neural networks. Specifically, the information bottleneck principle has been applied to optimize the trade-off between compression and relevant information preservation in supervised settings. However, the optimal information objective in self-supervised learning remains unclear. In this paper, we review various approaches to self-supervised learning from an information-theoretic standpoint and present a unified framework that formalizes the \textit{self-supervised information-theoretic learning problem}. We integrate existing research into a coherent framework, examine recent self-supervised methods, and identify research opportunities and challenges. Moreover, we discuss empirical measurement of information-theoretic quantities and their estimators. This paper offers a comprehensive review of the intersection between information theory, self-supervised learning, and deep neural networks.
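The information bottleneck trade-off mentioned above is commonly written as a Lagrangian (the standard formulation from the IB literature; notation assumed, not quoted from this paper): compress the representation $Z$ of the input $X$ while preserving information about the target $Y$:

```latex
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here $\beta$ controls the compression/preservation trade-off. In self-supervised settings the target $Y$ is unavailable, which is precisely why the optimal objective remains unclear, as the abstract notes.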
Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization
A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks, and known cameras, which we use to obtain pixel-level correspondences between views of an object and formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can be used directly for low-shot classification of novel categories using local-part matching, and is competitive with or outperforms supervised and self-supervised learning baselines. Accepted at NeurIPS 2022. Code and data available at https://github.com/rehg-lab/dope_selfsup.
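The pixel-level correspondence step can be sketched with standard multi-view geometry (a minimal NumPy illustration; the function name and toy camera are assumptions, not the authors' code): a pixel with known depth is unprojected to 3D using the first camera's intrinsics, transformed by the relative pose, and reprojected into the second view.

```python
import numpy as np

def correspond(uv, depth, K1, K2, R, t):
    """Map pixel uv in view 1 to its corresponding pixel in view 2, given
    its depth, intrinsics K1/K2, and the relative pose (R, t) taking
    view-1 camera coordinates to view-2 camera coordinates."""
    u, v = uv
    p1 = depth * np.linalg.inv(K1) @ np.array([u, v, 1.0])  # unproject to 3D
    p2 = R @ p1 + t                                         # into view-2 frame
    q = K2 @ p2                                             # project
    return q[:2] / q[2]

# Sanity check: identical cameras and an identity pose map a pixel to itself.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
uv2 = correspond((40.0, 20.0), depth=2.0, K1=K, K2=K, R=np.eye(3), t=np.zeros(3))
print(uv2)  # [40. 20.]
```

Such correspondences give positive pairs for free: patch embeddings at corresponding pixels can be pulled together and non-corresponding ones pushed apart, with no semantic labels involved.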
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. This problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The review reveals not only the challenges in transfer learning for visual recognition, but also the problems (eight of the seventeen) that have been scarcely studied. This survey thus offers an up-to-date technical review for researchers, as well as a systematic approach and a reference for machine learning practitioners to categorise a real problem and look up a possible solution accordingly.