Visual Transfer Learning in the Absence of the Source Data
Image recognition has become one of the most popular topics in machine learning. With the development of deep convolutional neural networks (CNNs) and the help of large-scale labeled image databases such as ImageNet, modern image recognition models can achieve performance competitive with human annotation on some general image recognition tasks, and many IT companies have adopted such models to improve their vision-related services. However, training these large-scale deep neural networks requires thousands or even millions of labeled images, which is an obstacle when applying them to a specific visual task with limited training data. Visual transfer learning has been proposed to solve this problem: it aims at transferring knowledge from a source visual task to a target visual task, where the target task is typically related to the source task and its training data are relatively small. The majority of existing visual transfer learning methods assume that the source data are freely available and use them to measure the discrepancy between the source and target tasks in order to guide the transfer. In many real applications, however, source data are subject to legal, technical, and contractual constraints between data owners and data customers. Beyond privacy and disclosure obligations, customers are often reluctant to share their data; in customer care, for example, collected data may include information on recent technical problems, a highly sensitive topic that companies are unwilling to share. This scenario, in which the source data are absent, is often called Hypothesis Transfer Learning (HTL), and the previous methods above cannot be applied to it. In this thesis, we investigate the visual transfer learning problem under the HTL setting.
Instead of using the source data to measure the discrepancy, we use the source model as a proxy to transfer knowledge from the source task to the target task. Unlike the source data, a well-trained source model is usually freely accessible in many tasks and carries equivalent source knowledge. Specifically, in this thesis we investigate visual transfer learning in two scenarios: domain adaptation and learning new categories. In contrast to previous HTL methods, our methods can leverage more types of source models and achieve better transfer performance. In Chapter 3, we investigate the visual domain adaptation problem under the HTL setting. We propose Effective Multiclass Transfer Learning (EMTLe), which can transfer knowledge effectively when the target set is small: EMTLe uses the outputs of the source models as an auxiliary bias to adjust predictions in the target task. Experimental results show that EMTLe outperforms other baselines under the HTL setting. In Chapter 4, we investigate the semi-supervised domain adaptation scenario under the HTL setting and propose the Generalized Distillation Semi-supervised Domain Adaptation (GDSDA) framework. We show that GDSDA can transfer knowledge effectively using unlabeled data, and we demonstrate that the imitation parameter, the GDSDA hyperparameter that balances knowledge from the source and target tasks, is important to transfer performance. We then propose GDSDA-SVM, which uses SVMs as the base classifiers in GDSDA and can determine the imitation parameter autonomously. Compared to previous methods, whose imitation parameter can only be set by brute-force search or background knowledge, GDSDA-SVM is more practical in real applications.
In Chapter 5, we investigate the problem of fine-tuning a deep CNN to learn new food categories using the large ImageNet database as our source. Without access to the source data, i.e. the ImageNet dataset, we show that by fine-tuning the parameters of the source model on our target food dataset we can achieve better performance than previous methods. To conclude, the main contribution of this thesis is an investigation of the visual transfer learning problem under the HTL setting. We propose several methods to transfer knowledge from the source task in supervised and semi-supervised learning scenarios. Extensive experimental results show that, without access to any source data, our methods can outperform previous work.
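The imitation parameter described above can be illustrated with a minimal generalized-distillation loss that blends the ground-truth labels with a teacher's softened outputs. This is a generic NumPy sketch with our own function names and temperature value, not the thesis's GDSDA-SVM implementation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def generalized_distillation_loss(student_logits, teacher_logits, y_true,
                                  imitation=0.5, T=2.0):
    """Cross-entropy against hard labels, blended with cross-entropy
    against the teacher's temperature-softened outputs. The imitation
    parameter in [0, 1] weighs teacher knowledge vs. ground truth."""
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits, T=T)
    n = len(y_true)
    hard = -np.log(p_student[np.arange(n), y_true] + 1e-12).mean()
    soft = -(p_teacher * np.log(softmax(student_logits, T=T) + 1e-12)).sum(axis=1).mean()
    return (1.0 - imitation) * hard + imitation * soft
```

Setting the imitation parameter to 0 recovers ordinary supervised training, and setting it to 1 trains purely on the teacher's soft targets; choosing the value in between is exactly the tuning problem that GDSDA-SVM addresses autonomously.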
Pseudo-labels for Supervised Learning on Dynamic Vision Sensor Data, Applied to Object Detection under Ego-motion
In recent years, dynamic vision sensors (DVS), also known as event-based cameras or neuromorphic sensors, have seen increased use due to various advantages over conventional frame-based cameras. Using principles inspired by the retina, their high temporal resolution overcomes motion blur, their high dynamic range copes with extreme illumination conditions, and their low power consumption makes them ideal for embedded systems on platforms such as drones and self-driving cars. However, event-based data sets are scarce, and labels are even rarer for tasks such as object detection. We transferred discriminative knowledge from a state-of-the-art frame-based convolutional neural network (CNN) to the event-based modality via intermediate pseudo-labels, which are used as targets for supervised learning. We show, for the first time, event-based car detection under ego-motion in a real environment at 100 frames per second, with a test average precision of 40.3% relative to our annotated ground truth. The event-based car detector handles motion blur and poor illumination conditions despite not being explicitly trained to do so, and even complements frame-based CNN detectors, suggesting that it has learnt generalized visual representations.
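The pseudo-labelling step described above can be sketched minimally: detections from the frame-based CNN are filtered by confidence and re-emitted as targets for the temporally nearest event slice. The dictionary keys, threshold, and helper names below are our own assumptions, not the paper's interface:

```python
import bisect

def nearest_frame(frame_times, event_time):
    """Index of the frame timestamp closest to an event-slice timestamp
    (frame_times must be sorted ascending)."""
    i = bisect.bisect_left(frame_times, event_time)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    return i if frame_times[i] - event_time < event_time - frame_times[i - 1] else i - 1

def make_pseudo_labels(frame_detections, conf_threshold=0.5):
    """Keep frame-based CNN detections above a confidence threshold and
    re-emit them as supervised targets for the event-based detector."""
    targets = []
    for det in frame_detections:
        if det["score"] >= conf_threshold:
            targets.append({"bbox": det["bbox"], "label": det["label"]})
    return targets
```

The event-based network is then trained against these targets exactly as if they were human annotations, which is what makes the transfer purely supervised on the event side.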
DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
Domain adaptive semantic segmentation methods commonly utilize stage-wise
training, consisting of a warm-up and a self-training stage. However, this
popular approach still faces several challenges in each stage: for warm-up, the
widely adopted adversarial training often results in limited performance gain,
due to blind feature alignment; for self-training, finding proper categorical
thresholds is very tricky. To alleviate these issues, we first propose to
replace the adversarial training in the warm-up stage by a novel symmetric
knowledge distillation module that only accesses the source domain data and
makes the model domain generalizable. Surprisingly, this domain generalizable
warm-up model brings substantial performance improvement, which can be further
amplified via our proposed cross-domain mixture data augmentation technique.
Then, for the self-training stage, we propose a threshold-free dynamic
pseudo-label selection mechanism to ease the aforementioned threshold problem
and make the model better adapted to the target domain. Extensive experiments demonstrate that our framework achieves remarkable and consistent improvements over prior art on popular benchmarks. Code and models are available at https://github.com/fy-vision/DiGA
Comment: CVPR202
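One way to realize threshold-free pseudo-label selection, sketched here only to illustrate the idea (the paper's actual mechanism may differ), is to keep a fixed fraction of the most confident pixels per predicted class rather than a hand-tuned global confidence threshold:

```python
import numpy as np

def select_pseudo_labels(probs, keep_fraction=0.5, ignore_index=255):
    """probs: (H, W, C) per-pixel class probabilities. Take the argmax
    class per pixel, then within each class keep only the most confident
    `keep_fraction` of its pixels; the rest are set to `ignore_index`
    so they do not contribute to the self-training loss."""
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    out = np.full_like(labels, ignore_index)
    for c in np.unique(labels):
        mask = labels == c
        cutoff = np.quantile(conf[mask], 1.0 - keep_fraction)
        keep = mask & (conf >= cutoff)
        out[keep] = c
    return out
```

Because the cutoff is a per-class quantile of the current predictions, it adapts dynamically to each class's confidence distribution, which is the property the fixed categorical thresholds criticized above lack.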
Teacher-Student Architecture for Knowledge Distillation: A Survey
Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems due to their voluminous parameters. To tackle this issue,
Teacher-Student architectures were proposed, where simple student networks with
a few parameters can achieve comparable performance to deep teacher networks
with many parameters. Recently, Teacher-Student architectures have been
effectively and widely embraced on various knowledge distillation (KD)
objectives, including knowledge compression, knowledge expansion, knowledge
adaptation, and knowledge enhancement. With the help of Teacher-Student
architectures, current studies are able to achieve multiple distillation
objectives through lightweight and generalized student networks. Different from
existing KD surveys that primarily focus on knowledge compression, this survey
first explores Teacher-Student architectures across multiple distillation
objectives. This survey presents an introduction to various knowledge
representations and their corresponding optimization objectives. Additionally,
we provide a systematic overview of Teacher-Student architectures with
representative learning algorithms and effective distillation schemes. This
survey also summarizes recent applications of Teacher-Student architectures
across multiple purposes, including classification, recognition, generation,
ranking, and regression. Lastly, potential research directions in KD are
investigated, focusing on architecture design, knowledge quality, and
theoretical studies of regression-based learning, respectively. Through this
comprehensive survey, industry practitioners and the academic community can
gain valuable insights and guidelines for effectively designing, learning, and
applying Teacher-Student architectures on various distillation objectives.
Comment: 20 pages. arXiv admin note: substantial text overlap with arXiv:2210.1733
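The response-based objective that knowledge-compression surveys typically start from is the temperature-softened KL divergence between teacher and student outputs, as in Hinton et al.'s original distillation. A minimal NumPy sketch, assuming logit arrays of shape (batch, classes):

```python
import numpy as np

def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    """Response-based knowledge distillation: KL divergence between the
    teacher's and student's temperature-softened distributions, scaled
    by T**2 so gradients keep a comparable magnitude across T."""
    def soft(z):
        z = np.asarray(z, dtype=float) / T
        z -= z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p, q = soft(teacher_logits), soft(student_logits)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return (T ** 2) * kl.mean()
```

The other distillation objectives surveyed (expansion, adaptation, enhancement) replace or augment this term, but the Teacher-Student coupling through softened outputs stays the common core.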
FABLE : Fabric Anomaly Detection Automation Process
Unsupervised anomaly detection in industry has been a topic of growing concern and a stepping stone towards high-performance industrial automation. The vast majority of industry-oriented methods focus on learning from good samples to detect anomalies, yet some industrial scenarios require even less specific training and therefore call for generalized anomaly detection. The obvious use case is fabric anomaly detection, where a very wide range of colors and textile types must be handled and stopping the production line for training cannot be considered. In this paper, we propose an automation process for industrial fabric texture defect detection with a specificity-learning step during domain-generalized anomaly detection. Combining the ability to generalize with this learning step offers fast and precise anomaly detection and segmentation. The main contributions of this paper are the following: a domain-generalized texture anomaly detection method achieving state-of-the-art performance, fast specific training on good samples extracted by the proposed method, a self-evaluation method based on custom defect creation, and automatic detection of already-seen fabrics to prevent re-training.
Comment: 7th International Conference on Control, Automation and Diagnosis (ICCAD'23), 6 page
Towards Efficient Task-Driven Model Reprogramming with Foundation Models
Vision foundation models exhibit impressive power, benefiting from the
extremely large model capacity and broad training data. However, in practice,
downstream scenarios may only support a small model due to the limited
computational resources or efficiency considerations. Moreover, the data used for pretraining foundation models are usually inaccessible and very different from the target data of downstream tasks. This poses a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to a downstream task with a quite different model architecture, using only the downstream target data. Existing transfer learning or
knowledge distillation methods depend on either the same model structure or
finetuning of the foundation model. Thus, naively introducing these methods can
be either infeasible or very inefficient. To address this, we propose a
Task-Driven Model Reprogramming (TDMR) framework. Specifically, we reprogram
the foundation model to project the knowledge into a proxy space, which
alleviates the adverse effect of task mismatch and domain inconsistency. Then,
we reprogram the target model via progressive distillation from the proxy space
to efficiently learn the knowledge from the reprogrammed foundation model. TDMR
is compatible with different pre-trained model types (CNN, transformer or their
mix) and limited target data, and promotes the wide applications of vision
foundation models to downstream tasks in a cost-effective manner. Extensive
experiments on different downstream classification tasks and target model
structures demonstrate the effectiveness of our methods with both CNN and transformer foundation models.
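Model reprogramming in general, of which TDMR is an instance, keeps the foundation model frozen and trains only a small input transformation plus an output mapping into the target label space. A generic sketch with our own class and parameter names, not the paper's actual TDMR modules:

```python
import numpy as np

rng = np.random.default_rng(0)

class InputReprogrammer:
    """Reprogramming-style wrapper: a trainable additive perturbation is
    applied to the input before it reaches the frozen foundation model,
    and the model's source-class outputs are linearly mapped onto the
    target label space (the "proxy space" in the text's terminology)."""

    def __init__(self, input_shape, n_source_classes, n_target_classes):
        self.delta = np.zeros(input_shape)  # trainable input offset
        self.W = rng.normal(0.0, 0.01, (n_source_classes, n_target_classes))  # trainable output map

    def forward(self, x, frozen_model):
        z = frozen_model(x + self.delta)  # foundation model stays frozen; only delta and W train
        return z @ self.W                 # project source logits to target classes
```

Only `delta` and `W` would receive gradients during training, which is what makes reprogramming cheap relative to finetuning the foundation model; in TDMR this reprogrammed output then serves as the teacher for progressively distilling the small target model.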