10 research outputs found
Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods
Deep Metric Learning (DML) learns a non-linear semantic embedding from input
data that brings similar pairs together while keeping dissimilar pairs away from
each other. To this end, many different methods have been proposed in the last
decade with promising results in various applications. The success of a DML
algorithm greatly depends on its loss function. However, no loss function is
perfect, and each addresses only some aspects of an optimal similarity embedding.
Besides, the generalizability of DML to unseen categories during the test stage
is an important matter that is not considered by existing loss functions. To
address these challenges, we propose novel approaches to combine different
losses built on top of a shared deep feature extractor. The proposed ensemble
of losses forces the deep model to extract features that are consistent with
all losses. Since the selected losses are diverse and each emphasizes different
aspects of an optimal semantic embedding, our effective combining methods yield
a considerable improvement over any individual loss and generalize well on
unseen categories. Here, there is no limitation in choosing loss functions, and
our methods can work with any set of existing ones. Besides, they can optimize
each loss function as well as its weight in an end-to-end paradigm with no need
to adjust any hyper-parameter. We evaluate our methods on some popular datasets
from the machine vision domain in conventional Zero-Shot-Learning (ZSL)
settings. The results are very encouraging and show that our methods outperform
all baseline losses by a large margin on all datasets.
Comment: 27 pages, 12 figures
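As a rough illustration of the idea in the abstract above (several losses computed on one shared embedding and combined with learned weights), the following minimal sketch mixes already-computed loss values with softmax-normalized weights. The abstract does not say how the weights are parameterized, so the softmax form and the names `combine_losses` and `weight_logits` are assumptions made for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_losses(loss_values, weight_logits):
    """Weighted sum of per-loss values (hypothetical helper).

    loss_values   -- scalar values of the individual losses, all computed
                     from the same shared feature extractor
    weight_logits -- one unconstrained parameter per loss; in a real model
                     these would be trained jointly with the network
                     (assumption: softmax keeps the weights positive and
                     summing to 1)
    """
    if len(loss_values) != len(weight_logits):
        raise ValueError("need exactly one weight logit per loss")
    weights = softmax(weight_logits)
    return sum(w * value for w, value in zip(weights, loss_values))


# With equal logits the combination reduces to the plain average.
total = combine_losses([0.9, 0.3, 0.6], [0.0, 0.0, 0.0])
```

Note that minimizing a plain softmax-weighted sum tends to push all weight onto the smallest loss, so a practical scheme would likely need some additional constraint or regularizer; the abstract does not state which one is used.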
MergedNET: A simple approach for one-shot learning in siamese networks based on similarity layers
Classifiers trained on disjoint classes with few labelled data points are used in one-shot learning to identify visual concepts from other classes. Recently, Siamese networks and similarity layers have been used to solve the one-shot learning problem, achieving state-of-the-art performance on visual-character recognition datasets. Various techniques have been developed over the years to improve the performance of these networks on fine-grained image classification datasets. These have focused primarily on improving the loss and activation functions, augmenting visual features, employing multiscale metric learning, and pre-training and fine-tuning the backbone network. We investigate similarity layers for one-shot learning tasks and propose two frameworks for combining these layers into a MergedNet network. On all four datasets used in our experiment, MergedNet outperformed the baselines in classification accuracy, and it generalises to other datasets when trained on miniImageNet.
Development of Techniques for Maximizing Information Utilization in Deep Learning-Based Fault Diagnosis
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, 2021.8. 윤병동.
Unexpected failures of mechanical systems can lead to substantial social and financial losses in many industries. In order to detect and prevent sudden failures and to enhance the reliability of mechanical systems, significant research efforts have been made to develop data-driven fault diagnosis techniques. The purpose of fault diagnosis techniques is to detect and identify the occurrence of abnormal behaviors in the target mechanical systems as early as possible. Recently, deep learning (DL) based fault diagnosis approaches, including the convolutional neural network (CNN) method, have shown remarkable fault diagnosis performance, thanks to their autonomous feature learning ability.
Still, there are several issues that remain to be solved in the development of robust and industry-applicable deep learning-based fault diagnosis techniques. First, by stacking the neural network architectures deeper, enriched hierarchical features can be learned, and therefore, improved performance can be achieved. However, due to inefficiency in the gradient information flow and overfitting problems, deeper models cannot be trained comprehensively. Next, to develop a fault diagnosis model with high performance, it is necessary to obtain sufficient labeled data. However, for mechanical systems that operate in real-world environments, it is not easy to obtain sufficient data and label information. Consequently, novel methods that address these issues should be developed to improve the performance of deep learning based fault diagnosis techniques.
This dissertation research investigated three research thrusts aimed toward maximizing the use of information to improve the performance of deep learning based fault diagnosis techniques, specifically: 1) study of the deep learning structure to enhance the gradient information flow within the architecture, 2) study of a robust and discriminative feature learning method under insufficient and noisy data conditions based on parameter transfer and triplet loss, and 3) investigation of a domain adaptation based fault diagnosis method that propagates the label information across different domains.
The first research thrust suggests an advanced CNN-based architecture to improve the gradient information flow within the deep learning model. By directly connecting the feature maps of different layers, the diagnosis model can be trained efficiently thanks to enhanced information flow. In addition, the dimension reduction module can also increase training efficiency by significantly reducing the number of trainable parameters.
The second research thrust suggests a parameter transfer and metric learning based fault diagnosis method. The proposed approach facilitates robust and discriminative feature learning to enhance fault diagnosis performance under insufficient and noisy data conditions. The pre-trained model trained using abundant source domain data is transferred and used to develop a robust fault diagnosis method. Moreover, a semi-hard triplet loss function is adopted to learn the features with high separability, according to the class labels.
Finally, the last research thrust proposes a label information propagation strategy to increase the fault diagnosis performance in the unlabeled target domain. The label information obtained from the source domain is transferred and utilized for developing fault diagnosis methods in the target domain. Simultaneously, the newly devised semantic clustering loss is applied at multiple feature levels to learn discriminative, domain-invariant features. As a result, features that are not only semantically well-clustered but also domain-invariant can be effectively learned.
Chapter 1 Introduction
1.1 Motivation
1.2 Research Scope and Overview
1.3 Dissertation Layout
Chapter 2 Technical Background and Literature Review
2.1 Fault Diagnosis Techniques for Mechanical Systems
2.1.1 Fault Diagnosis Techniques
2.1.2 Deep Learning Based Fault Diagnosis Techniques
2.2 Transfer Learning
2.3 Metric Learning
2.4 Summary and Discussion
Chapter 3 Direct Connection Based Convolutional Neural Network (DC-CNN) for Fault Diagnosis
3.1 Directly Connected Convolutional Module
3.2 Dimension Reduction Module
3.3 Input Vibration Image Generation
3.4 DC-CNN-Based Fault Diagnosis Method
3.5 Experimental Studies and Results
3.5.1 Experiment and Data Description
3.5.2 Compared Methods
3.5.3 Diagnosis Performance Results
3.5.4 The Number of Trainable Parameters
3.5.5 Visualization of the Learned Features
3.5.6 Robustness of Diagnosis Performance
3.6 Summary and Discussion
Chapter 4 Robust and Discriminative Feature Learning for Fault Diagnosis Under Insufficient and Noisy Data Conditions
4.1 Parameter Transfer Learning
4.2 Robust Feature Learning Based on the Pre-trained Model
4.3 Discriminative Feature Learning Based on the Triplet Loss
4.4 Robust and Discriminative Feature Learning for Fault Diagnosis
4.5 Experimental Studies and Results
4.5.1 Experiment and Data Description
4.5.2 Compared Methods
4.5.3 Experimental Results Under Insufficient Data Conditions
4.5.4 Experimental Results Under Noisy Data Conditions
4.6 Summary and Discussion
Chapter 5 A Domain Adaptation with Semantic Clustering (DASC) Method for Fault Diagnosis
5.1 Unsupervised Domain Adaptation
5.2 CNN-based Diagnosis Model
5.3 Learning of Domain-invariant Features
5.4 Domain Adaptation with Semantic Clustering
5.5 Proposed DASC-based Fault Diagnosis Method
5.6 Experimental Studies and Results
5.6.1 Experiment and Data Description
5.6.2 Compared Methods
5.6.3 Scenario I: Different Operating Conditions
5.6.4 Scenario II: Different Rotating Machinery
5.6.5 Analysis and Discussion
5.7 Summary and Discussion
Chapter 6 Conclusion
6.1 Contributions and Significance
6.2 Suggestions for Future Research
References
Abstract in Korean
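The second research thrust of the thesis above relies on a semi-hard triplet loss. As a minimal sketch of that general technique (not the thesis implementation), the code below computes the loss for one anchor: a negative counts as semi-hard when it is farther from the anchor than the positive but still inside the margin. The function names, the Euclidean distance, the default margin of 0.2, and the choice to return 0.0 when no semi-hard negative exists are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def semi_hard_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Triplet loss for one anchor using its closest semi-hard negative.

    A negative n is semi-hard when
        d(anchor, positive) < d(anchor, n) < d(anchor, positive) + margin.
    Simplifying assumption: if no negative satisfies this, the anchor
    contributes 0.0 instead of falling back to another negative.
    """
    d_ap = euclidean(anchor, positive)
    semi_hard = [
        d_an
        for d_an in (euclidean(anchor, n) for n in negatives)
        if d_ap < d_an < d_ap + margin
    ]
    if not semi_hard:
        return 0.0
    # The closest semi-hard negative gives the largest loss value.
    return d_ap - min(semi_hard) + margin


# One positive at distance 1.0; only the negative at distance 1.1 is semi-hard.
loss = semi_hard_triplet_loss(
    anchor=[0.0, 0.0],
    positive=[1.0, 0.0],
    negatives=[[0.5, 0.0], [1.1, 0.0], [3.0, 0.0]],
)
```

In this example the negative at distance 0.5 is closer than the positive and the one at distance 3.0 already lies outside the margin, so neither is selected.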
Learning from Very Few Samples: A Survey
Few sample learning (FSL) is significant and challenging in the field of
machine learning. The capability of learning and generalizing from very few
samples successfully is a noticeable demarcation separating artificial
intelligence and human intelligence since humans can readily establish their
cognition to novelty from just a single or a handful of examples whereas
machine learning algorithms typically entail hundreds or thousands of
supervised samples to guarantee generalization ability. Despite the long
history dating back to the early 2000s and the widespread attention in recent
years with booming deep learning technologies, few surveys or reviews of
FSL have been available until now. In this context, we extensively review 300+ papers
of FSL spanning from the 2000s to 2019 and provide a timely and comprehensive
survey for FSL. In this survey, we review the evolution history as well as the
current progress of FSL, categorize FSL approaches in principle into generative
model based and discriminative model based kinds, and place particular emphasis
on the meta learning based FSL approaches. We also summarize
several recently emerging extensional topics of FSL and review the latest
advances on these topics. Furthermore, we highlight the important FSL
applications covering many research hotspots in computer vision, natural
language processing, audio and speech, reinforcement learning and robotics, data
analysis, etc. Finally, we conclude the survey with a discussion of promising
trends in the hope of providing guidance and insights for follow-up research.
Comment: 30 pages