10 research outputs found

    Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods

    Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeping dissimilar data away from each other. To this end, many different methods have been proposed in the last decade, with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect, and each deals with only some aspects of an optimal similarity embedding. Besides, the generalizability of DML to unseen categories during the test stage is an important matter that is not considered by existing loss functions. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses forces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well to unseen categories. Here, there is no limitation in choosing loss functions, and our methods can work with any set of existing ones. Besides, they can optimize each loss function as well as its weight in an end-to-end paradigm with no need to adjust any hyper-parameter. We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin on all datasets. Comment: 27 pages, 12 figures
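As a rough illustration of the idea in this abstract, the sketch below combines two common metric-learning losses, computed on the same embeddings, into one weighted objective. The softmax weighting, the choice of contrastive and triplet losses, the margins, and all function names are assumptions made for illustration; the abstract does not specify the paper's combining methods.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_loss(u, v, same, margin=1.0):
    """Pull same-class pairs together; push different-class pairs past the margin."""
    d = math.sqrt(sq_dist(u, v))
    return d ** 2 if same else max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Require the negative to be farther from the anchor than the positive by a margin."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def softmax(logits):
    """Turn unconstrained weight logits into positive weights that sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combined_loss(losses, weight_logits):
    """Weighted sum of individual losses (assumed combining rule, not the paper's)."""
    weights = softmax(weight_logits)
    return sum(w * l for w, l in zip(weights, losses))

# Embeddings that a shared feature extractor might have produced (made-up values).
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]

losses = [
    contrastive_loss(anchor, positive, same=True)
    + contrastive_loss(anchor, negative, same=False),
    triplet_loss(anchor, positive, negative),
]
total = combined_loss(losses, weight_logits=[0.0, 0.0])  # equal weights here
```

In a trainable version the weight logits would be parameters updated together with the feature extractor. A plain softmax-weighted sum tends to shift weight toward whichever loss is currently smallest, so some additional constraint or regularizer is usually needed; how the paper handles this is not stated in the abstract.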

    MergedNET: A simple approach for one-shot learning in siamese networks based on similarity layers

    Classifiers trained on disjoint classes with few labelled data points are used in one-shot learning to identify visual concepts from other classes. Recently, Siamese networks and similarity layers have been used to solve the one-shot learning problem, achieving state-of-the-art performance on visual-character recognition datasets. Various techniques have been developed over the years to improve the performance of these networks on fine-grained image classification datasets. They focused primarily on improving the loss and activation functions, augmenting visual features, employing multiscale metric learning, and pre-training and fine-tuning the backbone network. We investigate similarity layers for one-shot learning tasks and propose two frameworks for combining these layers into a MergedNet network. On all four datasets used in our experiment, MergedNet outperformed the baselines based on classification accuracy, and it generalises to other datasets when trained on miniImageNet.
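To make "combining similarity layers" concrete, here is a minimal sketch of one-shot classification that merges two element-wise similarity features computed on a pair of embeddings. The two layers chosen (absolute difference and product), the concatenation, the fixed scoring rule, and all names are hypothetical; the abstract does not describe MergedNet's two frameworks, and a real network would learn the scoring head.

```python
def abs_diff_layer(u, v):
    """Element-wise absolute difference: small values mean similar embeddings."""
    return [abs(a - b) for a, b in zip(u, v)]

def product_layer(u, v):
    """Element-wise product: large values mean aligned embeddings."""
    return [a * b for a, b in zip(u, v)]

def merged_features(u, v):
    """Concatenate the outputs of both similarity layers (assumed merge rule)."""
    return abs_diff_layer(u, v) + product_layer(u, v)

def score(u, v):
    """Hand-set stand-in for a learned head: reward alignment, penalize difference."""
    n = len(u)
    feats = merged_features(u, v)
    return sum(feats[n:]) - sum(feats[:n])

def one_shot_classify(query, support):
    """Return the label of the single support example most similar to the query."""
    return max(support, key=lambda label: score(query, support[label]))

# One embedded example per class (made-up values).
support = {"a": [0.9, 0.1], "b": [0.0, 1.0]}
prediction = one_shot_classify([1.0, 0.0], support)
```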

    Development of Techniques for Maximizing Information Use in Deep Learning Based Fault Diagnosis

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, August 2021. 윤병동.
    Unexpected failures of mechanical systems can lead to substantial social and financial losses in many industries. In order to detect and prevent sudden failures and to enhance the reliability of mechanical systems, significant research efforts have been made to develop data-driven fault diagnosis techniques. The purpose of fault diagnosis techniques is to detect and identify the occurrence of abnormal behaviors in the target mechanical systems as early as possible. Recently, deep learning (DL) based fault diagnosis approaches, including the convolutional neural network (CNN) method, have shown remarkable fault diagnosis performance, thanks to their autonomous feature learning ability. Still, several issues remain to be solved in the development of robust and industry-applicable deep learning based fault diagnosis techniques. First, stacking neural network architectures deeper allows enriched hierarchical features to be learned and therefore improved performance to be achieved. However, due to inefficiency in the gradient information flow and overfitting problems, models become harder to train as they get deeper. Next, developing a fault diagnosis model with high performance requires sufficient labeled data. However, for mechanical systems that operate in real-world environments, it is often difficult to obtain sufficient data and label information. Consequently, novel methods that address these issues should be developed to improve the performance of deep learning based fault diagnosis techniques.
    This dissertation investigates three research thrusts aimed at maximizing the use of information to improve the performance of deep learning based fault diagnosis techniques: 1) a deep learning structure that enhances the gradient information flow within the architecture, 2) a robust and discriminative feature learning method for insufficient and noisy data conditions, based on parameter transfer and triplet loss, and 3) a domain adaptation based fault diagnosis method that propagates label information across different domains. The first research thrust proposes an advanced CNN-based architecture to improve the gradient information flow within the deep learning model. By directly connecting the feature maps of different layers, the diagnosis model can be trained efficiently thanks to enhanced information flow. In addition, the dimension reduction module can also increase training efficiency by significantly reducing the number of trainable parameters. The second research thrust proposes a parameter transfer and metric learning based fault diagnosis method. The proposed approach facilitates robust and discriminative feature learning to enhance fault diagnosis performance under insufficient and noisy data conditions. A model pre-trained on abundant source domain data is transferred to the target domain and used to develop a robust fault diagnosis method. Moreover, a semi-hard triplet loss function is adopted to learn features that separate well according to the class labels. Finally, the last research thrust proposes a label information propagation strategy to increase fault diagnosis performance in the unlabeled target domain. The label information obtained from the source domain is transferred and used to develop fault diagnosis methods in the target domain. Simultaneously, the newly devised semantic clustering loss is applied at multiple feature levels to learn discriminative, domain-invariant features. As a result, features that are not only semantically well-clustered but also domain-invariant can be effectively learned.
    Contents:
    Chapter 1 Introduction: 1.1 Motivation; 1.2 Research Scope and Overview; 1.3 Dissertation Layout
    Chapter 2 Technical Background and Literature Review: 2.1 Fault Diagnosis Techniques for Mechanical Systems (2.1.1 Fault Diagnosis Techniques; 2.1.2 Deep Learning Based Fault Diagnosis Techniques); 2.2 Transfer Learning; 2.3 Metric Learning; 2.4 Summary and Discussion
    Chapter 3 Direct Connection Based Convolutional Neural Network (DC-CNN) for Fault Diagnosis: 3.1 Directly Connected Convolutional Module; 3.2 Dimension Reduction Module; 3.3 Input Vibration Image Generation; 3.4 DC-CNN-Based Fault Diagnosis Method; 3.5 Experimental Studies and Results (3.5.1 Experiment and Data Description; 3.5.2 Compared Methods; 3.5.3 Diagnosis Performance Results; 3.5.4 The Number of Trainable Parameters; 3.5.5 Visualization of the Learned Features; 3.5.6 Robustness of Diagnosis Performance); 3.6 Summary and Discussion
    Chapter 4 Robust and Discriminative Feature Learning for Fault Diagnosis Under Insufficient and Noisy Data Conditions: 4.1 Parameter Transfer Learning; 4.2 Robust Feature Learning Based on the Pre-trained Model; 4.3 Discriminative Feature Learning Based on the Triplet Loss; 4.4 Robust and Discriminative Feature Learning for Fault Diagnosis; 4.5 Experimental Studies and Results (4.5.1 Experiment and Data Description; 4.5.2 Compared Methods; 4.5.3 Experimental Results Under Insufficient Data Conditions; 4.5.4 Experimental Results Under Noisy Data Conditions); 4.6 Summary and Discussion
    Chapter 5 A Domain Adaptation with Semantic Clustering (DASC) Method for Fault Diagnosis: 5.1 Unsupervised Domain Adaptation; 5.2 CNN-based Diagnosis Model; 5.3 Learning of Domain-invariant Features; 5.4 Domain Adaptation with Semantic Clustering; 5.5 Proposed DASC-based Fault Diagnosis Method; 5.6 Experimental Studies and Results (5.6.1 Experiment and Data Description; 5.6.2 Compared Methods; 5.6.3 Scenario I: Different Operating Conditions; 5.6.4 Scenario II: Different Rotating Machinery; 5.6.5 Analysis and Discussion); 5.7 Summary and Discussion
    Chapter 6 Conclusion: 6.1 Contributions and Significance; 6.2 Suggestions for Future Research
    References
    Abstract in Korean
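The semi-hard triplet loss mentioned in this abstract can be sketched as follows. This uses the common definition of a semi-hard negative (farther from the anchor than the positive, but still inside the margin) with squared Euclidean distances; the margin value, the averaging over selected negatives, and the function names are assumptions, and the thesis may use a different variant.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def semi_hard_negatives(anchor, positive, negatives, margin):
    """Keep negatives with d(a, p) < d(a, n) < d(a, p) + margin."""
    d_ap = sq_dist(anchor, positive)
    return [n for n in negatives if d_ap < sq_dist(anchor, n) < d_ap + margin]

def semi_hard_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Average triplet loss over the semi-hard negatives; zero if none qualify."""
    d_ap = sq_dist(anchor, positive)
    chosen = semi_hard_negatives(anchor, positive, negatives, margin)
    if not chosen:
        return 0.0
    return sum(d_ap - sq_dist(anchor, n) + margin for n in chosen) / len(chosen)

# Made-up features: one hard, one semi-hard, and one easy negative.
anchor, positive = [0.0, 0.0], [0.5, 0.0]
negatives = [[0.2, 0.0], [0.7, 0.0], [2.0, 0.0]]

selected = semi_hard_negatives(anchor, positive, negatives, margin=0.5)
loss = semi_hard_triplet_loss(anchor, positive, negatives, margin=0.5)
```

Only the middle negative is selected here: the first is closer to the anchor than the positive (a hard negative), and the last already lies beyond the margin and would contribute zero loss.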

    Learning from Very Few Samples: A Survey

    Few sample learning (FSL) is significant and challenging in the field of machine learning. The capability of learning and generalizing from very few samples successfully is a noticeable demarcation separating artificial intelligence and human intelligence, since humans can readily establish their cognition of novelty from just a single or a handful of examples, whereas machine learning algorithms typically require hundreds or thousands of supervised samples to guarantee generalization ability. Despite a long history dating back to the early 2000s and widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 300+ papers on FSL spanning from the 2000s to 2019 and provide a timely and comprehensive survey of FSL. In this survey, we review the evolution history as well as the current progress of FSL, categorize FSL approaches into generative model based and discriminative model based kinds in principle, and place particular emphasis on meta learning based FSL approaches. We also summarize several recently emerging extensional topics of FSL and review the latest advances on these topics. Furthermore, we highlight the important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends in the hope of providing guidance and insights to follow-up research. Comment: 30 pages