Capsule Network based Contrastive Learning of Unsupervised Visual Representations
Capsule Networks have advanced considerably in the past decade,
outperforming traditional CNNs in various tasks thanks to their equivariant
properties. Because their vector inputs and outputs encode both the
magnitude and direction of an object or its parts, Capsule Networks hold
enormous promise for unsupervised visual representation learning tasks such
as multi-class image classification. In this paper, we propose the
Contrastive Capsule (CoCa) model, a Siamese-style Capsule Network trained
with a contrastive loss, together with a novel architecture and training
and testing algorithm. We evaluate the model on unsupervised image
classification on the CIFAR-10 dataset and achieve a top-1 test accuracy of
70.50% and a top-5 test accuracy of 98.10%. Thanks to its efficient
architecture, our model has 31 times fewer parameters and 71 times fewer
FLOPs than the current SOTA in both supervised and unsupervised learning.
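A Siamese-style network of this kind is typically trained with a pairwise contrastive loss over the two branch embeddings. The following is a minimal sketch of such a loss; the exact loss and `margin` value used by CoCa may differ, so treat this as illustrative:

```python
import numpy as np

def contrastive_loss(z1, z2, label, margin=1.0):
    """Pairwise contrastive loss between two embeddings.

    label = 1 for a positive (similar) pair, 0 for a negative pair.
    Positive pairs are pulled together; negative pairs are pushed
    apart until they are at least `margin` apart.
    """
    d = np.linalg.norm(z1 - z2)                      # Euclidean distance
    pos = label * d ** 2                             # pull positives together
    neg = (1 - label) * max(margin - d, 0.0) ** 2    # push negatives apart
    return 0.5 * (pos + neg)
```

Identical embeddings with a positive label give zero loss, as do negative pairs already separated by more than the margin; everything in between produces a gradient that reshapes the embedding space.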
Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition, Age and Gender Estimation
Thesis (Master's)--Seoul National University Graduate School: Department of Electrical and Information Engineering, College of Engineering, August 2019. Cho, Nam Ik.
The convolutional neural network (CNN) works very well in many computer
vision tasks, including face-related problems. However, for age estimation
and facial expression recognition (FER), the accuracy provided by the CNN
is still not good enough for real-world use. The CNN seems unable to
capture the subtle differences in the thickness and amount of wrinkles on the face,
which are essential features for age estimation and FER. Also, face images in the real world vary widely due to face rotation and illumination, and the CNN is not robust at finding rotated objects when not every possible variation is present in the training data.
Moreover, Multi-Task Learning (MTL) based methods can be very helpful for achieving real-time visual understanding of a dynamic scene, as they perform several different perceptual tasks simultaneously and efficiently. In typical MTL methods, one needs to construct a dataset that contains all the labels for the different tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels may be required. Hence, the cost of generating the desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning.
Therefore, to alleviate these problems, we first propose several methods that improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on Multi-Task Learning (MTL) and data distillation.
1. Introduction
1.1 Motivation
1.2 Background
1.2.1 Age and Gender Estimation
1.2.2 Facial Expression Recognition (FER)
1.2.3 Capsule Networks (CapsNet)
1.2.4 Semi-Supervised Learning
1.2.5 Multi-Task Learning
1.2.6 Knowledge and Data Distillation
1.2.7 Domain Adaptation
1.3 Datasets
2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
2.1 Feeding CNN with Hand-Crafted Features
2.1.1 Preparation of Input
2.1.2 Age and Gender Estimation using the Gabor Responses
2.2 GF-CapsNet
2.2.1 Modification of CapsNet
3. Distill-2MD-MTL: Data Distillation Based on a Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
3.1 MTL Learning
3.2 Data Distillation
4. Experiments and Results
4.1 Experiments on GF-CNN and GF-CapsNet
4.2 GF-CNN Results
4.2.1 GF-CapsNet Results
4.3 Experiment on Distill-2MD-MTL
4.3.1 Semi-Supervised MTL
4.3.2 Cross-Dataset Cross-Domain Evaluation
5. Conclusion
Abstract (In Korean)
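The Gabor-filter preprocessing mentioned in this thesis abstract builds on a bank of oriented 2-D Gabor kernels. A minimal sketch of such a kernel and a small orientation bank follows; the thesis's actual filter parameters (size, wavelength, orientations) are not given here, so the values below are illustrative assumptions:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5):
    """Real part of a 2-D Gabor filter: a Gaussian envelope
    modulating a cosine wave oriented at angle `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

# A small bank of orientations, convolved with the face image to
# obtain jet-like multi-orientation responses.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a face crop with each kernel in the bank yields a stack of orientation-selective response maps that can be fed to the CNN or capsule network in place of (or alongside) raw pixels.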
Multi-labeled Relation Extraction with Attentive Capsule Network
Disclosing multiple overlapped relations in a sentence remains
challenging. Most current neural models inconveniently assume that each
sentence is explicitly mapped to one relation label and cannot handle
multiple relations properly, since the overlapped features of the relations
are either ignored or very difficult to identify. To tackle this issue,
we propose a novel approach for multi-labeled relation extraction with a
capsule network, which performs considerably better than current
convolutional or recurrent nets at identifying highly overlapped relations
within an individual sentence. To better cluster the features and precisely
extract the relations, we further devise an attention-based routing
algorithm and a sliding-margin loss function and embed them into our
capsule network. The experimental results show that the proposed approach
can indeed extract the highly overlapped features and achieves significant
performance improvements for relation extraction compared to state-of-the-art works.
Comment: To be published in AAAI 201
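A sliding-margin loss of the kind this abstract describes scores each relation capsule by its vector length against a movable threshold, so several relations can be active at once. The sketch below is a plausible form under assumed hyperparameters `B`, `gamma`, and `lam`; the paper's exact formulation (e.g. whether `B` is learned) may differ:

```python
import numpy as np

def sliding_margin_loss(lengths, labels, B=0.5, gamma=0.4, lam=0.5):
    """Margin loss with a sliding threshold B: capsule lengths for
    relations present in the sentence (labels == 1) are pushed above
    B + gamma, lengths for absent relations below B - gamma, which
    naturally supports multiple labels per sentence."""
    upper = np.maximum(0.0, (B + gamma) - lengths) ** 2   # present relations
    lower = np.maximum(0.0, lengths - (B - gamma)) ** 2   # absent relations
    return np.sum(labels * upper + lam * (1 - labels) * lower)
```

At inference time, every relation capsule whose length exceeds the threshold is emitted, giving a multi-label prediction rather than a single argmax.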
Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
A capsule is a group of neurons whose activity vector represents the
instantiation parameters of a specific type of entity. In this paper, we
explore capsule networks for relation extraction in a multi-instance
multi-label learning framework and propose a novel neural approach based on
capsule networks with attention mechanisms. We evaluate our method on
different benchmarks, and it is demonstrated that our method improves the
precision of the predicted relations. In particular, we show that capsule
networks improve relation extraction for multiple entity pairs.
Comment: To be published in EMNLP 201
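The "activity vector" reading above relies on the standard CapsNet squashing nonlinearity (Sabour et al., 2017), which preserves a vector's direction while compressing its length into [0, 1) so that length can be interpreted as the probability that the entity is present. A minimal sketch:

```python
import numpy as np

def squash(s, eps=1e-8):
    """CapsNet squashing nonlinearity: keeps the vector's direction
    but maps its length into [0, 1), so the length can be read as
    the probability that the entity (here, a relation) is present."""
    norm2 = np.sum(s ** 2)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)
```

Short vectors are shrunk toward zero and long vectors saturate just below unit length, which is what makes the length a usable confidence score.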
Polyphonic Sound Event Detection by using Capsule Neural Networks
Artificial sound event detection (SED) aims to mimic the human ability
to perceive and understand what is happening in the surroundings. Nowadays,
deep learning offers valuable techniques for this goal, such as
Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet)
architecture has recently been introduced in the image processing field
with the intent to overcome some of the known limitations of CNNs,
specifically their scarce robustness to affine transformations (i.e.,
perspective, size, orientation) and the detection of overlapped images.
This motivated the authors to employ CapsNets for the polyphonic-SED task,
in which multiple sound events occur simultaneously. Specifically, we
propose to exploit the capsule units to represent a set of distinctive
properties for each individual sound event. Capsule units are connected
through a so-called "dynamic routing" procedure that encourages learning
part-whole relationships and improves detection performance in a polyphonic
context. This paper reports extensive evaluations carried out on three
publicly available datasets, showing that the CapsNet-based algorithm not
only outperforms standard CNNs but also achieves the best results with
respect to state-of-the-art algorithms.
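The "dynamic routing" mentioned above is routing-by-agreement (Sabour et al., 2017): coupling coefficients between lower and upper capsules are iteratively raised for predictions that agree with the emerging output. A minimal NumPy sketch (shapes and iteration count are illustrative, not this paper's exact configuration):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Length-into-[0,1) nonlinearity applied per capsule vector."""
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, iters=3):
    """Routing-by-agreement.

    u_hat: predictions from lower capsules for each upper capsule,
           shape (num_lower, num_upper, dim_upper).
    Coupling coefficients c grow for predictions that agree with the
    emerging output v, which is what encourages learning the
    part-whole relationships mentioned in the abstract.
    """
    b = np.zeros(u_hat.shape[:2])                 # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over upper caps
        s = np.einsum('ij,ijk->jk', c, u_hat)     # weighted sum of predictions
        v = squash(s)                             # upper-capsule outputs
        b = b + np.einsum('ijk,jk->ij', u_hat, v) # agreement update
    return v
```

In the polyphonic setting, several output capsules can have large lengths at once, one per concurrently active sound event.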
VSSA-NET: Vertical Spatial Sequence Attention Network for Traffic Sign Detection
Although traffic sign detection has been studied for years and great
progress has been made with the rise of deep learning techniques, many
problems remain to be addressed. For complicated real-world traffic scenes,
there are two main challenges. First, traffic signs are usually small
objects, which makes them more difficult to detect than large ones; second,
it is hard to distinguish false targets that resemble real traffic signs in
complex street scenes without context information. To handle these
problems, we propose a novel end-to-end deep learning method for traffic
sign detection in complex environments. Our contributions are as follows:
1) we propose a multi-resolution feature fusion network architecture that
exploits densely connected deconvolution layers with skip connections and
can learn more effective features for small objects; 2) we frame traffic
sign detection as a spatial sequence classification and regression task,
and propose a vertical spatial sequence attention (VSSA) module to gain
more context information for better detection performance. To
comprehensively evaluate the proposed method, we conduct experiments on
several traffic sign datasets as well as a general object detection
dataset, and the results show the effectiveness of our proposed method.
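The core resolution-matching pattern behind the fusion architecture in contribution 1) can be sketched very simply: upsample the coarse, high-level map and combine it with the fine, high-resolution map via a skip connection. The real network uses learned deconvolutions and dense connections; this NumPy toy only illustrates the shape bookkeeping:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling: a stand-in for the learned
    deconvolution layers in the fusion network."""
    return f.repeat(2, axis=0).repeat(2, axis=1)

def fuse(coarse, fine):
    """Skip-connection fusion: upsample the coarse (low-resolution,
    semantically strong) map to match the fine (high-resolution)
    map, then combine element-wise so small objects keep both
    detail and context."""
    return upsample2x(coarse) + fine
```

Stacking several such fusions from the deepest map back to the shallowest yields the multi-resolution features that help with small-object detection.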