Capsule Network based Contrastive Learning of Unsupervised Visual Representations
Capsule Networks have advanced considerably in the past decade,
outperforming traditional CNNs in various tasks thanks to their equivariant
properties. Because their vector inputs and outputs encode both the
magnitude and direction of an object or its parts, Capsule Networks hold
enormous promise for unsupervised visual representation learning tasks such
as multi-class image classification. In this paper, we propose the
Contrastive Capsule (CoCa) model, a Siamese-style Capsule Network trained
with a contrastive loss, together with a novel architecture and training
and testing algorithm. We evaluate the model on unsupervised image
classification on the CIFAR-10 dataset and achieve a top-1 test accuracy of
70.50% and a top-5 test accuracy of 98.10%. Thanks to its efficient
architecture, our model has 31 times fewer parameters and 71 times fewer
FLOPs than the current SOTA in both supervised and unsupervised learning.
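A Siamese-style network of this kind is typically trained with a pairwise contrastive loss over the two branch embeddings. The following is a minimal sketch of such a loss; the exact loss and `margin` value used by CoCa may differ, so treat this as illustrative:

```python
import numpy as np

def contrastive_loss(z1, z2, label, margin=1.0):
    """Pairwise contrastive loss between two embeddings.

    label = 1 for a positive (similar) pair, 0 for a negative pair.
    Positive pairs are pulled together; negative pairs are pushed
    apart until they are at least `margin` apart.
    """
    d = np.linalg.norm(z1 - z2)                      # Euclidean distance
    pos = label * d ** 2                             # pull positives together
    neg = (1 - label) * max(margin - d, 0.0) ** 2    # push negatives apart
    return 0.5 * (pos + neg)
```

Identical embeddings with a positive label give zero loss, as do negative pairs already separated by more than the margin; everything in between produces a gradient that reshapes the embedding space.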
Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition, Age and Gender Estimation
Thesis (Master's)--Seoul National University Graduate School: Department of Electrical and Information Engineering, College of Engineering, August 2019. Cho, Nam Ik.
The convolutional neural network (CNN) works very well in many computer
vision tasks, including face-related problems. However, for age estimation
and facial expression recognition (FER), the accuracy provided by the CNN
is still not good enough for real-world use. The CNN seems unable to
capture the subtle differences in the thickness and amount of wrinkles on the face,
which are essential features for age estimation and FER. Also, face images in the real world vary widely due to face rotation and illumination, and the CNN is not robust at finding rotated objects when not every possible variation is present in the training data.
Moreover, Multi-Task Learning (MTL) based methods can be very helpful for achieving real-time visual understanding of a dynamic scene, as they perform several different perceptual tasks simultaneously and efficiently. In typical MTL methods, one needs to construct a dataset that contains all the labels for the different tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels may be required. Hence, the cost of generating the desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning.
Therefore, to alleviate these problems, we first propose several methods that improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on Multi-Task Learning (MTL) and data distillation.
1. Introduction
1.1 Motivation
1.2 Background
1.2.1 Age and Gender Estimation
1.2.2 Facial Expression Recognition (FER)
1.2.3 Capsule Networks (CapsNet)
1.2.4 Semi-Supervised Learning
1.2.5 Multi-Task Learning
1.2.6 Knowledge and Data Distillation
1.2.7 Domain Adaptation
1.3 Datasets
2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
2.1 Feeding CNN with Hand-Crafted Features
2.1.1 Preparation of Input
2.1.2 Age and Gender Estimation using the Gabor Responses
2.2 GF-CapsNet
2.2.1 Modification of CapsNet
3. Distill-2MD-MTL: Data Distillation Based on a Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
3.1 MTL Learning
3.2 Data Distillation
4. Experiments and Results
4.1 Experiments on GF-CNN and GF-CapsNet
4.2 GF-CNN Results
4.2.1 GF-CapsNet Results
4.3 Experiment on Distill-2MD-MTL
4.3.1 Semi-Supervised MTL
4.3.2 Cross-Dataset Cross-Domain Evaluation
5. Conclusion
Abstract (In Korean)
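The Gabor-filter preprocessing mentioned in this thesis abstract builds on a bank of oriented 2-D Gabor kernels. A minimal sketch of such a kernel and a small orientation bank follows; the thesis's actual filter parameters (size, wavelength, orientations) are not given here, so the values below are illustrative assumptions:

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5):
    """Real part of a 2-D Gabor filter: a Gaussian envelope
    modulating a cosine wave oriented at angle `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

# A small bank of orientations, convolved with the face image to
# obtain jet-like multi-orientation responses.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving a face crop with each kernel in the bank yields a stack of orientation-selective response maps that can be fed to the CNN or capsule network in place of (or alongside) raw pixels.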
Multi-labeled Relation Extraction with Attentive Capsule Network
Disclosing multiple overlapped relations in a sentence remains
challenging. Most current neural models inconveniently assume that each
sentence is explicitly mapped to one relation label and cannot handle
multiple relations properly, since the overlapped features of the relations
are either ignored or very difficult to identify. To tackle this issue,
we propose a novel approach for multi-labeled relation extraction with a
capsule network, which performs considerably better than current
convolutional or recurrent nets at identifying highly overlapped relations
within an individual sentence. To better cluster the features and precisely
extract the relations, we further devise an attention-based routing
algorithm and a sliding-margin loss function and embed them into our
capsule network. The experimental results show that the proposed approach
can indeed extract the highly overlapped features and achieves significant
performance improvements for relation extraction compared to state-of-the-art works.
Comment: To be published in AAAI 201
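A sliding-margin loss of the kind this abstract describes scores each relation capsule by its vector length against a movable threshold, so several relations can be active at once. The sketch below is a plausible form under assumed hyperparameters `B`, `gamma`, and `lam`; the paper's exact formulation (e.g. whether `B` is learned) may differ:

```python
import numpy as np

def sliding_margin_loss(lengths, labels, B=0.5, gamma=0.4, lam=0.5):
    """Margin loss with a sliding threshold B: capsule lengths for
    relations present in the sentence (labels == 1) are pushed above
    B + gamma, lengths for absent relations below B - gamma, which
    naturally supports multiple labels per sentence."""
    upper = np.maximum(0.0, (B + gamma) - lengths) ** 2   # present relations
    lower = np.maximum(0.0, lengths - (B - gamma)) ** 2   # absent relations
    return np.sum(labels * upper + lam * (1 - labels) * lower)
```

At inference time, every relation capsule whose length exceeds the threshold is emitted, giving a multi-label prediction rather than a single argmax.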
Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
A capsule is a group of neurons whose activity vector represents the
instantiation parameters of a specific type of entity. In this paper, we
explore capsule networks for relation extraction in a multi-instance
multi-label learning framework and propose a novel neural approach based on
capsule networks with attention mechanisms. We evaluate our method on
different benchmarks, and it is demonstrated that our method improves the
precision of the predicted relations. In particular, we show that capsule
networks improve relation extraction for multiple entity pairs.
Comment: To be published in EMNLP 201
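The "activity vector" reading above relies on the standard CapsNet squashing nonlinearity (Sabour et al., 2017), which preserves a vector's direction while compressing its length into [0, 1) so that length can be interpreted as the probability that the entity is present. A minimal sketch:

```python
import numpy as np

def squash(s, eps=1e-8):
    """CapsNet squashing nonlinearity: keeps the vector's direction
    but maps its length into [0, 1), so the length can be read as
    the probability that the entity (here, a relation) is present."""
    norm2 = np.sum(s ** 2)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)
```

Short vectors are shrunk toward zero and long vectors saturate just below unit length, which is what makes the length a usable confidence score.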
Polyphonic Sound Event Detection by using Capsule Neural Networks
Artificial sound event detection (SED) aims to mimic the human ability
to perceive and understand what is happening in the surroundings. Nowadays,
deep learning offers valuable techniques for this goal, such as
Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet)
architecture has recently been introduced in the image processing field
with the intent to overcome some of the known limitations of CNNs,
specifically their scarce robustness to affine transformations (i.e.,
perspective, size, orientation) and the detection of overlapped images.
This motivated the authors to employ CapsNets for the polyphonic-SED task,
in which multiple sound events occur simultaneously. Specifically, we
propose to exploit the capsule units to represent a set of distinctive
properties for each individual sound event. Capsule units are connected
through a so-called "dynamic routing" procedure that encourages learning
part-whole relationships and improves detection performance in a polyphonic
context. This paper reports extensive evaluations carried out on three
publicly available datasets, showing that the CapsNet-based algorithm not
only outperforms standard CNNs but also achieves the best results with
respect to state-of-the-art algorithms.
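The "dynamic routing" mentioned above is routing-by-agreement (Sabour et al., 2017): coupling coefficients between lower and upper capsules are iteratively raised for predictions that agree with the emerging output. A minimal NumPy sketch (shapes and iteration count are illustrative, not this paper's exact configuration):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Length-into-[0,1) nonlinearity applied per capsule vector."""
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, iters=3):
    """Routing-by-agreement.

    u_hat: predictions from lower capsules for each upper capsule,
           shape (num_lower, num_upper, dim_upper).
    Coupling coefficients c grow for predictions that agree with the
    emerging output v, which is what encourages learning the
    part-whole relationships mentioned in the abstract.
    """
    b = np.zeros(u_hat.shape[:2])                 # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over upper caps
        s = np.einsum('ij,ijk->jk', c, u_hat)     # weighted sum of predictions
        v = squash(s)                             # upper-capsule outputs
        b = b + np.einsum('ijk,jk->ij', u_hat, v) # agreement update
    return v
```

In the polyphonic setting, several output capsules can have large lengths at once, one per concurrently active sound event.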
VSSA-NET: Vertical Spatial Sequence Attention Network for Traffic Sign Detection
Although traffic sign detection has been studied for years and great
progress has been made with the rise of deep learning techniques, many
problems remain to be addressed. For complicated real-world traffic scenes,
there are two main challenges. First, traffic signs are usually small
objects, which makes them more difficult to detect than large ones; second,
it is hard to distinguish false targets that resemble real traffic signs in
complex street scenes without context information. To handle these
problems, we propose a novel end-to-end deep learning method for traffic
sign detection in complex environments. Our contributions are as follows:
1) we propose a multi-resolution feature fusion network architecture that
exploits densely connected deconvolution layers with skip connections and
can learn more effective features for small objects; 2) we frame traffic
sign detection as a spatial sequence classification and regression task,
and propose a vertical spatial sequence attention (VSSA) module to gain
more context information for better detection performance. To
comprehensively evaluate the proposed method, we conduct experiments on
several traffic sign datasets as well as a general object detection
dataset, and the results show the effectiveness of our proposed method.
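The core resolution-matching pattern behind the fusion architecture in contribution 1) can be sketched very simply: upsample the coarse, high-level map and combine it with the fine, high-resolution map via a skip connection. The real network uses learned deconvolutions and dense connections; this NumPy toy only illustrates the shape bookkeeping:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling: a stand-in for the learned
    deconvolution layers in the fusion network."""
    return f.repeat(2, axis=0).repeat(2, axis=1)

def fuse(coarse, fine):
    """Skip-connection fusion: upsample the coarse (low-resolution,
    semantically strong) map to match the fine (high-resolution)
    map, then combine element-wise so small objects keep both
    detail and context."""
    return upsample2x(coarse) + fine
```

Stacking several such fusions from the deepest map back to the shallowest yields the multi-resolution features that help with small-object detection.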