6 research outputs found
Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition and Age and Gender Estimation
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. Cho, Nam Ik.
The convolutional neural network (CNN) works very well in many computer vision tasks, including face-related problems. However, for age estimation and facial expression recognition (FER), the accuracy provided by the CNN is still not good enough for real-world problems. It seems that the CNN does not readily find the subtle differences in the thickness and amount of wrinkles on the face,
which are the essential features for age estimation and FER. Also, real-world face images vary widely due to face rotation and illumination, and the CNN is not robust at finding rotated objects unless every possible variation is present in the training data.
Moreover, multi-task learning (MTL) based methods can be very helpful for achieving real-time visual understanding of a dynamic scene, as they can perform several different perceptual tasks simultaneously and efficiently. In exemplary MTL methods, we need to consider constructing a dataset that contains the labels for all of the different tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels is sometimes required. Hence, the cost of generating the desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning.
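To make the labeling problem concrete: when several face datasets are combined, each image usually carries labels for only some of the tasks. One common workaround, shown here as a minimal NumPy sketch and not necessarily the method used in the thesis, is to mask out the missing labels so that each task's loss is averaged only over the samples that are labeled for it. The function name, the three task columns, and the loss values are hypothetical.

```python
import numpy as np

def masked_multitask_loss(losses, label_mask):
    """Average per-task losses over only the samples that carry a label.

    losses:     (n_samples, n_tasks) per-sample, per-task loss values
    label_mask: (n_samples, n_tasks) 1 where a label exists, else 0
    """
    losses = np.asarray(losses, dtype=float)
    label_mask = np.asarray(label_mask, dtype=float)
    labeled_counts = label_mask.sum(axis=0)
    # Guard against division by zero for a task with no labels in the batch.
    safe_counts = np.maximum(labeled_counts, 1.0)
    per_task = (losses * label_mask).sum(axis=0) / safe_counts
    return per_task, per_task.sum()

# Hypothetical batch; columns are (age, gender, expression).
# Entries of 9.9 stand in for losses computed against missing labels.
losses = np.array([[2.0, 0.5, 9.9],
                   [4.0, 9.9, 1.0],
                   [9.9, 1.5, 3.0]])
mask = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1]])
per_task, total = masked_multitask_loss(losses, mask)
```

The placeholder values at unlabeled positions never reach the result, which is the point of the mask: a sample from an age-only dataset contributes nothing to the expression loss.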
Therefore, to alleviate these problems, we first propose several methods to improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on multi-task learning (MTL) and data distillation.
1. Introduction
1.1 Motivation
1.2 Background
1.2.1 Age and Gender Estimation
1.2.2 Facial Expression Recognition (FER)
1.2.3 Capsule Networks (CapsNet)
1.2.4 Semi-Supervised Learning
1.2.5 Multi-Task Learning
1.2.6 Knowledge and Data Distillation
1.2.7 Domain Adaptation
1.3 Datasets
2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
2.1 Feeding CNN with Hand-Crafted Features
2.1.1 Preparation of Input
2.1.2 Age and Gender Estimation using the Gabor Responses
2.2 GF-CapsNet
2.2.1 Modification of CapsNet
3. Distill-2MD-MTL: Data Distillation based on Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
3.1 MTL learning
3.2 Data Distillation
4. Experiments and Results
4.1 Experiments on GF-CNN and GF-CapsNet
4.2 GF-CNN Result
4.2.1 GF-CapsNet Results
4.3 Experiment on Distill-2MD-MTL
4.3.1 Semi-Supervised MTL
4.3.2 Cross Datasets Cross-Domain Evaluation
5. Conclusion
Abstract (In Korean)
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
The paper presents a novel approach for vector-floorplan generation via a
diffusion model, which denoises 2D coordinates of room/door corners with two
inference objectives: 1) a single-step noise as the continuous quantity to
precisely invert the continuous forward process; and 2) the final 2D coordinate
as the discrete quantity to establish geometric incident relationships such as
parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned
floorplan generation, a common workflow in floorplan design. We represent a
floorplan as 1D polygonal loops, each of which corresponds to a room or a door.
Our diffusion model employs a Transformer architecture at the core, which
controls the attention masks based on the input graph-constraint and directly
generates vector-graphics floorplans via a discrete and continuous denoising
process. We have evaluated our approach on the RPLAN dataset. The proposed
approach improves on the state of the art in all metrics by significant
margins, while being capable of generating non-Manhattan structures and
controlling the exact number of corners per room. A project website with a
supplementary video and document is available at
https://aminshabani.github.io/housediffusion
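The two inference targets named in the abstract can be illustrated with the standard closed-form forward process of a denoising diffusion model. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the [-1, 1] coordinate range, the 256-cell grid, and the example room are all hypothetical, and the true noise stands in for a network prediction. It shows that a correct single-step noise estimate inverts the continuous forward process, and that snapping the recovered corners to integers makes shared coordinates exactly equal.

```python
import numpy as np

def forward_noise(x0, eps, alpha_bar_t):
    """Closed-form forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def estimate_x0(x_t, eps_pred, alpha_bar_t):
    """Invert the forward process given a (predicted) noise sample."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def snap_to_grid(coords, resolution=256):
    """Map continuous coordinates in [-1, 1] to integer grid indices.

    The range and resolution are assumptions made for this sketch.
    """
    idx = np.rint((coords + 1.0) / 2.0 * (resolution - 1))
    return np.clip(idx, 0, resolution - 1).astype(int)

# One room as a closed loop of 2D corners (hypothetical values).
room = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.25], [-0.5, 0.25]])
rng = np.random.default_rng(0)
eps = rng.standard_normal(room.shape)

x_t = forward_noise(room, eps, alpha_bar_t=0.3)
# A perfect noise prediction is assumed here in place of a trained model.
recovered = estimate_x0(x_t, eps, alpha_bar_t=0.3)
grid = snap_to_grid(recovered)
```

After snapping, corners that should share an x or y coordinate hold identical integers, so relationships such as orthogonality can be checked with exact equality rather than a tolerance.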
JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models
This paper presents a novel approach to the Extreme Structure from Motion
(E-SfM) problem, which takes a set of room layouts as polygonal curves in the
top-down view, and aligns the room layout pieces by estimating their 2D
translations and rotations, akin to solving the jigsaw puzzle of room layouts.
The biggest discovery and surprise of the paper is that the simple use of a
Diffusion Model solves this challenging registration problem as a conditional
generation process. The paper presents a new dataset of room layouts and
floorplans for 98,780 houses. The qualitative and quantitative evaluations
demonstrate that the proposed approach outperforms the competing methods by
significant margins.
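The quantities being estimated, a 2D rotation and translation per room layout, can be sketched as a rigid transform applied to a polygon's corners. The function name and the example room below are hypothetical; the sketch only shows how one estimated pose would place a room in the shared floorplan frame, not how the diffusion model produces that pose.

```python
import numpy as np

def place_room(corners, angle_rad, translation):
    """Rotate a room polygon about the origin, then translate it.

    corners:     (n, 2) array of 2D corner coordinates (top-down view)
    angle_rad:   counter-clockwise rotation angle in radians
    translation: (tx, ty) offset applied after rotation
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    corners = np.asarray(corners, dtype=float)
    # Row vectors are rotated by multiplying with the transposed matrix.
    return corners @ rotation.T + np.asarray(translation, dtype=float)

# A 2 x 1 rectangular room, rotated by 90 degrees and shifted right by 3.
room = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
placed = place_room(room, np.pi / 2, (3.0, 0.0))
```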
GF-CapsNet: Using Gabor Jet and Capsule Networks for Facial Age, Gender, and Expression Recognition
The convolutional neural network (CNN) works very well in many computer vision tasks, including face-related problems. However, for age estimation and facial expression recognition (FER), the accuracy provided by the CNN is still not good enough for real-world problems. It seems that the CNN does not readily find the subtle differences in the thickness and amount of wrinkles on the face, which are the essential features for age estimation and FER. Also, real-world face images vary widely due to face rotation and illumination, and the CNN is not robust at finding rotated objects unless every possible variation is present in the training data. To alleviate these problems, we first propose to use the Gabor filter responses of faces as the input to the CNN, along with the original face image. This method enhances the wrinkles on the face so that the face-related features are found in the earlier convolutional layers, and hence the overall performance is increased. We also adopt the idea of the capsule network, which has been shown to be robust to the rotation of objects and able to capture the relationships between facial landmarks. We show that age estimation and FER performance is better with the capsule network than with plain CNNs. Moreover, by using the Gabor responses as the input to the capsule network, the overall performance on face-related problems is increased compared to recent CNN-based methods.
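The input preparation described above can be sketched as follows: a small bank of Gabor kernels at several orientations is applied to a grayscale face, and the responses are stacked with the original image as extra channels. This is a minimal NumPy sketch under stated assumptions, not the authors' pipeline. The kernel size, sigma, wavelength, number of orientations, and the choice to stack along the channel axis are all assumptions, the function names are hypothetical, and a random array stands in for a face image.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5):
    """Real (cosine-phase) Gabor kernel; `size` is assumed to be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength)
    return envelope * carrier

def filter_same(image, kernel):
    """Sliding-window correlation with reflect padding (output keeps shape).

    The cosine-phase kernel is symmetric under a 180-degree rotation,
    so correlation and convolution give the same result here.
    """
    half = kernel.shape[0] // 2
    padded = np.pad(image, half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def gabor_input(gray_face, n_orientations=4):
    """Stack the original image with its Gabor responses as channels."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    responses = [
        filter_same(gray_face, gabor_kernel(7, 2.0, theta, 4.0))
        for theta in thetas
    ]
    return np.stack([gray_face, *responses], axis=0)

# Random stand-in for a 32 x 32 grayscale face crop.
face = np.random.default_rng(0).random((32, 32))
cnn_input = gabor_input(face)  # shape: (1 + n_orientations, 32, 32)
```

Keeping the original image as the first channel means the network still sees the raw pixels, while the remaining channels emphasize oriented, wrinkle-like texture.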