6 research outputs found

    Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition and Age and Gender Estimation

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. Cho, Nam Ik.

    Convolutional neural networks (CNNs) work very well in many computer vision tasks, including face-related problems. However, for age estimation and facial expression recognition (FER), the accuracy of CNNs is still not good enough for real-world use. CNNs do not seem to capture the subtle differences in the thickness and amount of facial wrinkles, which are essential features for age estimation and FER. In addition, real-world face images vary widely in rotation and illumination, and CNNs are not robust at finding rotated objects unless every possible variation appears in the training data. Moreover, multi-task learning (MTL) based methods can greatly help achieve real-time visual understanding of a dynamic scene, since they perform several different perceptual tasks simultaneously and efficiently. Typical MTL methods require constructing a dataset that contains the labels for all tasks together. However, as the target task becomes multi-faceted and more complicated, an unduly large dataset with stronger labels is sometimes required. Hence, the cost of generating the desired labeled data is often an obstacle, especially for multi-task learning. To alleviate these problems, we first propose several methods that improve single-task baseline performance using Gabor filters and capsule-based networks. We then propose a new semi-supervised learning method for face-related tasks based on multi-task learning (MTL) and data distillation.

    Contents:
    1. Introduction
        1.1 Motivation
        1.2 Background
            1.2.1 Age and Gender Estimation
            1.2.2 Facial Expression Recognition (FER)
            1.2.3 Capsule Networks (CapsNet)
            1.2.4 Semi-Supervised Learning
            1.2.5 Multi-Task Learning
            1.2.6 Knowledge and Data Distillation
            1.2.7 Domain Adaptation
        1.3 Datasets
    2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
        2.1 Feeding CNN with Hand-Crafted Features
            2.1.1 Preparation of Input
            2.1.2 Age and Gender Estimation using the Gabor Responses
        2.2 GF-CapsNet
            2.2.1 Modification of CapsNet
    3. Distill-2MD-MTL: Data Distillation based on Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
        3.1 MTL Learning
        3.2 Data Distillation
    4. Experiments and Results
        4.1 Experiments on GF-CNN and GF-CapsNet
        4.2 GF-CNN Result
            4.2.1 GF-CapsNet Results
        4.3 Experiment on Distill-2MD-MTL
            4.3.1 Semi-Supervised MTL
            4.3.2 Cross-Dataset Cross-Domain Evaluation
    5. Conclusion
    Abstract (In Korean)

    HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

    This paper presents a novel approach to vector-floorplan generation via a diffusion model, which denoises the 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity, to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity, to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned floorplan generation, a common workflow in floorplan design. We represent a floorplan as 1D polygonal loops, each of which corresponds to a room or a door. Our diffusion model employs a Transformer architecture at its core, which controls the attention masks based on the input graph constraint and directly generates vector-graphics floorplans via a discrete and continuous denoising process. We have evaluated our approach on the RPLAN dataset. The proposed approach improves on the state of the art in all metrics by significant margins, while being capable of generating non-Manhattan structures and controlling the exact number of corners per room. A project website with a supplementary video and document is available at https://aminshabani.github.io/housediffusion
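
    The two quantities named in this abstract can be illustrated with a minimal sketch, assuming the standard DDPM forward process with a linear noise schedule. This is not the paper's code: the function names, the schedule values, and the 256-bin grid are hypothetical choices for illustration, and the true noise stands in for what the Transformer would predict.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Continuous forward process: mix clean corner coordinates with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def estimate_x0(x_t, eps_pred, t, alpha_bar):
    """Invert the forward process given a noise estimate (the continuous quantity)."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

def quantize(coords, num_bins=256):
    """Snap coordinates in [-1, 1] to integer grid indices (the discrete quantity)."""
    scaled = (np.clip(coords, -1.0, 1.0) + 1.0) / 2.0 * (num_bins - 1)
    return np.rint(scaled).astype(int)

# Usage: one rectangular room as a loop of four corners in [-1, 1]^2.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
room = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.25], [-0.5, 0.25]])
x_t, eps = forward_noise(room, t=500, alpha_bar=alpha_bar, rng=rng)
room_hat = estimate_x0(x_t, eps, t=500, alpha_bar=alpha_bar)  # true noise as stand-in
grid = quantize(room_hat)
```

    On an integer grid, corners that should coincide or edges that should be axis-aligned compare exactly equal, which is one way to read why a discrete target helps with relationships such as corner-sharing and orthogonality.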

    JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models

    This paper presents a novel approach to the Extreme Structure from Motion (E-SfM) problem, which takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving a jigsaw puzzle of room layouts. The biggest discovery and surprise of the paper is that the simple use of a diffusion model solves this challenging registration problem as a conditional generation process. The paper presents a new dataset of room layouts and floorplans for 98,780 houses. Qualitative and quantitative evaluations demonstrate that the proposed approach outperforms competing methods by significant margins.
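
    The alignment step described in this abstract amounts to applying a 2D rigid transform to each room polygon. The sketch below shows only that placement step, assuming a pose (rotation angle and translation) is already given; the diffusion model that would estimate the poses is not shown, and `place_room` is a hypothetical helper name.

```python
import numpy as np

def place_room(polygon, angle_rad, translation):
    """Rotate an (N, 2) polygon about the origin, then translate it."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    # Points are row vectors, so multiply by the transposed rotation matrix.
    return polygon @ rotation.T + np.asarray(translation, dtype=float)

# Usage: place a unit-square room with a 90-degree turn and a shift of (2, 3).
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
placed = place_room(square, np.pi / 2, (2.0, 3.0))
```

    Because the transform is rigid, edge lengths and angles of each room are preserved; only its position and orientation in the shared floorplan frame change.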

    GF-CapsNet: Using Gabor Jet and Capsule Networks for Facial Age, Gender, and Expression Recognition

    Convolutional neural networks (CNNs) work very well in many computer vision tasks, including face-related problems. However, for age estimation and facial expression recognition (FER), the accuracy of CNNs is still not good enough for real-world use. CNNs do not seem to capture the subtle differences in the thickness and amount of facial wrinkles, which are essential features for age estimation and FER. In addition, real-world face images vary widely in rotation and illumination, and CNNs are not robust at finding rotated objects unless every possible variation appears in the training data. To alleviate these problems, we first propose to use the Gabor filter responses of faces as input to the CNN, along with the original face image. This enhances the wrinkles on the face so that face-related features are found in the earlier convolutional layers, which increases overall performance. We also adopt the idea of the capsule network, which has been shown to be robust to object rotation and able to capture the relationships among facial landmarks. We show that age estimation and FER performance improves when the capsule network is used instead of plain CNNs. Moreover, using the Gabor responses as input to the capsule network improves overall performance on face-related problems compared with recent CNN-based methods.
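
    The idea of feeding Gabor filter responses to the network together with the original face image can be sketched as follows. This is a minimal NumPy illustration, not the authors' pipeline: stacking the responses as extra channels, the four orientations, and the kernel parameters are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real Gabor kernel: a Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    return envelope * carrier

def filter2d_same(image, kernel):
    """Cross-correlate with zero padding; output has the image's shape (odd kernels)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def stack_gabor_channels(gray, n_orientations=4, size=9, sigma=2.0, wavelength=4.0):
    """Return an (H, W, 1 + n_orientations) array: the image plus its Gabor responses."""
    thetas = [k * np.pi / n_orientations for k in range(n_orientations)]
    responses = [
        filter2d_same(gray, gabor_kernel(size, sigma, theta, wavelength))
        for theta in thetas
    ]
    return np.stack([gray] + responses, axis=-1)

# Usage: a random array stands in for a grayscale face crop.
face = np.random.default_rng(0).random((32, 32))
net_input = stack_gabor_channels(face)
```

    Each orientation responds most strongly to line-like structures perpendicular to its carrier direction, which is why such responses can emphasize wrinkles before the first learned convolution sees the image.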