5,415 research outputs found

    Improving Landmark Localization with Semi-Supervised Learning

    We present two techniques to improve landmark localization in images from partially annotated datasets. Our primary goal is to leverage the common situation where precise landmark locations are provided for only a small subset of the data, while class labels for classification or regression tasks related to the landmarks are more abundantly available. First, we propose the framework of sequential multitasking and explore it through an architecture for landmark localization in which training with class labels acts as an auxiliary signal to guide landmark localization on unlabeled data. A key aspect of our approach is that errors can be backpropagated through a complete landmark localization model. Second, we propose and explore an unsupervised learning technique for landmark localization based on having the model predict landmarks that are equivariant with respect to transformations applied to the image. We show that these techniques improve landmark prediction considerably and can learn effective detectors even when only a small fraction of the dataset has landmark labels. We present results on two toy datasets and four real datasets with hands and faces, and report a new state of the art on two in-the-wild datasets; for example, with only 5% of labeled images we outperform the previous state of the art trained on the AFLW dataset.

    Comment: Published as a conference paper at CVPR 2018
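    The second technique lends itself to a compact implementation. Below is a minimal PyTorch-style sketch of the equivariance objective on unlabeled images, specialized to 90-degree rotations so the coordinate warp is exact; the model interface (images in, normalized landmark coordinates in [-1, 1] out) is an assumption for illustration, not the authors' released code.

        import torch

        def rot90_coords(coords):
            # Map normalized (x, y) in [-1, 1] under a 90-degree CCW image
            # rotation: with x pointing right and y pointing down, a landmark
            # at (x, y) moves to (y, -x).
            x, y = coords[..., 0], coords[..., 1]
            return torch.stack([y, -x], dim=-1)

        def equivariance_loss(model, images):
            # Penalize disagreement between "predict, then rotate the
            # coordinates" and "rotate the image, then predict" -- no
            # landmark labels are needed for this term.
            rotated = torch.rot90(images, k=1, dims=(-2, -1))  # 90-degree CCW
            pred_then_rot = rot90_coords(model(images))
            rot_then_pred = model(rotated)
            return torch.mean((pred_then_rot - rot_then_pred) ** 2)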

    Online learning and detection of faces with low human supervision

    The final publication is available at link.springer.com

    We present an efficient, online, and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection with low human supervision. On the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. The first strategy is query-by-boosting active learning, which requests human assistance on difficult samples as a function of the classifier confidence; the second is memory-based learning, which uses Exemplar-based Nearest Neighbors (ENN) to assist the classifier automatically. A pre-trained Convolutional Neural Network (CNN) is used to perform ENN with high-level feature descriptors. The proposed approach is therefore fast (WiLFs run at 1 FPS with code that is not fully optimized), accurate (we obtain detection rates over 82% on complex datasets), and labor-saving (human assistance is required for less than 20% of samples). As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in input images that are subsequently used to train the classifier online. The advantages of our approach are demonstrated on synthetic and publicly available databases, showing detection rates comparable to those of offline approaches that require much larger amounts of manually annotated training data.
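    A hedged sketch of the labeling split described above: samples on which the ensemble is uncertain are routed to the human annotator, while confident ones are auto-labeled by exemplar nearest neighbors over deep features. The confidence threshold and kNN settings are illustrative assumptions, not values from the paper.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        def route_samples(confidences, threshold=0.2):
            # Indices to send to the human (uncertain) vs. to auto-label;
            # `confidences` would be, e.g., the |margin| of the fern ensemble.
            confidences = np.asarray(confidences)
            ask_human = np.where(confidences < threshold)[0]
            auto = np.where(confidences >= threshold)[0]
            return ask_human, auto

        def auto_label(exemplar_feats, exemplar_labels, query_feats, k=3):
            # Exemplar-based nearest neighbors over CNN feature descriptors
            # stands in for the paper's ENN memory.
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(exemplar_feats, exemplar_labels)
            return knn.predict(query_feats)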

    Face Centered Image Analysis Using Saliency and Deep Learning Based Techniques

    Image analysis begins with the goal of building vision machines that can perceive like humans: intelligently inferring general principles and sensing surrounding situations from imagery. This dissertation studies face-centered image analysis as a core problem in high-level computer vision and addresses it by tackling several challenging questions: Is there anything interesting in the image, and if so, what is it? If a person is present, who is he or she? What expression is he or she performing? Can we estimate his or her age? Answering these questions leads to saliency-based object detection, deep-learning-based object categorization and recognition, facial landmark detection, and multi-task biometrics.

    For object detection, we first propose a three-level saliency detection method based on self-similarity (SMAP). The first level of SMAP uses statistical methods to generate proto-background patches; the second level computes local contrast based on image self-similarity; finally, a spatial color-distribution constraint completes the saliency detection. The output of the algorithm is a full-resolution image with highlighted salient objects and well-defined edges.

    For object recognition, an Adaptive Deconvolution Network (ADN) is implemented to categorize the objects extracted by saliency detection. To improve performance, we propose an L1/2-norm-regularized ADN (sketched below) and test it in different applications; the results demonstrate the efficiency and significance of the new structure.

    To understand the facial biometrics contained in an image, we introduce low-rank matrix decomposition to locate landmark points on face images; natural extensions of this work benefit facial expression recognition and facial feature parsing. To facilitate understanding of the detected face, automatic facial image analysis becomes essential, and we present a novel deeply learned tree-structured face representation that uniformly models the human face with its different semantic meanings. We show that the proposed feature yields a unified representation for multi-task facial biometrics and that the multi-task learning framework is applicable to many other computer vision tasks.
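    The L1/2 regularizer mentioned above has a simple form. Here is a PyTorch sketch, with the penalty weight and epsilon smoothing chosen for illustration: the |w|^(1/2) term has an unbounded gradient at zero, so a small epsilon keeps training stable.

        import torch

        def l_half_penalty(params, lam=1e-4, eps=1e-8):
            # lam * sum_i |w_i|^(1/2), which favors sparser weights than L1.
            return lam * sum((p.abs() + eps).sqrt().sum() for p in params)

        # usage: loss = task_loss + l_half_penalty(model.parameters())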

    Multi-Dataset Multi-Domain Multi-Task Network for Facial Expression Recognition and Age and Gender Estimation

    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2019. 8. Cho, Nam Ik.์ปจ๋ณผ ๋ฃจ์…˜ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ (CNN)๋Š” ์–ผ๊ตด๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์ œ๋ฅผ ํฌํ•จํ•˜์—ฌ ๋งŽ์€ ์ปดํ“จํ„ฐ ๋น„์ „ ์ž‘์—…์—์„œ ๋งค์šฐ ์ž˜ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์—ฐ๋ น ์ถ”์ • ๋ฐ ์–ผ๊ตด ํ‘œ์ • ์ธ์‹ (FER)์˜ ๊ฒฝ์šฐ CNN์ด ์ œ๊ณต ํ•œ ์ •ํ™•๋„๋Š” ์—ฌ์ „ํžˆ ์‹ค์ œ ๋ฌธ์ œ์— ๋Œ€ํ•ด ์ถฉ๋ถ„ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. CNN์€ ์–ผ๊ตด์˜ ์ฃผ๋ฆ„์˜ ๋‘๊ป˜์™€ ์–‘์˜ ๋ฏธ๋ฌ˜ํ•œ ์ฐจ์ด๋ฅผ ๋ฐœ๊ฒฌํ•˜์ง€ ๋ชปํ–ˆ์ง€๋งŒ, ์ด๊ฒƒ์€ ์—ฐ๋ น ์ถ”์ •๊ณผ FER์— ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ ์‹ค์ œ ์„ธ๊ณ„์—์„œ์˜ ์–ผ๊ตด ์ด๋ฏธ์ง€๋Š” CNN์ด ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์—์„œ ๊ฐ€๋Šฅํ•  ๋•Œ ํšŒ์ „ ๋œ ๋ฌผ์ฒด๋ฅผ ์ฐพ๋Š” ๋ฐ ๊ฐ•๊ฑดํ•˜์ง€ ์•Š์€ ํšŒ์ „ ๋ฐ ์กฐ๋ช…์œผ๋กœ ์ธํ•ด ๋งŽ์€ ์ฐจ์ด๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ MTL (Multi Task Learning)์€ ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ์ง€๊ฐ ์ž‘์—…์„ ๋™์‹œ์— ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ฒ”์  ์ธ MTL ๋ฐฉ๋ฒ•์—์„œ๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ์ž‘์—…์— ๋Œ€ํ•œ ๋ชจ๋“  ๋ ˆ์ด๋ธ”์„ ํ•จ๊ป˜ ํฌํ•จํ•˜๋Š” ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์„ ๊ตฌ์„ฑํ•˜๋Š” ๊ฒƒ์„ ๊ณ ๋ คํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋Œ€์ƒ ์ž‘์—…์ด ๋‹ค๊ฐํ™”๋˜๊ณ  ๋ณต์žกํ•ด์ง€๋ฉด ๋” ๊ฐ•๋ ฅํ•œ ๋ ˆ์ด๋ธ”์„ ๊ฐ€์ง„ ๊ณผ๋„ํ•˜๊ฒŒ ํฐ ๋ฐ์ดํ„ฐ ์„ธํŠธ๊ฐ€ ํ•„์š”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์›ํ•˜๋Š” ๋ผ๋ฒจ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋น„์šฉ์€ ์ข…์ข… ์žฅ์• ๋ฌผ์ด๋ฉฐ ํŠนํžˆ ๋‹ค์ค‘ ์ž‘์—… ํ•™์Šต์˜ ๊ฒฝ์šฐ ์žฅ์• ๊ฐ€๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์šฐ๋ฆฌ๋Š” ๊ฐ€๋ฒ„ ํ•„ํ„ฐ์™€ ์บก์Š ๊ธฐ๋ฐ˜ ๋„คํŠธ์›Œํฌ (MTL) ๋ฐ ๋ฐ์ดํ„ฐ ์ฆ๋ฅ˜๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœํ•˜๋Š” ๋‹ค์ค‘ ์ž‘์—… ํ•™์Šต์— ๊ธฐ๋ฐ˜ํ•œ ์ƒˆ๋กœ์šด ๋ฐ˜ ๊ฐ๋… ํ•™์Šต ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค.The convolutional neural network (CNN) works very well in many computer vision tasks including the face-related problems. However, in the case of age estimation and facial expression recognition (FER), the accuracy provided by the CNN is still not good enough to be used for the real-world problems. It seems that the CNN does not well find the subtle differences in thickness and amount of wrinkles on the face, which are the essential features for the age estimation and FER. Also, the face images in the real world have many variations due to the face rotation and illumination, where the CNN is not robust in finding the rotated objects when not every possible variation is in the training data. Moreover, The Multi Task Learning (MTL) Based based methods can be much helpful to achieve the real-time visual understanding of a dynamic scene, as they are able to perform several different perceptual tasks simultaneously and efficiently. In the exemplary MTL methods, we need to consider constructing a dataset that contains all the labels for different tasks together. However, as the target task becomes multi-faceted and more complicated, sometimes unduly large dataset with stronger labels is required. Hence, the cost of generating desired labeled data for complicated learning tasks is often an obstacle, especially for multi-task learning. Therefore, first to alleviate these problems, we first propose few methods in order to improve single task baseline performance using gabor filters and Capsule Based Networks , Then We propose a new semi-supervised learning method on face-related tasks based on Multi-Task Learning (MTL) and data distillation.1 INTRODUCTION 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.1 Age and Gender Estimation . . . . . . . . . . . . . . . . . . 4 1.2.2 Facial Expression Recognition (FER) . . . . . . . . . . . . . 
    Table of contents:
    1. Introduction
       1.1 Motivation
       1.2 Background
           1.2.1 Age and Gender Estimation
           1.2.2 Facial Expression Recognition (FER)
           1.2.3 Capsule Networks (CapsNet)
           1.2.4 Semi-Supervised Learning
           1.2.5 Multi-Task Learning
           1.2.6 Knowledge and Data Distillation
           1.2.7 Domain Adaptation
       1.3 Datasets
    2. GF-CapsNet: Using Gabor Jet and Capsule Networks for Face-Related Tasks
       2.1 Feeding CNN with Hand-Crafted Features
           2.1.1 Preparation of Input
           2.1.2 Age and Gender Estimation using the Gabor Responses
       2.2 GF-CapsNet
           2.2.1 Modification of CapsNet
    3. Distill-2MD-MTL: Data Distillation based on a Multi-Dataset Multi-Domain Multi-Task Framework to Solve Face-Related Tasks
       3.1 MTL Learning
       3.2 Data Distillation
    4. Experiments and Results
       4.1 Experiments on GF-CNN and GF-CapsNet
       4.2 GF-CNN Results
           4.2.1 GF-CapsNet Results
       4.3 Experiments on Distill-2MD-MTL
           4.3.1 Semi-Supervised MTL
           4.3.2 Cross-Dataset Cross-Domain Evaluation
    5. Conclusion
    Abstract (In Korean)
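    Chapter 2 above feeds Gabor filter responses rather than raw pixels to the network. Below is a minimal NumPy sketch of building such a filter bank and computing per-orientation responses; the kernel size, bandwidth, and wavelength are illustrative assumptions, not the thesis' settings.

        import numpy as np
        from scipy.signal import convolve2d

        def gabor_kernel(size=15, sigma=3.0, theta=0.0, wavelength=6.0):
            # Real part of a 2-D Gabor filter at orientation `theta`:
            # an isotropic Gaussian envelope times a cosine carrier.
            half = size // 2
            y, x = np.mgrid[-half:half + 1, -half:half + 1]
            xr = x * np.cos(theta) + y * np.sin(theta)  # along-wave axis
            envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
            return envelope * np.cos(2 * np.pi * xr / wavelength)

        def gabor_responses(image, orientations=4):
            # One filtered copy of the image per orientation (a simple "jet").
            thetas = np.arange(orientations) * np.pi / orientations
            return np.stack([convolve2d(image, gabor_kernel(theta=t), mode="same")
                             for t in thetas])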