31 research outputs found

    Learning Grimaces by Watching TV

    Full text link
    Unlike computer vision systems, which require explicit supervision, humans can learn facial expressions by observing the people around them. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos. In particular, we consider a game show in which contestants play to win significant sums of money. We extract events affecting the game and the corresponding facial expressions objectively and automatically from the videos, obtaining large quantities of labelled data for our study. Using benchmarks such as FER and SFEW 2.0, we also develop state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. We then extend these models to use facial expressions to predict events in videos and to learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue
    Comment: British Machine Vision Conference (BMVC) 2016
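    The pre-train-then-fine-tune recipe described above is straightforward to reproduce in outline. A minimal PyTorch sketch, using torchvision's resnet18 as a stand-in for the paper's face-verification backbone (an assumption; the actual released models live at the URL above), with a seven-class head as in FER:

    ```python
    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_EXPRESSIONS = 7  # e.g., the seven basic FER expression classes

    # Start from a backbone trained on a face task; weights=None is a
    # placeholder where face-verification weights would be loaded.
    backbone = models.resnet18(weights=None)

    # Replace the final layer with an expression-classification head and
    # fine-tune the whole network end-to-end on expression labels.
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_EXPRESSIONS)

    optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    ```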

    Cross-stitch Networks for Multi-task Learning

    Full text link
    Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learning shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.
    Comment: To appear in CVPR 2016 (Spotlight)
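    The cross-stitch unit itself is compact enough to sketch: a small set of learnable coefficients that linearly mix the same-shaped activations of two task networks, trained end-to-end with everything else. A minimal PyTorch sketch, with a single layer-wide 2x2 mixing matrix and identity-leaning initial values chosen here for illustration (a simplification; the paper also studies where such units are placed):

    ```python
    import torch
    import torch.nn as nn

    class CrossStitchUnit(nn.Module):
        """Learnable linear mixing of activations from two task networks."""

        def __init__(self):
            super().__init__()
            # alpha[i][j] weights the contribution of network j's activation
            # to network i's stitched output; initialized near identity so
            # each task starts mostly with its own representation.
            self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                    [0.1, 0.9]]))

        def forward(self, x_a, x_b):
            out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
            out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
            return out_a, out_b

    # Stitch same-shaped activations from two single-task networks.
    stitch = CrossStitchUnit()
    x_a, x_b = torch.randn(1, 64, 28, 28), torch.randn(1, 64, 28, 28)
    shared_a, shared_b = stitch(x_a, x_b)
    ```

    Because the alphas are ordinary parameters, backpropagation decides per unit how much sharing helps each task.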

    Enhanced Face Detection Based on Haar-Like and MB-LBP Features

    Get PDF
    The effective real-time face detection framework proposed by Viola and Jones gained much popularity due to its computational efficiency and simplicity. A notable variant replaces the original Haar-like features with MB-LBP (Multi-Block Local Binary Pattern) features, which are defined by the local binary pattern operator; both detector types are integrated into the OpenCV library. However, each descriptor and its evaluation method has its own strengths and setbacks. In this paper, an enhanced two-layer face detector composed of both Haar-like and MB-LBP features is presented. Haar-like features are employed as a coarse filter, but with a new evaluation involving a dual threshold. The already established MB-LBP features are arranged as the fine filter of the detector. The Gentle AdaBoost learning algorithm is deployed to train the proposed detector to its full classification and performance potential. Experiments show that in the early stages of classification, Haar features with a dual threshold are more discriminative than MB-LBP and original Haar-like features with respect to the number of features required and the computation involved. Benchmarking the proposed detector demonstrates an overall 12% higher detection rate at a 17% false-alarm rate compared with using MB-LBP features alone, while running with a 3× speedup.
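    A minimal Python sketch of the coarse-to-fine arrangement, with invented helper names (haar_stage, mblbp_stage) standing in for trained boosted-classifier stages; the band-style weak classifier is one plausible reading of the dual-threshold evaluation, not a detail confirmed by the abstract:

    ```python
    def dual_threshold_weak(feature_value, t_low, t_high):
        # A weak Haar classifier that fires when the feature response
        # falls inside a band, rather than on one side of a single cut
        # point (assumed interpretation of the dual-threshold evaluation).
        return 1 if t_low <= feature_value <= t_high else 0

    def two_layer_detect(window, haar_stage, mblbp_stage):
        # Coarse-to-fine cascade: the cheap Haar stage discards most
        # non-face windows early; only survivors pay for the finer
        # MB-LBP stage. Both stages return True/False for a window.
        if not haar_stage(window):
            return False                # rejected by the coarse filter
        return mblbp_stage(window)      # fine filter makes the final call
    ```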

    MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes

    Full text link
    Attribute recognition, particularly facial, extracts many labels for each image. While some multi-task vision problems can be decomposed into separate tasks and stages, e.g., training independent models for each task, for a growing set of problems joint optimization across all tasks has been shown to improve performance. We show that for deep convolutional neural network (DCNN) facial attribute extraction, multi-task optimization is better. Unfortunately, it can be difficult to apply joint optimization to DCNNs when training data is imbalanced, and re-balancing multi-label data directly is structurally infeasible, since adding/removing data to balance one label will change the sampling of the other labels. This paper addresses the multi-label imbalance problem by introducing a novel mixed objective optimization network (MOON) with a loss function that mixes multiple task objectives with domain adaptive re-weighting of propagated loss. Experiments demonstrate that not only does MOON advance the state of the art in facial attribute recognition, but it also outperforms independently trained DCNNs using the same data. When using facial attributes for the LFW face recognition task, we show that our balanced (domain adapted) network outperforms the unbalanced trained network.
    Comment: Post-print of manuscript accepted to the European Conference on Computer Vision (ECCV) 2016 http://link.springer.com/chapter/10.1007%2F978-3-319-46454-1_
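    The core of the mixed objective can be sketched as a per-attribute loss whose terms are re-weighted per label; how the weights are derived from source and target label distributions is the paper's domain-adaptation step and is treated as given here (a sketch, not the paper's exact loss):

    ```python
    import torch

    def mixed_objective_loss(preds, targets, pos_weight, neg_weight):
        # preds, targets: (batch, num_attrs) tensors, targets in {0, 1}.
        # pos_weight, neg_weight: (num_attrs,) per-attribute weights that
        # counter label imbalance (assumed precomputed, e.g., from the
        # desired target label distribution).
        per_elem = (preds - targets) ** 2                      # one squared loss per attribute
        w = torch.where(targets > 0.5, pos_weight, neg_weight) # pick weight by label value
        return (w * per_elem).mean()                           # mix all task objectives
    ```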

    J Acoust Soc Am

    Get PDF
    Occupational and recreational acoustic noise exposure is known to cause permanent hearing damage and reduced quality of life, which indicates the importance of noise controls, including hearing protection devices (HPDs), in situations where high noise levels exist. While HPDs can provide adequate protection for many noise exposures, it is often a challenge to properly train HPD users and maintain compliance with usage guidelines. HPD fit-testing systems are commercially available to ensure proper attenuation is achieved, but they often require specific facilities designed for hearing testing (e.g., a quiet room or an audiometric booth) or special equipment (e.g., modified HPDs designed specifically for fit testing). In this study, we explored using visual information from a photograph of an HPD inserted into the ear to estimate hearing protector attenuation. Our dataset consists of 960 unique photographs from four types of hearing protectors across 160 individuals. We achieved 73% classification accuracy in predicting if the fit was greater or less than the median measured attenuation (29 dB at 1 kHz) using a deep neural network. Ultimately, the fit-test technique developed in this research could be used for training as well as for automated compliance monitoring in noisy environments to prevent hearing loss.
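    As described, the prediction task reduces to a binary image classifier. A minimal PyTorch sketch with an assumed ResNet backbone and single-logit head (the abstract says only "a deep neural network", so the architecture here is illustrative):

    ```python
    import torch
    import torch.nn as nn
    from torchvision import models

    # Backbone choice is an assumption, not the study's reported model.
    model = models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, 1)  # one logit: above/below median

    # Train with a binary target: 1 if measured attenuation exceeds the
    # median (29 dB at 1 kHz), else 0.
    criterion = nn.BCEWithLogitsLoss()

    def predict_fit(photo_batch):
        # photo_batch: (N, 3, H, W) float tensor of HPD-insertion photos.
        model.eval()
        with torch.no_grad():
            return torch.sigmoid(model(photo_batch)).squeeze(1) > 0.5
    ```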