31 research outputs found
Learning Grimaces by Watching TV
Differently from computer vision systems which require explicit supervision,
humans can learn facial expressions by observing people in their environment.
In this paper, we look at how similar capabilities could be developed in
machine vision. As a starting point, we consider the problem of relating facial
expressions to objectively measurable events occurring in videos. In
particular, we consider a gameshow in which contestants play to win significant
sums of money. We extract events affecting the game and corresponding facial
expressions objectively and automatically from the videos, obtaining large
quantities of labelled data for our study. We also develop, using benchmarks
such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial
expression recognition, showing that pre-training on face verification data can
be highly beneficial for this task. Then, we extend these models to use facial
expressions to predict events in videos and learn nameable expressions from
them. The dataset and emotion recognition models are available at
http://www.robots.ox.ac.uk/~vgg/data/facevalueComment: British Machine Vision Conference (BMVC) 201
Cross-stitch Networks for Multi-task Learning
Multi-task learning in Convolutional Networks has displayed remarkable
success in the field of recognition. This success can be largely attributed to
learning shared representations from multiple supervisory tasks. However,
existing multi-task approaches rely on enumerating multiple network
architectures specific to the tasks at hand, that do not generalize. In this
paper, we propose a principled approach to learn shared representations in
ConvNets using multi-task learning. Specifically, we propose a new sharing
unit: "cross-stitch" unit. These units combine the activations from multiple
networks and can be trained end-to-end. A network with cross-stitch units can
learn an optimal combination of shared and task-specific representations. Our
proposed method generalizes across multiple tasks and shows dramatically
improved performance over baseline methods for categories with few training
examples.Comment: To appear in CVPR 2016 (Spotlight
Enhanced Face Detection Based on Haar-Like and MB-LBP Features
The effective real-time face detection framework proposed by Viola and Jones gained much popularity due its computational efficiency and its simplicity. A notable variant replaces the original Haar-like features with MB-LBP (Multi-Block Local Binary Pattern) which are defined by the local binary pattern operator, both detector types are integrated into the OpenCV library. However, each descriptor and its evaluation method has its own set of strengths and setbacks. In this paper, an enhanced two-layer face detector composed of both Haar-like and MB-LBP features is presented. Haar-like features are employed as a coarse filter but with a new evaluation involving dual threshold. The already established MB-LBPs are arranged as the fine filter of the detector. The Gentle AdaBoost learning algorithm is deployed for the training of the proposed detector to reach the classification and performance potential. Experiments show that in the early stages of classification, Haar features with dual threshold are more discriminative than MB-LBP and original Haar-like features with respect to number of features required and computation. Benchmarking the proposed detector demonstrate overall 12% higher detection rate at 17% false alarm over using MB-LBP features singly while performing with ×3 speedup
MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes
Attribute recognition, particularly facial, extracts many labels for each
image. While some multi-task vision problems can be decomposed into separate
tasks and stages, e.g., training independent models for each task, for a
growing set of problems joint optimization across all tasks has been shown to
improve performance. We show that for deep convolutional neural network (DCNN)
facial attribute extraction, multi-task optimization is better. Unfortunately,
it can be difficult to apply joint optimization to DCNNs when training data is
imbalanced, and re-balancing multi-label data directly is structurally
infeasible, since adding/removing data to balance one label will change the
sampling of the other labels. This paper addresses the multi-label imbalance
problem by introducing a novel mixed objective optimization network (MOON) with
a loss function that mixes multiple task objectives with domain adaptive
re-weighting of propagated loss. Experiments demonstrate that not only does
MOON advance the state of the art in facial attribute recognition, but it also
outperforms independently trained DCNNs using the same data. When using facial
attributes for the LFW face recognition task, we show that our balanced (domain
adapted) network outperforms the unbalanced trained network.Comment: Post-print of manuscript accepted to the European Conference on
Computer Vision (ECCV) 2016
http://link.springer.com/chapter/10.1007%2F978-3-319-46454-1_
J Acoust Soc Am
Occupational and recreational acoustic noise exposure is known to cause permanent hearing damage and reduced quality of life, which indicates the importance of noise controls including hearing protection devices (HPDs) in situations where high noise levels exist. While HPDs can provide adequate protection for many noise exposures, it is often a challenge to properly train HPD users and maintain compliance with usage guidelines. HPD fit-testing systems are commercially available to ensure proper attenuation is achieved, but they often require specific facilities designed for hearing testing (e.g., a quiet room or an audiometric booth) or special equipment (e.g., modified HPDs designed specifically for fit testing). In this study, we explored using visual information from a photograph of an HPD inserted into the ear to estimate hearing protector attenuation. Our dataset consists of 960 unique photographs from four types of hearing protectors across 160 individuals. We achieved 73% classification accuracy in predicting if the fit was greater or less than the median measured attenuation (29\u2009dB at 1\u2009kHz) using a deep neural network. Ultimately, the fit-test technique developed in this research could be used for training as well as for automated compliance monitoring in noisy environments to prevent hearing loss.CC999999/ImCDC/Intramural CDC HHSUnited States/2021-12-21T00:00:00Z34470332PMC868936110722vault:4060