The Devil of Face Recognition is in the Noise
The growing scale of face recognition datasets empowers us to train strong
convolutional networks for face recognition. While a variety of architectures
and loss functions have been devised, we still have a limited understanding of
the source and consequence of label noise inherent in existing datasets. We
make the following contributions: 1) We contribute cleaned subsets of popular
face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new
large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets
and cleaned subsets, we profile and analyze label noise properties of MegaFace
and MS-Celeb-1M. We show that orders of magnitude more noisy samples are
needed to reach the accuracy yielded by a clean subset. 3) We study how
different types of noise, i.e., label flips and outliers, affect the
accuracy of face recognition models. 4) We investigate ways to improve data
cleanliness, including a comprehensive user study on the influence of data
labeling strategies on annotation accuracy. The IMDb-Face dataset has been
released on https://github.com/fwang91/IMDb-Face.
Comment: accepted to ECCV'1
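One of the noise types the abstract names, label flips, is easy to simulate. The sketch below is not from the paper; it is a minimal, hypothetical illustration of injecting flip noise at a controlled rate into a list of identity labels, the kind of controlled corruption such a study would measure accuracy against.

```python
import random

def inject_label_flips(labels, num_classes, flip_rate, seed=0):
    """Reassign a fraction of labels to a different class,
    simulating the 'label flip' noise type at a controlled rate."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(len(noisy) * flip_rate)
    # Pick n_flip distinct positions and flip each to some other class.
    for i in rng.sample(range(len(noisy)), n_flip):
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy

clean = [0, 1, 2, 3] * 25          # 100 labels over 4 identities
noisy = inject_label_flips(clean, num_classes=4, flip_rate=0.3)
changed = sum(a != b for a, b in zip(clean, noisy))
print(changed)  # 30 labels flipped
```

Outlier noise (a face that belongs to no identity in the set) would be modeled differently, e.g. by inserting samples drawn from a disjoint identity pool rather than relabeling existing ones.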
Understanding How Image Quality Affects Deep Neural Networks
Image quality is an important practical challenge that is often overlooked in
the design of machine vision systems. Commonly, machine vision systems are
trained and tested on high quality image datasets, yet in practical
applications the input images cannot be assumed to be of high quality.
Recently, deep neural networks have obtained state-of-the-art performance on
many machine vision tasks. In this paper we evaluate four state-of-the-art
deep neural network models for image classification under
quality distortions. We consider five types of quality distortions: blur,
noise, contrast, JPEG, and JPEG2000 compression. We show that the existing
networks are susceptible to these quality distortions, particularly to blur and
noise. These results enable future work in developing deep neural networks that
are more invariant to quality distortions.
Comment: Final version will appear in IEEE Xplore in the Proceedings of the
Conference on the Quality of Multimedia Experience (QoMEX), June 6-8, 201
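Two of the five distortions the abstract lists, noise and blur, can be sketched without any imaging library. The snippet below is a hypothetical, dependency-free illustration (a 3x3 box blur standing in for Gaussian blur), not the paper's actual evaluation pipeline; in practice one would distort images with a library such as Pillow and feed them to the trained classifier.

```python
import random

def add_gaussian_noise(image, sigma, seed=0):
    """Additive Gaussian noise, clamped to the 0-255 pixel range."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def box_blur(image):
    """Crude 3x3 box blur as a stand-in for Gaussian blur."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the 3x3 neighborhood, clipped at the borders.
            vals = [image[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

# A hard black/white edge: blurring softens it to an intermediate value.
img = [[0.0] * 8 for _ in range(4)] + [[255.0] * 8 for _ in range(4)]
blurred = box_blur(img)
print(blurred[3][4])  # 85.0 -- the sharp edge is smeared
```

An evaluation in the paper's spirit would sweep the distortion strength (sigma, blur radius, JPEG quality) and plot classifier accuracy against it.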
A New Reading of 'Ethan Brand': The Failed Quest
Examines how the protagonist of Nathaniel Hawthorne's short story 'Ethan Brand' searches for what he considers the 'unpardonable sin': divorcing one's head from one's heart, and oneself from humanity. Interprets the story's symbolism and analyzes the Puritan view of sin in relation to the story.
FaceFilter: Audio-visual speech separation using still images
The objective of this paper is to separate a target speaker's speech from a
mixture of two speakers using a deep audio-visual speech separation network.
Unlike previous works that used lip movement on video clips or pre-enrolled
speaker information as an auxiliary conditional feature, we use a single face
image of the target speaker. In this task, the conditional feature is obtained
from facial appearance in a cross-modal biometric task, where audio and
visual identity representations are shared in a latent space. Identities
learnt from facial images force the network to isolate the matched speaker
and extract that voice from the mixed speech. This resolves the permutation
problem caused by swapped channel outputs, which frequently occurs in
speech separation tasks. The proposed
method is far more practical than video-based speech separation since user
profile images are readily available on many platforms. Also, unlike
speaker-aware separation methods, it is applicable to separation with unseen
speakers who have never been enrolled before. We show strong qualitative and
quantitative results on challenging real-world examples.
Comment: Under submission as a conference paper. Video examples:
https://youtu.be/ku9xoLh62
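The core idea, conditioning the separator on a fixed identity embedding rather than on lip movements, can be illustrated abstractly. The function below is a hypothetical sketch (FiLM-style multiplicative modulation, with dimensions assumed equal for simplicity), not the paper's actual architecture, which operates on audio spectrograms with a learned cross-modal face encoder.

```python
def condition_on_identity(mixture_feats, identity_emb):
    """Hypothetical sketch: scale each mixture feature by a gain derived
    from the target speaker's identity embedding, so the separator is
    steered toward that speaker's components of the mixture."""
    assert len(mixture_feats) == len(identity_emb)
    return [f * (1.0 + g) for f, g in zip(mixture_feats, identity_emb)]

feats = [0.5, -1.0, 2.0]   # toy mixture features
emb = [0.0, 1.0, -0.5]     # toy identity embedding
print(condition_on_identity(feats, emb))  # [0.5, -2.0, 1.0]
```

Because the conditioning signal is tied to one identity, the network has no ambiguity about which output channel belongs to the target speaker, which is how identity conditioning sidesteps the permutation problem the abstract mentions.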