FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces
With recent advances in computer vision and graphics, it is now possible to
generate videos with extremely realistic synthetic faces, even in real time.
Countless applications are possible, some of which raise a legitimate alarm,
calling for reliable detectors of fake videos. In fact, distinguishing between
original and manipulated video can be a challenge for humans and computers
alike, especially when the videos are compressed or have low resolution, as
often happens on social networks. Research on the detection of face
manipulations has been seriously hampered by the lack of adequate datasets. To
this end, we introduce a novel face manipulation dataset of about half a
million edited images (from over 1000 videos). The manipulations have been
generated with a state-of-the-art face editing approach. The dataset exceeds
all existing video manipulation datasets by at least an order of magnitude. Using
our new dataset, we introduce benchmarks for classical image forensic tasks,
including classification and segmentation, considering videos compressed at
various quality levels. In addition, we introduce a benchmark evaluation for
creating indistinguishable forgeries with known ground truth; for instance with
generative refinement models.
Comment: Video: https://youtu.be/Tle7YaPkO_
Learning based Facial Image Compression with Semantic Fidelity Metric
Surveillance and security scenarios usually require highly efficient facial
image compression schemes for face recognition and identification. However,
both traditional general-purpose image codecs and specialized facial image
compression schemes only heuristically refine the codec according to a face
verification accuracy metric. We propose a Learning based Facial Image
Compression (LFIC) framework with a novel Regionally Adaptive Pooling (RAP)
module whose parameters can be automatically optimized according to gradient
feedback from an integrated hybrid semantic fidelity metric, including a
successful exploration of applying a Generative Adversarial Network (GAN)
directly as a metric in the image compression scheme. The experimental results
verify the framework's efficiency by demonstrating bitrate savings of 71.41%,
48.28%, and 52.67% over JPEG2000, WebP, and neural-network-based codecs,
respectively, under the same face verification accuracy distortion metric. We
also evaluate LFIC's superior performance gain compared with the latest
specialized facial image codecs. Visual experiments also offer some
interesting insight into how LFIC automatically captures the information in
critical areas, guided by semantic distortion metrics, for optimized
compression, which is quite different from the heuristic way of optimization
in traditional image compression algorithms.
Comment: Accepted by Neurocomputing
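Savings of this kind are typically obtained by comparing the two codecs' rate-accuracy curves at matched verification accuracy. A minimal sketch of such a comparison, with made-up curve values purely for illustration (not the paper's data or exact evaluation protocol):

```python
import numpy as np

def bitrate_saving(ref_curve, test_curve):
    """Percent bitrate saved by `test` over `ref` at matched accuracy.

    Each curve is a list of (bits_per_pixel, accuracy) points,
    assumed monotonically increasing in accuracy.
    """
    ref = np.array(sorted(ref_curve, key=lambda p: p[1]))
    test = np.array(sorted(test_curve, key=lambda p: p[1]))
    # Compare only over the accuracy range covered by both codecs.
    lo = max(ref[0, 1], test[0, 1])
    hi = min(ref[-1, 1], test[-1, 1])
    acc = np.linspace(lo, hi, 50)
    ref_bpp = np.interp(acc, ref[:, 1], ref[:, 0])
    test_bpp = np.interp(acc, test[:, 1], test[:, 0])
    return float(np.mean((ref_bpp - test_bpp) / ref_bpp) * 100.0)

# Toy curves: the hypothetical codec reaches each accuracy at half the rate.
ref = [(0.2, 0.80), (0.4, 0.90), (0.8, 0.95)]
test = [(0.1, 0.80), (0.2, 0.90), (0.4, 0.95)]
print(round(bitrate_saving(ref, test), 1))  # 50.0
```

Averaging the relative rate difference over a shared accuracy range is one simple convention; BD-rate-style integration over the curves is another common choice.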
Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach
Emotion recognition from facial expressions is tremendously useful,
especially when coupled with smart devices and wireless multimedia
applications. However, the inadequate network bandwidth often limits the
spatial resolution of the transmitted video, which will heavily degrade the
recognition reliability. We develop a novel framework to achieve robust emotion
recognition from low bit rate video. While video frames are downsampled at the
encoder side, the decoder is embedded with a deep network model for joint
super-resolution (SR) and recognition. Notably, we propose a novel max-mix
training strategy, leading to a single "One-for-All" model that is remarkably
robust to a vast range of downsampling factors. That makes our framework well
adapted for the varied bandwidths in real transmission scenarios, without
hampering scalability or efficiency. The proposed framework is evaluated on
the AVEC 2016 benchmark and demonstrates significantly better stand-alone
recognition performance, as well as rate-distortion (R-D) performance, than
either directly recognizing from LR frames or separating SR and recognition.
Comment: Accepted by the Seventh International Conference on Affective
Computing and Intelligent Interaction (ACII 2017)
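The "One-for-All" idea of exposing a single model to many downsampling factors can be sketched as a batch-construction step. This is a toy sketch only: `degrade` and the factor set are illustrative stand-ins, not the paper's exact pipeline or training strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(frame, factor):
    """Average-pool by `factor`, then nearest-neighbour upsample back,
    so every training input keeps the original spatial shape."""
    h, w = frame.shape
    lr = frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return lr.repeat(factor, axis=0).repeat(factor, axis=1)

def mixed_batch(frames, factors=(1, 2, 4, 8)):
    """Draw one random downsampling factor per example, so the single
    model sees all factors instead of training one model per factor."""
    fs = rng.choice(factors, size=len(frames))
    return np.stack([degrade(f, k) for f, k in zip(frames, fs)]), fs

frames = rng.random((16, 32, 32))   # a batch of toy video frames
batch, fs = mixed_batch(frames)
print(batch.shape)                  # (16, 32, 32)
```

A model trained on such mixed batches sees every degradation level in every epoch, which is what makes a single set of weights robust across bandwidths.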
Face Recognition in Low Quality Images: A Survey
Low-resolution face recognition (LRFR) has received increasing attention over
the past few years. Its applications lie widely in the real-world environment
when high-resolution or high-quality images are hard to capture. One of the
biggest demands for LRFR technologies is video surveillance. As the number of
surveillance cameras in cities increases, the captured videos will need to be
processed automatically. However, those videos or images are usually captured
with large standoffs, arbitrary illumination conditions, and diverse viewing
angles. Faces in these images are generally small in size. Several studies
have addressed this problem by employing techniques such as super-resolution,
deblurring, or learning a relationship between different resolution domains.
In this paper, we provide a comprehensive review of approaches to
low-resolution face recognition over the past five years. First, a general
problem definition is given. Then, a systematic analysis of the works on this
topic is presented by category. In addition to describing the methods, we also
focus on datasets and experiment settings. We further address the related
works on unconstrained low-resolution face recognition and compare them with
results that use synthetic low-resolution data. Finally, we summarize the
general limitations and speculate on priorities for future effort.
Comment: There are some mistakes in this paper which may mislead the reader,
and we will not have a new version in the short term. We will resubmit once it
is corrected.
ADS-ME: Anomaly Detection System for Micro-expression Spotting
Micro-expressions (MEs) are infrequent and uncontrollable facial events that
can highlight emotional deception and appear in a high-stakes environment. This
paper proposes an algorithm for spatiotemporal ME spotting. Since MEs are
unusual events, we treat them as abnormal patterns that diverge from expected
Normal Facial Behaviour (NFB) patterns. NFBs correspond to facial muscle
activations, eye blink/gaze events, and mouth opening/closing movements, which
are all facial deformations but not MEs. We propose a probabilistic model to
estimate the probability density function that models the spatiotemporal
distributions of NFB patterns. To rank the outputs, we compute the negative
log-likelihood, and we develop an adaptive thresholding technique to identify
MEs from NFBs. While working only with NFB data, the main challenge is to
capture intrinsic spatiotemporal features, hence we design a recurrent
convolutional autoencoder for feature representation. Finally, we show that
our system is superior to previous works for ME spotting.
Comment: 35 pages, 9 figures, 3 tables
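The negative-log-likelihood-plus-adaptive-threshold pipeline can be sketched with a stand-in density model. Here a diagonal Gaussian replaces the paper's probabilistic model over recurrent convolutional autoencoder features, and the mean-plus-k-sigma rule is one simple way to set an adaptive threshold; all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

class GaussianNLLDetector:
    """Diagonal-Gaussian density fit on normal-behaviour features;
    frames whose negative log-likelihood exceeds an adaptive threshold
    are flagged as candidate micro-expressions."""

    def fit(self, X, k=3.0):
        self.mu = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6
        nll = self.score(X)
        # Adaptive threshold from the score distribution of normal data.
        self.thresh = nll.mean() + k * nll.std()
        return self

    def score(self, X):
        # Per-sample negative log-likelihood under the diagonal Gaussian.
        return 0.5 * (((X - self.mu) ** 2 / self.var)
                      + np.log(2 * np.pi * self.var)).sum(axis=1)

    def predict(self, X):
        return self.score(X) > self.thresh

normal = rng.normal(0.0, 1.0, size=(500, 8))   # NFB-like features
odd = rng.normal(6.0, 1.0, size=(5, 8))        # ME-like outliers
det = GaussianNLLDetector().fit(normal)
print(det.predict(odd).all())  # True: outliers exceed the threshold
```

Because only normal-behaviour data is needed to fit the density and the threshold, the detector never requires labelled micro-expression examples, which mirrors the anomaly-detection framing of the abstract.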
Towards improved lossy image compression: Human image reconstruction with public-domain images
Lossy image compression has been studied extensively in the context of
typical loss functions such as RMSE, MS-SSIM, etc. However, compression at low
bitrates generally produces unsatisfying results. Furthermore, the availability
of massive public image datasets appears to have hardly been exploited in image
compression. Here, we present a paradigm for eliciting human image
reconstruction in order to perform lossy image compression. In this paradigm,
one human describes images to a second human, whose task is to reconstruct the
target image using publicly available images and text instructions. The
resulting reconstructions are then evaluated by human raters on the Amazon
Mechanical Turk platform and compared to reconstructions obtained using the
state-of-the-art compressor WebP. Our results suggest that prioritizing
semantic visual elements may be key to achieving significant improvements in
image compression, and that our paradigm can be used to develop a more
human-centric loss function.
The images, results and additional data are available at
https://compression.stanford.edu/human-compressio
Recurrent Regression for Face Recognition
To address the sequential changes of images, including poses, in this paper we
propose a recurrent regression neural network (RRNN) framework to unify two
classic tasks: cross-pose face recognition on still images and video-based
face recognition. To imitate the changes of images, we explicitly construct
the potential dependencies of sequential images so as to regularize the final
learning model. By performing progressive transforms for sequentially adjacent
images, RRNN can adaptively memorize and forget the information that benefits
the final classification. For face recognition of still images, given any one
image with any one pose, we recurrently predict the images at its sequential
poses in the hope of capturing useful information from other poses. For
video-based face recognition, the recurrent regression takes an entire
sequence rather than one image as its input. We verify RRNN on the static face
dataset MultiPIE and the face video dataset YouTube Celebrities (YTC). The
comprehensive experimental results demonstrate the effectiveness of the
proposed RRNN method.
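The core mechanism, a shared transform applied progressively along a pose or frame sequence while the hidden state regresses the next image, can be sketched in a few lines. This is a toy stand-in under my own assumptions (plain tanh recurrence, random weights), not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def rrnn_forward(seq, Wh, Wx, Wo):
    """Minimal recurrent-regression sketch: at each step the hidden state
    is updated from the current image and regresses the *next* image, so
    sequential dependencies regularize the learned representation."""
    h = np.zeros(Wh.shape[0])
    preds = []
    for x in seq:
        h = np.tanh(Wh @ h + Wx @ x)   # memorize/forget via the recurrence
        preds.append(Wo @ h)           # regress the following image
    return np.stack(preds), h          # h doubles as the identity feature

d, hdim = 16, 8
Wh = rng.normal(0, 0.1, (hdim, hdim))
Wx = rng.normal(0, 0.1, (hdim, d))
Wo = rng.normal(0, 0.1, (d, hdim))
seq = rng.random((5, d))               # five sequential poses of one face
preds, feat = rrnn_forward(seq, Wh, Wx, Wo)
print(preds.shape, feat.shape)  # (5, 16) (8,)
```

In the still-image setting the "sequence" is the chain of predicted neighbouring poses; in the video setting it is the observed frames, which is what lets one recurrence unify both tasks.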
Improving Heterogeneous Face Recognition with Conditional Adversarial Networks
Heterogeneous face recognition between color images and depth images is a
much-desired capability for real-world applications where shape information is
available only in the gallery. In this paper, we propose a cross-modal deep
learning method as an effective and efficient workaround for this challenge.
Specifically, we begin by learning two convolutional neural networks (CNNs) to
extract 2D and 2.5D face features individually. Once trained, they can serve
as pre-trained models for another two-way CNN which explores the correlated
part between color and depth for heterogeneous matching. Compared with most
conventional cross-modal approaches, our method additionally conducts accurate
depth image reconstruction from a single color image with Conditional
Generative Adversarial Nets (cGAN), and further enhances the recognition
performance by fusing multi-modal matching results. Through both qualitative
and quantitative experiments on the benchmark FRGC 2D/3D face database, we
demonstrate that the proposed pipeline outperforms the state of the art on
heterogeneous face recognition and ensures a highly efficient online stage.
Discriminative Local Sparse Representations for Robust Face Recognition
A key recent advance in face recognition models a test face image as a sparse
linear combination of a set of training face images. The resulting sparse
representations have been shown to possess robustness against a variety of
distortions like random pixel corruption, occlusion and disguise. This approach
however makes the restrictive (in many scenarios) assumption that test faces
must be perfectly aligned (or registered) to the training data prior to
classification. In this paper, we propose a simple yet robust local block-based
sparsity model, using adaptively-constructed dictionaries from local features
in the training data, to overcome this misalignment problem. Our approach is
inspired by human perception: we analyze a series of local discriminative
features and combine them to arrive at the final classification decision. We
propose a probabilistic graphical model framework to explicitly mine the
conditional dependencies between these distinct sparse local features. In
particular, we learn discriminative graphs on sparse representations obtained
from distinct local slices of a face. Conditional correlations between these
sparse features are first discovered (in the training phase), and subsequently
exploited to bring about significant improvements in recognition rates.
Experimental results obtained on benchmark face databases demonstrate the
effectiveness of the proposed algorithms in the presence of multiple
registration errors (such as translation, rotation, and scaling) as well as
under variations of pose and illumination.
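The block-level decision described above, sparse-code a local feature over the training atoms, then pick the class whose atoms give the smallest class-restricted residual, can be sketched as follows. Orthogonal matching pursuit stands in for the sparse solver, and the dictionary and data are synthetic; this is not the paper's full graphical-model framework.

```python
import numpy as np

def omp(D, y, k=3):
    """Orthogonal matching pursuit: k-sparse code of y over dictionary D."""
    resid, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ resid))))   # best new atom
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        resid = y - D[:, idx] @ coef                       # refit residual
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def classify_block(D, labels, y, k=3):
    """Sparse-representation classification for one local block: the
    class whose atoms reconstruct y with the smallest residual wins."""
    x = omp(D, y, k)
    residuals = {}
    for c in set(labels.tolist()):
        xc = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(3)
# Toy dictionary: four atoms per class, each a noisy copy of its prototype.
protos = {0: rng.normal(size=20), 1: rng.normal(size=20)}
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
D = np.column_stack([protos[c] + 0.05 * rng.normal(size=20) for c in labels])
D /= np.linalg.norm(D, axis=0)
query = protos[1] + 0.05 * rng.normal(size=20)   # block from a class-1 face
print(classify_block(D, labels, query))          # prints 1
```

Repeating this per local block and combining the per-block decisions (by voting or, as in the paper, through learned conditional dependencies between blocks) is what buys robustness to misalignment: a few badly registered blocks no longer dominate the final decision.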
Smart, Sparse Contours to Represent and Edit Images
We study the problem of reconstructing an image from information stored at
contour locations. We show that high-quality reconstructions with high fidelity
to the source image can be obtained from sparse input, e.g., comprising only a
small fraction of the image pixels. This is a significant improvement over existing
contour-based reconstruction methods that require much denser input to capture
subtle texture information and to ensure image quality. Our model, based on
generative adversarial networks, synthesizes texture and details in regions
where no input information is provided. The semantic knowledge encoded in our
model and the sparsity of the input allow contours to be used as an intuitive
interface for semantically aware image manipulation: local edits in the
contour domain translate to long-range and coherent changes in pixel space. We
can perform complex structural changes, such as changing a facial expression,
by simple edits of the contours. Our experiments demonstrate that humans as
well as a face recognition system mostly cannot distinguish between our
reconstructions and the source images.
Comment: Accepted to CVPR'18; Project page: contour2im.github.i