Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging, and much research is
still needed on how they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a general
discussion of trends, important questions, and future lines of research.
Discriminative Sparse Coding on Multi-Manifold for Data Representation and Classification
Sparse coding has been widely used as an effective data representation
method in various applications, such as computer vision, medical imaging, and
bioinformatics. However, conventional sparse coding algorithms and their
manifold-regularized variants (graph sparse coding and Laplacian sparse
coding) learn the codebook and codes in an unsupervised manner and neglect the
class information available in the training set. To address this problem, in
this paper we propose a novel discriminative sparse coding method based on
multi-manifold, by learning discriminative class-conditional codebooks and
sparse codes from both data feature space and class labels. First, the entire
training set is partitioned into multiple manifolds according to the class
labels. Then, we formulate the sparse coding as a manifold-manifold matching
problem and learn class-conditional codebooks and codes to maximize the
manifold margins of different classes. Lastly, we present a strategy that
classifies an unlabeled data point by its point-to-manifold matching error.
Experimental results on somatic mutation identification and breast tumor
classification in ultrasound images demonstrate the efficacy of the
proposed data representation and classification approach.
Comment: This paper has been withdrawn by the author due to the terrible writing.
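A minimal sketch of the classification rule described above, assuming scikit-learn's DictionaryLearning as a generic per-class sparse coder: one codebook is learned for each class manifold, and an unlabeled point is assigned to the class whose codebook reconstructs it with the smallest matching error. The function names are illustrative, and the paper's manifold-margin objective is not modeled here.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    def fit_class_codebooks(X, y, n_atoms=32, n_nonzero=5):
        # Learn one class-conditional codebook per manifold (class label).
        coders = {}
        for c in np.unique(y):
            coder = DictionaryLearning(n_components=n_atoms,
                                       transform_algorithm="omp",
                                       transform_n_nonzero_coefs=n_nonzero,
                                       random_state=0)
            coders[c] = coder.fit(X[y == c])
        return coders

    def classify(x, coders):
        # Assign x to the class with the smallest point-to-manifold matching error.
        errors = {}
        for c, coder in coders.items():
            code = coder.transform(x[None, :])        # sparse code of x
            recon = code @ coder.components_          # reconstruction from codebook
            errors[c] = np.linalg.norm(x - recon[0])
        return min(errors, key=errors.get)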
Multibiometric: Feature Level Fusion Using FKP Multi-Instance biometric
This paper proposes the use of multi-instance feature-level fusion as a means
to improve the performance of Finger Knuckle Print (FKP) verification. A
log-Gabor filter is used to extract the local orientation information of the
image and to represent the FKP features. Experiments are performed on the FKP
database, which consists of 7,920 images. Results indicate that the
multi-instance verification approach achieves higher performance than any
single instance. The influence of feature-level fusion under different fusion
rules on biometric performance is also demonstrated.
Comment: 8-page paper
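As a rough illustration of the pipeline above, the following sketch builds an isotropic log-Gabor filter in the frequency domain, binarizes the phase of the response as an FKP feature code, and performs feature-level fusion by concatenating the codes of several finger instances. The filter parameters (f0, sigma_on_f) and function names are assumptions for illustration, not the settings used in the paper, and a full descriptor would also encode orientation.

    import numpy as np

    def log_gabor_response(img, f0=0.1, sigma_on_f=0.55):
        # Isotropic log-Gabor filter applied in the frequency domain.
        rows, cols = img.shape
        fy = np.fft.fftfreq(rows)[:, None]
        fx = np.fft.fftfreq(cols)[None, :]
        radius = np.sqrt(fx ** 2 + fy ** 2)
        radius[0, 0] = 1.0                            # avoid log(0) at DC
        lg = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2))
        lg[0, 0] = 0.0                                # zero DC response
        return np.fft.ifft2(np.fft.fft2(img) * lg)

    def fkp_feature(img):
        # Binary code from the sign of the real and imaginary responses.
        resp = log_gabor_response(img)
        return np.concatenate([(resp.real > 0).ravel(), (resp.imag > 0).ravel()])

    def fuse_instances(images):
        # Feature-level fusion: concatenate codes from multiple FKP instances.
        return np.concatenate([fkp_feature(im) for im in images])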
When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition with Limited Data
We present a new Deep Dictionary Learning and Coding Network (DDLCN) for
image recognition tasks with limited data. The proposed DDLCN has most of the
standard deep learning layers (e.g., input/output, pooling, fully connected,
etc.), but the fundamental convolutional layers are replaced by our proposed
compound dictionary learning and coding layers. The dictionary-learning layer
learns an over-complete dictionary for the input training data. At the deep coding layer,
a locality constraint is added to guarantee that the activated dictionary bases
are close to each other. Then the activated dictionary atoms are assembled and
passed to the compound dictionary learning and coding layers. In this way, the
activated atoms in the first layer can be represented by the deeper atoms in
the second dictionary. Intuitively, the second dictionary is designed to learn
the fine-grained components shared among the input dictionary atoms, so that a
more informative and discriminative low-level representation of the dictionary
atoms can be obtained. We empirically compare DDLCN with several leading
dictionary learning methods and deep learning models. Experimental results on
five popular datasets show that DDLCN achieves competitive results compared
with state-of-the-art methods when the training data is limited. Code is
available at https://github.com/Ha0Tang/DDLCN.
Comment: Accepted to TNNLS; an extended version of a paper published in WACV 2019. arXiv admin note: substantial text overlap with arXiv:1809.0418
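The locality constraint mentioned above can be illustrated with a small coding step in the spirit of locality-constrained linear coding: only the dictionary atoms nearest to the input are activated, and a small constrained least-squares problem is solved over them. This is a sketch of the general idea, not the authors' compound dictionary learning and coding layer; the function name and the regularization constant are assumptions.

    import numpy as np

    def locality_coding(x, dictionary, k=5, eps=1e-4):
        # dictionary: (n_atoms, dim); activate only the k atoms nearest to x.
        dists = np.linalg.norm(dictionary - x, axis=1)
        idx = np.argsort(dists)[:k]
        B = dictionary[idx]                           # local basis close to x
        G = (B - x) @ (B - x).T                       # local covariance
        G += eps * np.trace(G) * np.eye(k)            # numerical regularization
        w = np.linalg.solve(G, np.ones(k))            # sum-to-one local weights
        w /= w.sum()
        code = np.zeros(len(dictionary))
        code[idx] = w                                 # sparse, locality-constrained code
        return code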
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR 2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey" as a
mechanism embodying the entire process, from reading all the papers and
generating ideas to writing a paper.
Comment: Survey Paper
Selective Image Super-Resolution
In this paper we propose a vision system that performs image Super Resolution
(SR) with selectivity. Conventional SR techniques, either by multi-image fusion
or example-based construction, have failed to capitalize on the intrinsic
structural and semantic context in the image, performing "blind" resolution
recovery over the entire image area. By comparison, we advocate example-based
selective SR whereby selectivity is exemplified in three aspects: region
selectivity (SR only at object regions), source selectivity (object SR with
trained object dictionaries), and refinement selectivity (object boundaries
refinement using matting). The proposed system takes over-segmented
low-resolution images as inputs, assimilates recent learning techniques of
sparse coding (SC) and grouped multi-task lasso (GMTL), and leads eventually to
a framework for joint figure-ground separation and object-of-interest SR. The
effectiveness of our framework is demonstrated in our experiments with subsets
of the VOC2009 and MSRC datasets. We also present several interesting vision
applications that can build on our system.
Comment: 20 pages, 5 figures. Submitted to Computer Vision and Image Understanding in March 2010. Keywords: image super resolution, semantic image segmentation, vision system, vision application
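The source-selectivity idea above (object SR with trained object dictionaries) can be sketched with coupled low/high-resolution dictionaries in the style of sparse-coding super-resolution: a low-resolution patch is coded over an object-specific LR dictionary, and the same code reconstructs the HR patch. This is only an illustration under those assumptions; the paper's grouped multi-task lasso and matting-based refinement are not modeled.

    import numpy as np
    from sklearn.linear_model import Lasso

    def sr_patch(lr_patch, D_lr, D_hr, alpha=0.01):
        # Sparse-code the LR patch over the object-specific LR dictionary,
        # then reconstruct the HR patch from the coupled HR dictionary.
        # D_lr: (lr_dim, n_atoms), D_hr: (hr_dim, n_atoms), lr_patch: (lr_dim,)
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(D_lr, lr_patch)
        code = lasso.coef_                            # sparse code shared across scales
        return D_hr @ code                            # high-resolution patch estimate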
Recurrent Convolutional Neural Network Regression for Continuous Pain Intensity Estimation in Video
Automatic pain intensity estimation plays an important role in healthcare and
the medical field. Traditional static methods extract features from each frame
of a video separately, which results in unstable changes and spurious peaks
across adjacent frames. To overcome this problem, we propose a
real-time regression framework based on the recurrent convolutional neural
network for automatic frame-level pain intensity estimation. Given vector
sequences of AAM-warped facial images, we use a sliding-window strategy to
obtain fixed-length input samples for the recurrent network. We then carefully
design the architecture of the recurrent network to output continuous-valued
pain intensity. The proposed end-to-end pain intensity regression framework can
predict the pain intensity of each frame by considering a sufficiently long
history of frames while limiting the number of parameters in the model.
Our method achieves promising results regarding both accuracy and running speed
on the published UNBC-McMaster Shoulder Pain Expression Archive Database.
Comment: This paper is the pre-print technical report of the paper accepted by the IEEE CVPR Workshop on Affect "in-the-wild". The final version will be available after the workshop.
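A minimal sketch of the sliding-window recurrent regression described above, written in PyTorch with an LSTM standing in for the recurrent part; the feature dimension, hidden size, and window length are assumptions, and random vectors stand in for AAM-warped face descriptors.

    import torch
    import torch.nn as nn

    class PainRegressor(nn.Module):
        # Recurrent regression over a fixed-length window of per-frame features.
        def __init__(self, feat_dim=128, hidden=64):
            super().__init__()
            self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, window):                    # window: (batch, T, feat_dim)
            out, _ = self.rnn(window)
            return self.head(out[:, -1]).squeeze(-1)  # intensity of the last frame

    def sliding_windows(seq, T=16):
        # Fixed-length windows ending at each frame, as in the sliding-window strategy.
        for t in range(T, seq.shape[0] + 1):
            yield seq[t - T:t]

    model = PainRegressor()
    frames = torch.randn(100, 128)                    # stand-in per-frame features
    preds = [model(w.unsqueeze(0)) for w in sliding_windows(frames)]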
Modality Dropout for Improved Performance-driven Talking Faces
We describe our novel deep learning approach for driving animated faces using
both acoustic and visual information. In particular, speech-related facial
movements are generated using audiovisual information, and non-speech facial
movements are generated using only visual information. To ensure that our model
exploits both modalities during training, batches are generated that contain
audio-only, video-only, and audiovisual input features. The probability of
dropping a modality allows control over the degree to which the model exploits
audio and visual information during training. Our trained model runs in
real time on resource-limited hardware (e.g., a smartphone), it is user
agnostic, and it is not dependent on a potentially error-prone transcription of
the speech. We use subjective testing to demonstrate: 1) the improvement of
audiovisual-driven animation over the equivalent video-only approach, and 2)
the improvement in the animation of speech-related facial movements after
introducing modality dropout. Before introducing dropout, viewers prefer
audiovisual-driven animation in 51% of the test sequences compared with only
18% for video-driven. After introducing dropout, viewer preference for
audiovisual-driven animation increases to 74%, while preference for video-only
drops to 8%.
Comment: Pre-print
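A minimal sketch of the batch-level modality dropout described above: with some probability a sample's audio or visual features are zeroed out, so the model is trained on a mixture of audio-only, video-only, and audiovisual examples. The masking scheme and probabilities are assumptions for illustration, not the paper's exact procedure.

    import numpy as np

    def modality_dropout(audio, video, p_audio=0.3, p_video=0.3, seed=None):
        # audio, video: (batch, ...) feature arrays for the same samples.
        rng = np.random.default_rng(seed)
        audio, video = audio.copy(), video.copy()
        for i in range(len(audio)):
            r = rng.random()
            if r < p_audio:
                audio[i] = 0.0                        # video-only training example
            elif r < p_audio + p_video:
                video[i] = 0.0                        # audio-only training example
            # otherwise both modalities are kept (audiovisual example)
        return audio, video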
Discriminative Representation Combinations for Accurate Face Spoofing Detection
Three discriminative representations for face presentation attack detection
are introduced in this paper. Firstly, we design a descriptor called the
spatial pyramid coding micro-texture (SPMT) feature to characterize local
appearance information. Secondly, we utilize SSD, a deep learning framework
for object detection, to mine context cues and conduct end-to-end face
presentation attack detection. Finally, we design a descriptor called the template
face matched binocular depth (TFBD) feature to characterize stereo structures
of real and fake faces. For accurate presentation attack detection, we also
design two kinds of representation combinations. Firstly, we propose a
decision-level cascade strategy to combine SPMT with SSD. Secondly, we use a
simple score fusion strategy to combine face structure cues (TFBD) with local
micro-texture features (SPMT). To demonstrate the effectiveness of our design,
we evaluate the representation combination of SPMT and SSD on three public
datasets, which outperforms all other state-of-the-art methods. In addition, we
evaluate the representation combination of SPMT and TFBD on our dataset and
excellent performance is also achieved.
Comment: To be published in Pattern Recognition
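The two combination strategies above can be sketched as follows; the weights and thresholds are placeholders, since the paper only states that a decision-level cascade and a simple score fusion are used.

    def cascade_decision(ssd_flags_attack, spmt_score, spmt_threshold=0.5):
        # Decision-level cascade: trust the SSD stage when it flags an attack,
        # otherwise fall back to the SPMT texture score.
        if ssd_flags_attack:
            return False                              # attack
        return spmt_score >= spmt_threshold           # True -> live face

    def score_fusion(spmt_score, tfbd_score, w=0.5, threshold=0.5):
        # Simple score-level fusion of the texture (SPMT) and structure (TFBD) cues.
        fused = w * spmt_score + (1.0 - w) * tfbd_score
        return fused >= threshold                     # True -> live face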