Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition
Recognizing facial action units (AUs) during spontaneous facial displays is a
challenging problem. Most recently, Convolutional Neural Networks (CNNs) have
shown promise for facial AU recognition, where predefined and fixed convolution
filter sizes are employed. In order to achieve the best performance, the
optimal filter size is often empirically found by conducting extensive
experimental validation. Such a search incurs substantial training cost,
especially as the network becomes deeper.
This paper proposes a novel Optimized Filter Size CNN (OFS-CNN), in which the
filter sizes and weights of all convolutional layers are learned
simultaneously from the training data. Specifically,
the filter size is defined as a continuous variable, which is optimized by
minimizing the training loss. Experimental results on two AU-coded spontaneous
databases have shown that the proposed OFS-CNN is capable of estimating optimal
filter size for varying image resolution and outperforms traditional CNNs with
the best filter size obtained by exhaustive search. The OFS-CNN also beats a
CNN using multiple filter sizes and, more importantly, is much more efficient
during testing thanks to the proposed forward-backward propagation algorithm.
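The abstract's central idea, treating filter size as a continuous variable optimized by the training loss, can be illustrated by masking a fixed maximum-size kernel with a smooth cutoff whose width is the learnable size. This is a minimal sketch of one way such a scheme could work, not the paper's exact formulation; `soft_size_mask`, the sigmoid cutoff, and `tau` are all illustrative assumptions:

```python
import math

def soft_size_mask(k_max, size, tau=0.5):
    """Differentiable mask over a k_max-tap 1-D kernel.

    Taps within `size`/2 of the kernel center get a weight near 1,
    taps outside near 0; the sigmoid makes the cutoff smooth, so a
    loss computed through the mask is differentiable in `size`.
    """
    c = (k_max - 1) / 2.0
    return [1.0 / (1.0 + math.exp(-(size / 2.0 - abs(i - c)) / tau))
            for i in range(k_max)]

def conv1d(signal, kernel, mask):
    """'Valid' 1-D convolution with the element-wise masked kernel."""
    k = [w * m for w, m in zip(kernel, mask)]
    n = len(k)
    return [sum(signal[i + j] * k[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

# An effective size of ~3 inside a 7-tap kernel: center taps pass,
# edge taps are suppressed.
mask = soft_size_mask(7, 3.0)
```

Because the mask is a smooth function of `size`, the training loss stays differentiable with respect to it, so the filter size can in principle be updated by gradient descent together with the filter weights, which is the behavior the abstract describes.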
Semantic Image Segmentation via Deep Parsing Network
This paper addresses semantic image segmentation by incorporating rich
information into Markov Random Field (MRF), including high-order relations and
mixture of label contexts. Unlike previous works that optimized MRFs with
iterative algorithms, we solve the MRF by proposing a Convolutional Neural Network
(CNN), namely Deep Parsing Network (DPN), which enables deterministic
end-to-end computation in a single forward pass. Specifically, DPN extends a
contemporary CNN architecture to model unary terms and additional layers are
carefully devised to approximate the mean field algorithm (MF) for pairwise
terms. It has several appealing properties. First, different from the recent
works that combined CNN and MRF, where many iterations of MF were required for
each training image during back-propagation, DPN is able to achieve high
performance by approximating one iteration of MF. Second, DPN represents
various types of pairwise terms, subsuming many existing works as special
cases. Third, DPN makes MF easier to parallelize and accelerate on a
Graphics Processing Unit (GPU). DPN is thoroughly evaluated on the PASCAL VOC
2012 dataset, where a single DPN model yields a new state-of-the-art
segmentation accuracy.Comment: To appear in International Conference on Computer Vision (ICCV) 201