Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections
In this paper, we propose a very deep fully convolutional encoding-decoding
framework for image restoration tasks such as denoising and super-resolution.
The
network is composed of multiple layers of convolution and de-convolution
operators, learning end-to-end mappings from corrupted images to the original
ones. The convolutional layers act as a feature extractor, capturing the
abstraction of image content while eliminating noise and corruption.
De-convolutional layers are then used to recover the image details. We propose
to symmetrically link convolutional and de-convolutional layers with skip-layer
connections, with which the training converges much faster and attains a
higher-quality local optimum. First, the skip connections allow the signal
to be back-propagated to bottom layers directly, which tackles the problem
of vanishing gradients, making deep networks easier to train and
consequently yielding restoration performance gains. Second, these skip
connections pass
image details from convolutional layers to de-convolutional layers, which is
beneficial in recovering the original image. Notably, thanks to this large
capacity, we can handle different levels of noise using a single model.
Experimental results show that our network achieves better performance than all
previously reported state-of-the-art methods.

Comment: Accepted to Proc. Advances in Neural Information Processing
Systems (NIPS'16). Content of the final version may be slightly different.
Extended version is available at http://arxiv.org/abs/1606.0892
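The symmetric skip-layer connections described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: plain matrices stand in for the convolution/de-convolution operators, the layer count and width are arbitrary, and `conv_like` and `encoder_decoder` are hypothetical names.

```python
import numpy as np

def conv_like(x, w):
    # Stand-in for a convolutional layer: a linear map followed by ReLU.
    return np.maximum(w @ x, 0.0)

def encoder_decoder(x, enc_weights, dec_weights):
    # Encoder half ("convolutional" layers): extract features, saving each
    # layer's output for its mirrored decoder layer.
    skips = []
    h = x
    for w in enc_weights:
        h = conv_like(h, w)
        skips.append(h)
    # Decoder half ("de-convolutional" layers): recover details, with the
    # mirrored encoder features added back via the skip connection, so the
    # signal (and its gradient) has a direct path to the bottom layers.
    for w, s in zip(dec_weights, reversed(skips)):
        h = conv_like(h, w) + s
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=8)
enc_weights = [0.1 * rng.normal(size=(8, 8)) for _ in range(3)]
dec_weights = [0.1 * rng.normal(size=(8, 8)) for _ in range(3)]
y = encoder_decoder(x, enc_weights, dec_weights)
print(y.shape)  # (8,)
```

The additive skip is the key point: during back-propagation the gradient flows through the identity branch as well as the layer stack, which is why training converges faster for deep restoration networks.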
Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks
Recently, skeleton-based action recognition has gained popularity due to
cost-effective depth sensors coupled with real-time skeleton estimation
algorithms. Traditional approaches based on handcrafted features are
limited in representing the complexity of motion patterns. Recent methods
that use Recurrent
Neural Networks (RNN) to handle raw skeletons only focus on the contextual
dependency in the temporal domain and neglect the spatial configurations of
articulated skeletons. In this paper, we propose a novel two-stream RNN
architecture to model both temporal dynamics and spatial configurations for
skeleton-based action recognition. We explore two different structures for
the temporal stream: a stacked RNN and a hierarchical RNN, where the
latter is designed according to human body kinematics. We also propose two
effective methods to
model the spatial structure by converting the spatial graph into a sequence of
joints. To improve generalization of our model, we further exploit 3D
transformation based data augmentation techniques, including rotation and
scaling, to transform the 3D coordinates of skeletons during
training. Experiments on 3D action recognition benchmark datasets show that our
method brings a considerable improvement for a variety of actions, i.e.,
generic actions, interaction activities and gestures.

Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
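The rotation- and scaling-based skeleton augmentation can be sketched as follows. The angle and scale ranges, the choice of a single vertical rotation axis, and the function name `augment_skeleton` are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def augment_skeleton(joints, max_angle=np.pi / 36,
                     scale_range=(0.95, 1.05), rng=None):
    """Randomly rotate and scale 3D joint coordinates of one skeleton.

    joints: array of shape (num_joints, 3).
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Random rotation about the vertical (y) axis.
    theta = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
    # Random uniform scaling of the whole skeleton.
    scale = rng.uniform(*scale_range)
    return scale * joints @ rot_y.T

# Toy upright "body": 25 joints along the vertical axis.
skeleton = np.zeros((25, 3))
skeleton[:, 1] = np.linspace(0.0, 1.7, 25)
aug = augment_skeleton(skeleton, rng=np.random.default_rng(1))
print(aug.shape)  # (25, 3)
```

Applied independently to each training sequence, such transforms expose the RNN to viewpoint and body-size variation it would not otherwise see, which is the generalization benefit the abstract refers to.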
Multi-Frame Quality Enhancement for Compressed Video
The past few years have witnessed great success in applying deep learning to
enhance the quality of compressed image/video. The existing approaches mainly
focus on enhancing the quality of a single frame, ignoring the similarity
between consecutive frames. In this paper, we observe that quality
fluctuates heavily across compressed video frames, and thus low-quality
frames can be enhanced using neighboring high-quality frames, an idea we
term Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper
proposes an MFQE approach for compressed video, as a first attempt in this
direction. In our approach, we first develop a Support Vector Machine (SVM)
based detector to locate Peak Quality Frames (PQFs) in compressed video.
Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed
to enhance the quality of compressed video, in which a non-PQF and its two
nearest PQFs serve as the
input. The MF-CNN compensates motion between the non-PQF and PQFs through the
Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement
subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help
of its nearest PQFs. Finally, the experiments validate the effectiveness and
generality of our MFQE approach in advancing the state-of-the-art quality
enhancement of compressed video. The code of our MFQE approach is available at
https://github.com/ryangBUAA/MFQE.git

Comment: to appear in CVPR 201
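A simplified stand-in for the PQF-selection step might look like this. The paper trains an SVM detector on no-reference features; this sketch instead takes local maxima of a given per-frame quality curve (e.g. PSNR), and `find_pqfs` / `nearest_pqfs` are hypothetical helper names, so it only illustrates how non-PQF/PQF inputs are paired for the MF-CNN.

```python
import numpy as np

def find_pqfs(quality):
    # Treat a frame as a Peak Quality Frame if its quality is a local
    # maximum of the per-frame quality curve (simplified proxy for the
    # paper's SVM-based detector).
    q = np.asarray(quality, dtype=float)
    return [i for i in range(1, len(q) - 1)
            if q[i] > q[i - 1] and q[i] >= q[i + 1]]

def nearest_pqfs(frame_idx, peaks):
    # For a non-PQF, pick the preceding and following PQF: these two
    # frames accompany the low-quality frame as input to the enhancer.
    before = max((p for p in peaks if p < frame_idx), default=None)
    after = min((p for p in peaks if p > frame_idx), default=None)
    return before, after

psnr = [30.1, 33.2, 30.4, 30.0, 33.0, 29.8, 32.5, 30.2]
pqfs = find_pqfs(psnr)
print(pqfs)                   # [1, 4, 6]
print(nearest_pqfs(3, pqfs))  # (1, 4)
```

In the full approach, the two selected PQFs are first motion-compensated toward the non-PQF (MC-subnet) before the quality-enhancement subnet fuses them, which is what exploits the cross-frame similarity the single-frame methods ignore.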