Deep Graph Laplacian Regularization for Robust Denoising of Real Images
Recent developments in deep learning have revolutionized the paradigm of
image restoration. However, its application to real image denoising remains
limited, due to its sensitivity to training data and the complex nature of real
image noise. In this work, we combine the robustness of model-based
approaches with the learning power of data-driven approaches for real image
denoising. Specifically, by integrating graph Laplacian regularization as a
trainable module into a deep learning framework, our method is less susceptible
to overfitting than pure CNN-based approaches, achieving higher robustness on
small datasets and in cross-domain denoising. First, a sparse neighborhood graph
is built from the output of a convolutional neural network (CNN). Then the
image is restored by solving an unconstrained quadratic programming problem,
using the corresponding graph Laplacian regularizer as a prior term. The proposed
restoration pipeline is fully differentiable and hence can be trained
end-to-end. Experimental results demonstrate that our method is less prone to
overfitting given small training data. It is also endowed with strong
cross-domain generalization power, outperforming state-of-the-art
approaches by a remarkable margin.
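To make the restoration step concrete, the following is a minimal sketch (not the authors' released code) of graph-Laplacian-regularized denoising on a single grayscale patch. It assumes a fixed 4-connected grid graph with Gaussian edge weights standing in for the learned sparse neighborhood graph, and a hypothetical regularization weight mu; the quadratic program x* = argmin_x ||x - y||^2 + mu * x^T L x has the closed-form solution (I + mu*L) x* = y.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def denoise_patch(y, mu=0.5, sigma=0.1):
    # y: 2-D noisy grayscale patch with values in [0, 1]
    h, w = y.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    flat = y.ravel()
    rows, cols, vals = [], [], []
    # 4-connected grid graph: each pixel linked to its right and bottom neighbours
    for di, dj in [(0, 1), (1, 0)]:
        a = idx[:h - di, :w - dj].ravel()
        b = idx[di:, dj:].ravel()
        w_ab = np.exp(-((flat[a] - flat[b]) ** 2) / (2.0 * sigma ** 2))  # Gaussian edge weights
        rows += [a, b]; cols += [b, a]; vals += [w_ab, w_ab]
    W = sp.coo_matrix((np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
                      shape=(n, n))
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W   # graph Laplacian L = D - W
    A = (sp.identity(n) + mu * L).tocsc()                 # (I + mu*L) x = y
    return spsolve(A, flat).reshape(h, w)

# Example: denoise a small synthetic noisy patch
patch = np.clip(0.5 + 0.1 * np.random.randn(32, 32), 0.0, 1.0)
clean = denoise_patch(patch)

In the paper the edge weights come from CNN features and the whole solver is differentiable so that it can be trained end-to-end; the fixed weights above are only for illustration.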
Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching
Leveraging recent developments in convolutional neural networks
(CNNs), matching dense correspondences from a stereo pair has been cast as a
learning problem, with performance exceeding traditional approaches. However,
it remains challenging to generate high-quality disparities for the inherently
ill-posed regions. To tackle this problem, we propose a novel cascade CNN
architecture composed of two stages. The first stage advances the recently
proposed DispNet by equipping it with extra up-convolution modules, leading to
disparity images with more details. The second stage explicitly rectifies the
disparity initialized by the first stage; it couples with the first stage and
generates residual signals across multiple scales. The summation of the outputs
from the two stages gives the final disparity. As opposed to directly learning
the disparity at the second stage, we show that residual learning provides more
effective refinement. Moreover, it also benefits the training of the overall
cascade network. Experiments show that our cascade residual learning
scheme provides state-of-the-art performance for matching stereo
correspondences. At the time of submission, our method ranked
first on the KITTI 2015 stereo benchmark, surpassing prior works by a
noteworthy margin.

Comment: Accepted at ICCVW 2017. The first two authors contributed equally to
this paper.
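The two-stage structure can be summarized by the following minimal PyTorch sketch (an illustration under assumptions, not the paper's implementation). Here stage1 stands in for the up-convolution-augmented DispNet variant and stage2 for the refinement network; for brevity only the final summation is shown, whereas the paper supervises residuals across multiple scales.

import torch
import torch.nn as nn

class CascadeResidual(nn.Module):
    # stage1: hypothetical DispNet-style network with extra up-convolution modules
    # stage2: hypothetical refinement network predicting a residual correction
    def __init__(self, stage1: nn.Module, stage2: nn.Module):
        super().__init__()
        self.stage1 = stage1
        self.stage2 = stage2

    def forward(self, left: torch.Tensor, right: torch.Tensor):
        disp1 = self.stage1(left, right)            # initial disparity from the first stage
        residual = self.stage2(left, right, disp1)  # learn a correction, not the disparity itself
        disp2 = disp1 + residual                    # final disparity = initial + residual
        return disp1, disp2                         # both outputs can be supervised during training

Predicting only a correction to the first-stage output is what the abstract refers to as residual learning, which it reports to refine more effectively than regressing the full disparity again.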
LSTM Pose Machines
We observed that recent state-of-the-art results on single-image human pose
estimation were achieved by multi-stage Convolutional Neural Networks (CNNs).
Notwithstanding the superior performance on static images, applying
these models to videos is not only computationally intensive, it also suffers
from performance degradation and flickering. Such suboptimal results are mainly
attributed to the inability to impose sequential geometric consistency, to
handle severe image quality degradation (e.g. motion blur and occlusion), and
to capture the temporal correlation among video frames.
In this paper, we proposed a novel recurrent network to tackle these problems.
We showed that if a weight-sharing scheme is imposed on the
multi-stage CNN, it can be re-written as a Recurrent Neural Network (RNN).
This property decouples the relationship among multiple network stages and
results in significantly faster speed when invoking the network on videos. It
also enables the adoption of Long Short-Term Memory (LSTM) units between video
frames. We found that such a memory-augmented RNN is very effective in imposing
geometric consistency among frames. It also handles input quality
degradation in videos well while successfully stabilizing the sequential outputs. The
experiments showed that our approach significantly outperformed current
state-of-the-art methods on two large-scale video pose estimation benchmarks.
We also explored the memory cells inside the LSTM and provided insights on why
such a mechanism benefits prediction for video-based pose estimation.

Comment: Poster at the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2018.
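The following is a minimal PyTorch sketch (assumptions, not the released model) of the core idea: a weight-shared pose-estimation stage unrolled over video frames, with a convolutional LSTM carrying temporal state. shared_stage and conv_lstm are hypothetical modules; the stage is assumed to accept the current frame together with the previous belief maps, and the LSTM cell is assumed to return (output, new_state).

import torch
import torch.nn as nn

class VideoPoseRNN(nn.Module):
    def __init__(self, shared_stage: nn.Module, conv_lstm: nn.Module):
        super().__init__()
        self.stage = shared_stage   # one set of weights reused at every time step (weight sharing)
        self.lstm = conv_lstm       # convolutional LSTM linking consecutive frames

    def forward(self, frames: torch.Tensor):
        # frames: (T, B, C, H, W) video clip
        state, belief, heatmaps = None, None, []
        for x in frames:                              # the multi-stage CNN becomes a loop over time
            feat = self.stage(x, belief)              # stage conditioned on the previous belief maps
            belief, state = self.lstm(feat, state)    # memory imposes geometric consistency across frames
            heatmaps.append(belief)
        return torch.stack(heatmaps)                  # per-frame joint heatmaps

Because the per-frame stage shares weights, invoking the network on a new frame costs one stage pass plus the LSTM update, rather than re-running a full multi-stage cascade.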
Accurate Single Stage Detector Using Recurrent Rolling Convolution
Most of the recent successful methods in accurate object detection and
localization used some variant of R-CNN-style two-stage Convolutional Neural
Networks (CNNs), where plausible regions are proposed in the first stage and then
followed by a second stage for decision refinement. Despite the simplicity of
training and the efficiency in deployment, single-stage detection methods
have not been as competitive when evaluated on benchmarks that consider mAP at high
IoU thresholds. In this paper, we proposed a novel single-stage, end-to-end
trainable object detection network to overcome this limitation. We achieved
this by introducing a Recurrent Rolling Convolution (RRC) architecture over
multi-scale feature maps to construct object classifiers and bounding box
regressors which are "deep in context". We evaluated our method on the
challenging KITTI dataset, which measures methods at an IoU threshold of 0.7. We
showed that with RRC, a single reduced VGG-16 based model already significantly
outperformed all previously published results. At the time this paper was
written, our models ranked first in KITTI car detection (the hard level),
first in cyclist detection, and second in pedestrian detection. These
results were not reached by previous single-stage methods. The code is
publicly available.

Comment: CVPR 2017.
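As an illustration of the "rolling" idea, here is a minimal PyTorch sketch (an assumption-laden illustration, not the released RRC code) of a single rolling step in which every level of the feature pyramid aggregates context from its finer and coarser neighbours through shared 1x1 convolutions; in the paper such steps are applied recurrently before the classifiers and bounding box regressors. The common channel width is an assumption of this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RollingStep(nn.Module):
    # channels: assumed common channel width for all pyramid levels
    def __init__(self, channels: int):
        super().__init__()
        self.from_finer = nn.Conv2d(channels, channels, 1)    # message from the finer scale
        self.from_coarser = nn.Conv2d(channels, channels, 1)  # message from the coarser scale
        self.fuse = nn.Conv2d(3 * channels, channels, 1)      # fuse own features with both messages

    def forward(self, feats):
        # feats: list of feature maps ordered from fine to coarse
        out = []
        for i, f in enumerate(feats):
            size = f.shape[-2:]
            finer = (self.from_finer(F.interpolate(feats[i - 1], size=size))
                     if i > 0 else torch.zeros_like(f))
            coarser = (self.from_coarser(F.interpolate(feats[i + 1], size=size))
                       if i < len(feats) - 1 else torch.zeros_like(f))
            out.append(self.fuse(torch.cat([f, finer, coarser], dim=1)))
        return out  # same shapes as the input pyramid; apply recurrently, then attach detection heads

Repeating the step lets each scale accumulate context from progressively more distant scales, which is one way to read the abstract's phrase "deep in context".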