CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression
Lossy image compression algorithms are pervasively used to reduce the size of
images transmitted over the web and recorded on data storage media. However, we
pay for their high compression rate with visual artifacts degrading the user
experience. Deep convolutional neural networks have become a widespread tool to
address high-level computer vision tasks very successfully. Recently, they have
found their way into low-level computer vision and image
processing, where they solve regression problems, mostly with relatively shallow
networks.
We present a novel 12-layer deep convolutional network for image compression
artifact suppression with hierarchical skip connections and a multi-scale loss
function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an
improvement of up to 0.36 dB over the best previous ConvNet result. We show
that a network trained for a specific quality factor (QF) is resilient to the
QF used to compress the input image: a single network trained for QF 60
provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.
Comment: 8 pages
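The gains above are reported in PSNR, which is logarithmic in the mean squared error (MSE). As a minimal sketch assuming 8-bit images (peak value 255) stored as NumPy arrays, the following computes PSNR and converts a gain in dB into the equivalent MSE reduction factor; the function names `psnr` and `mse_reduction_factor` are illustrative and not taken from the paper.

```python
import numpy as np


def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    ref = reference.astype(np.float64)
    dist = distorted.astype(np.float64)
    mse = np.mean((ref - dist) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def mse_reduction_factor(gain_db: float) -> float:
    """Factor by which the MSE shrinks for a PSNR gain of gain_db decibels.

    PSNR = 10 * log10(peak^2 / MSE), so a gain of g dB divides the MSE
    by 10 ** (g / 10).
    """
    return 10.0 ** (gain_db / 10.0)
```

For example, `mse_reduction_factor(1.79)` is about 1.51, so a 1.79 dB gain corresponds to cutting the MSE by roughly a third.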
Learning non-maximum suppression
Object detectors have hugely profited from moving towards an end-to-end
learning paradigm: proposals, features, and the classifier becoming one neural
network improved results two-fold on general object detection. One
indispensable component is non-maximum suppression (NMS), a post-processing
algorithm responsible for merging all detections that belong to the same
object. The de facto standard NMS algorithm is still fully hand-crafted,
suspiciously simple, and, being based on greedy clustering with a fixed
distance threshold, forces a trade-off between recall and precision. We
propose a new network architecture designed to perform NMS, using only boxes
and their scores. We report experiments for person detection on PETS and for
general object categories on the COCO dataset. Our approach shows promise,
providing improved localization and occlusion handling.
Comment: Added "Supplementary material" title
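For reference, the hand-crafted baseline criticized above is usually implemented as greedy NMS. The sketch below assumes axis-aligned boxes given as `[x1, y1, x2, y2]` with positive area and uses intersection-over-union (IoU) as the overlap measure with a fixed threshold; it illustrates the conventional algorithm, not the learned network proposed in the paper, and `greedy_nms` is a hypothetical name.

```python
import numpy as np


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an (N, 4) array of boxes, all [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5) -> list:
    """Return indices of kept detections, in descending score order."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Suppress every remaining box that overlaps the current best too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep
```

With two heavily overlapping boxes (IoU of about 0.68), a threshold of 0.5 suppresses the lower-scoring one, while 0.7 keeps both. This is the recall/precision trade-off a fixed threshold forces: no single value suits both crowded and isolated objects.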
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi-layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning scheme show significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably, that it is possible to learn strong low-level
feature detectors on features that might even just cover a few pixels in the
image. Higher-level spatial models improve the overall result somewhat, but to
a much lesser extent than expected. Many researchers previously argued that
kinematic structure and top-down information are crucial for this domain, but
with our purely bottom-up, weak spatial model, we could improve on other, more
complicated architectures that currently produce the best results. This mirrors
what many other researchers, such as those in speech recognition, object
recognition, and other domains, have experienced.
Efficient Action Detection in Untrimmed Videos via Multi-Task Learning
This paper studies the joint learning of action recognition and temporal
localization in long, untrimmed videos. We employ a multi-task learning
framework that performs the three highly related steps of action proposal,
action recognition, and action localization refinement in parallel instead of
the standard sequential pipeline that performs the steps in order. We develop a
novel temporal actionness regression module that estimates what proportion of a
clip contains action. We use it for temporal localization, but it could have
other applications such as video retrieval, surveillance, and summarization. We
also introduce random shear augmentation during training to simulate viewpoint
change. We evaluate our framework on three popular video benchmarks. Results
demonstrate that our joint model is efficient in terms of storage and
computation in that we do not need to compute and cache dense trajectory
features, and that it is several times faster than its sequential ConvNet
counterpart. Yet, despite being more efficient, it outperforms state-of-the-art
methods with respect to accuracy.
Comment: WACV 2017 camera ready, minor updates about test time efficiency
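The actionness module regresses the proportion of a clip that contains action. One plausible way to build such a regression target from ground-truth annotations is sketched below, assuming clips and actions are given as `(start, end)` intervals in seconds; `actionness_target` is a hypothetical helper, not the authors' code.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def actionness_target(clip: Interval, actions: List[Interval]) -> float:
    """Fraction of the clip [start, end) covered by the union of action intervals."""
    clip_start, clip_end = clip
    length = clip_end - clip_start
    if length <= 0:
        raise ValueError("clip must have positive length")
    # Trim each action to the clip window; drop actions that do not overlap it.
    pieces = sorted(
        (max(s, clip_start), min(e, clip_end))
        for s, e in actions
        if min(e, clip_end) > max(s, clip_start)
    )
    covered = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    # Merge overlapping pieces so no moment of the clip is counted twice.
    for s, e in pieces:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length
```

For a 10-second clip with annotated actions at (2, 5) and (4, 8), the merged coverage is 6 seconds, so the target is 0.6.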
Unsupervised Network Pretraining via Encoding Human Design
Over the years, computer vision researchers have spent an immense amount of
effort on designing image features for the visual object recognition task. We
propose to incorporate this valuable experience to guide the task of training
deep neural networks. Our idea is to pretrain the network through the task of
replicating the process of hand-designed feature extraction. By learning to
replicate the process, the neural network integrates previous research
knowledge and learns to model visual objects in a way similar to the
hand-designed features. In the succeeding finetuning step, it further learns
object-specific representations from labeled data and this boosts its
classification power. We pretrain two convolutional neural networks: one
replicates the process of histogram of oriented gradients (HOG) feature
extraction, and the other replicates the process of region covariance feature
extraction. After finetuning, we achieve substantially better performance than
the baseline methods.
Comment: 9 pages, 11 figures, WACV 2016: IEEE Conference on Applications of
Computer Vision
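The pretraining idea, regressing a hand-designed descriptor from raw pixels, can be illustrated with a simplified HOG-like target. The sketch assumes grayscale NumPy images and omits HOG's block normalization and interpolation between orientation bins; `cell_orientation_histograms` and `replication_loss` are hypothetical names, and the network that produces `predicted` is left abstract.

```python
import numpy as np


def cell_orientation_histograms(image: np.ndarray, cell: int = 8, n_bins: int = 9) -> np.ndarray:
    """Simplified HOG-style descriptor: per-cell histograms of unsigned gradient
    orientation, weighted by gradient magnitude (no block normalization)."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned, [0, 180)
    # Map each angle to a bin; clamp guards against rounding up to 180.
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((h_cells, w_cells, n_bins))
    for i in range(h_cells):
        for j in range(w_cells):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(
                bins[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=n_bins,
            )
    return hist


def replication_loss(predicted: np.ndarray, image: np.ndarray, cell: int = 8, n_bins: int = 9) -> float:
    """Pretraining objective sketch: MSE between a network's output and the
    hand-designed descriptor of the same image."""
    target = cell_orientation_histograms(image, cell, n_bins)
    return float(np.mean((predicted - target) ** 2))
```

During pretraining, a network would be trained to minimize a loss like `replication_loss` on unlabeled images; the finetuning step then trains it on labeled data for classification.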