Visual Saliency Based on Multiscale Deep Features
Visual saliency is a fundamental problem in both cognitive and computational
sciences, including computer vision. In this CVPR 2015 paper, we discover that
a high-quality visual saliency model can be trained with multiscale features
extracted using a popular deep learning architecture, convolutional neural
networks (CNNs), which have had many successes in visual recognition tasks. For
learning such saliency models, we introduce a neural network architecture,
which has fully connected layers on top of CNNs responsible for extracting
features at three different scales. We then propose a refinement method to
enhance the spatial coherence of our saliency results. Finally, aggregating
multiple saliency maps computed for different levels of image segmentation can
further boost the performance, yielding saliency maps better than those
generated from a single segmentation. To promote further research and
evaluation of visual saliency models, we also construct a new large database of
4447 challenging images and their pixelwise saliency annotations. Experimental
results demonstrate that our proposed method is capable of achieving
state-of-the-art performance on all public benchmarks, improving the F-Measure
by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset
(HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively
on these two datasets.
Comment: To appear in CVPR 2015
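To make the three-scale design above concrete, here is a minimal sketch in PyTorch. The backbone choice (VGG-16), the crop definitions, and all layer sizes are assumptions for illustration, not the authors' exact configuration; the idea is simply to pool CNN features from three nested crops around a segment and score the segment with fully connected layers.

```python
# Minimal sketch: multiscale CNN features + fully connected saliency head.
# Assumptions: VGG-16 backbone, average pooling, and layer sizes are illustrative.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MultiscaleSaliencyHead(nn.Module):
    """Scores an image segment from CNN features pooled at three nested scales."""
    def __init__(self, feat_dim=512, hidden=300):
        super().__init__()
        self.backbone = vgg16(weights=None).features   # shared CNN feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)             # one feature vector per crop
        # Fully connected layers on top of the concatenated three-scale features.
        self.mlp = nn.Sequential(
            nn.Linear(3 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),          # saliency score in [0, 1]
        )

    def forward(self, crop_segment, crop_context, crop_full):
        # Each input is a (B, 3, H, W) crop of the image at one scale.
        feats = [self.pool(self.backbone(c)).flatten(1)
                 for c in (crop_segment, crop_context, crop_full)]
        return self.mlp(torch.cat(feats, dim=1))
```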
Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks
Protein secondary structure prediction is an important problem in
bioinformatics. Inspired by the recent successes of deep neural networks, in
this paper, we propose an end-to-end deep network that predicts protein
secondary structures from integrated local and global contextual features. Our
deep architecture leverages convolutional neural networks with different kernel
sizes to extract multiscale local contextual features. In addition, considering
the long-range dependencies in amino acid sequences, we employ a bidirectional
recurrent network of gated recurrent units (GRUs) to capture
global contextual features. Furthermore, multi-task learning is utilized to
predict secondary structure labels and amino-acid solvent accessibility
simultaneously. Our proposed deep network demonstrates its effectiveness by
achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public
benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11.
Our model and results are publicly available.
Comment: 8 pages, 3 figures, accepted by the International Joint Conference on Artificial Intelligence (IJCAI)
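A rough sketch of the cascaded architecture described above, in PyTorch. The input encoding width, kernel sizes, and hidden dimensions are assumed for illustration and may differ from the published model; the structure is convolutions of several kernel sizes for local context, a bidirectional GRU for global context, and two heads for the multi-task objective.

```python
# Minimal sketch: cascaded multiscale convolutions + bidirectional GRU + two task heads.
# Assumptions: in_dim, kernel sizes, and hidden widths are illustrative placeholders.
import torch
import torch.nn as nn

class CascadedCNNRNN(nn.Module):
    """Multiscale conv features -> bidirectional GRU -> secondary-structure and solvent heads."""
    def __init__(self, in_dim=42, conv_dim=64, rnn_dim=128, n_ss8=8, n_solvent=2):
        super().__init__()
        # Convolutions with different kernel sizes capture multiscale local context.
        self.convs = nn.ModuleList([
            nn.Conv1d(in_dim, conv_dim, k, padding=k // 2) for k in (3, 7, 11)
        ])
        # Bidirectional GRU captures long-range dependencies along the sequence.
        self.gru = nn.GRU(3 * conv_dim, rnn_dim, batch_first=True, bidirectional=True)
        # Multi-task heads: 8-class secondary structure and solvent accessibility.
        self.ss_head = nn.Linear(2 * rnn_dim, n_ss8)
        self.sa_head = nn.Linear(2 * rnn_dim, n_solvent)

    def forward(self, x):                      # x: (B, L, in_dim) per-residue features
        h = x.transpose(1, 2)                  # (B, in_dim, L) for Conv1d
        local = torch.cat([torch.relu(c(h)) for c in self.convs], dim=1)
        out, _ = self.gru(local.transpose(1, 2))
        return self.ss_head(out), self.sa_head(out)   # per-residue logits for both tasks
```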
Deep Contrast Learning for Salient Object Detection
Salient object detection has recently witnessed substantial progress due to
powerful features extracted using deep convolutional neural networks (CNNs).
However, existing CNN-based methods operate at the patch level instead of the
pixel level, so the resulting saliency maps are typically blurry, especially near
the boundaries of salient objects. Furthermore, image patches are treated as
independent samples even when they are overlapping, giving rise to significant
redundancy in computation and storage. In this CVPR 2016 paper, we propose an
end-to-end deep contrast network to overcome the aforementioned limitations.
Our deep network consists of two complementary components, a pixel-level fully
convolutional stream and a segment-wise spatial pooling stream. The first
stream directly produces a saliency map with pixel-level accuracy from an input
image. The second stream extracts segment-wise features very efficiently, and
better models saliency discontinuities along object boundaries. Finally, a
fully connected CRF model can be optionally incorporated to improve spatial
coherence and contour localization in the fused result from these two streams.
Experimental results demonstrate that our deep model significantly improves the
state of the art.
Comment: To appear in CVPR 2016
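The two-stream idea can be sketched as follows in PyTorch. The small stand-in encoder, the superpixel interface, and the simple average fusion are assumptions made for brevity, and the optional CRF post-processing is omitted.

```python
# Minimal sketch: pixel-level FCN stream + segment-wise pooling stream, fused by averaging.
# Assumptions: the encoder, feature width, and fusion rule are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepContrastSketch(nn.Module):
    """Fuses a dense pixel-level saliency stream with a segment-wise scoring stream."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(          # small stand-in encoder; any FCN works
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pixel_head = nn.Conv2d(feat_dim, 1, 1)    # dense, pixel-accurate saliency
        self.segment_head = nn.Linear(feat_dim, 1)     # per-segment saliency score

    def forward(self, image, segments):
        # segments: (B, H, W) integer superpixel labels for the input image.
        feats = self.backbone(image)                               # (B, C, h, w)
        pixel_map = torch.sigmoid(self.pixel_head(feats))
        pixel_map = F.interpolate(pixel_map, image.shape[-2:],
                                  mode='bilinear', align_corners=False)

        # Segment-wise stream: average features inside each superpixel, score it,
        # then paint the score back onto that superpixel's pixels.
        up = F.interpolate(feats, image.shape[-2:], mode='bilinear', align_corners=False)
        seg_map = torch.zeros_like(pixel_map)
        for b in range(image.size(0)):
            for s in segments[b].unique():
                mask = segments[b] == s
                score = torch.sigmoid(self.segment_head(up[b][:, mask].mean(dim=1)))
                seg_map[b, 0][mask] = score
        return 0.5 * (pixel_map + seg_map)             # simple average fusion
```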
Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning
Deep neural networks require a large amount of labeled training data during
supervised learning. However, collecting and labeling so much data might be
infeasible in many cases. In this paper, we introduce a source-target selective
joint fine-tuning scheme for improving the performance of deep learning tasks
with insufficient training data. In this scheme, a target learning task with
insufficient training data is carried out simultaneously with another source
learning task with abundant training data. However, the source learning task
does not use all existing training data. Our core idea is to identify and use a
subset of training images from the original source learning task whose
low-level characteristics are similar to those from the target learning task,
and jointly fine-tune shared convolutional layers for both tasks. Specifically,
we compute descriptors from linear or nonlinear filter bank responses on
training images from both tasks, and use such descriptors to search for a
desired subset of training samples for the source learning task.
Experiments demonstrate that our selective joint fine-tuning scheme achieves
state-of-the-art performance on multiple visual classification tasks with
insufficient training data for deep learning. Such tasks include Caltech 256,
MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
fine-tuning without a source domain, the proposed method can improve the
classification accuracy by 2% - 10% using a single model.
Comment: To appear in the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
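The source-sample selection step can be illustrated with a small sketch. The Gabor filter bank and histogram descriptor below are stand-ins for the low-level filter bank responses mentioned above, and the nearest-neighbor search is one plausible way to pick source images that resemble the target task; the exact descriptors and search strategy in the paper may differ.

```python
# Minimal sketch of selecting source training samples by low-level similarity.
# Assumptions: Gabor bank parameters, histogram descriptor, and k are illustrative.
import numpy as np
from skimage.filters import gabor
from sklearn.neighbors import NearestNeighbors

def filter_bank_descriptor(gray_image, bins=16):
    """Histogram of responses over a small Gabor filter bank (low-level statistics)."""
    responses = []
    for freq in (0.1, 0.2, 0.4):
        for theta in np.linspace(0, np.pi, 4, endpoint=False):
            real, _ = gabor(gray_image, frequency=freq, theta=theta)
            hist, _ = np.histogram(real, bins=bins, range=(-1, 1), density=True)
            responses.append(hist)
    return np.concatenate(responses)

def select_source_subset(target_grays, source_grays, k=50):
    """Pick source images whose low-level characteristics resemble the target task."""
    src_desc = np.stack([filter_bank_descriptor(g) for g in source_grays])
    tgt_desc = np.stack([filter_bank_descriptor(g) for g in target_grays])
    nn = NearestNeighbors(n_neighbors=k).fit(src_desc)
    _, idx = nn.kneighbors(tgt_desc)
    return np.unique(idx.ravel())      # indices of selected source images
```

The selected subset would then be mixed with the target data for joint fine-tuning of the shared convolutional layers.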
Instance-Level Salient Object Segmentation
Image saliency detection has recently witnessed rapid progress due to deep
convolutional neural networks. However, none of the existing methods is able to
identify object instances in the detected salient regions. In this paper, we
present a salient instance segmentation method that produces a saliency mask
with distinct object instance labels for an input image. Our method consists of
three steps: estimating a saliency map, detecting salient object contours, and
identifying salient object instances. For the first two steps, we propose a
multiscale saliency refinement network, which generates high-quality salient
region masks and salient object contours. Once integrated with multiscale
combinatorial grouping and a MAP-based subset optimization framework, our
method can generate very promising salient object instance segmentation
results. To promote further research and evaluation of salient instance
segmentation, we also construct a new database of 1000 images and their
pixelwise salient instance annotations. Experimental results demonstrate that
our proposed method is capable of achieving state-of-the-art performance on all
public benchmarks for salient region detection as well as on our new dataset
for salient instance segmentation.
Comment: To appear in CVPR 2017
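For illustration, the final step can be approximated by scoring external instance proposals (e.g. from multiscale combinatorial grouping) with the predicted saliency map and keeping a non-overlapping subset. The greedy rule below is a simple stand-in for the MAP-based subset optimization used in the paper, and the thresholds are assumptions.

```python
# Minimal sketch: turn a saliency map plus proposal masks into instance labels.
# Assumptions: min_saliency and max_iou thresholds and the greedy rule are illustrative.
import numpy as np

def select_salient_instances(saliency, proposals, min_saliency=0.5, max_iou=0.3):
    """Greedily keep proposal masks that are salient and mutually non-overlapping."""
    scores = [saliency[m].mean() for m in proposals]          # mean saliency per mask
    order = np.argsort(scores)[::-1]
    kept = []
    for i in order:
        if scores[i] < min_saliency:
            break
        mask = proposals[i]
        overlaps = [
            np.logical_and(mask, proposals[j]).sum()
            / max(np.logical_or(mask, proposals[j]).sum(), 1)
            for j in kept
        ]
        if all(o < max_iou for o in overlaps):
            kept.append(i)
    # Paint distinct instance labels onto a single segmentation map.
    instance_map = np.zeros(saliency.shape, dtype=np.int32)
    for label, i in enumerate(kept, start=1):
        instance_map[proposals[i]] = label
    return instance_map
```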
Speaker-following Video Subtitles
We propose a new method for improving the presentation of subtitles in video
(e.g. TV and movies). With conventional subtitles, the viewer has to constantly
look away from the main viewing area to read the subtitles at the bottom of the
screen, which disrupts the viewing experience and causes unnecessary eyestrain.
Our method places on-screen subtitles next to the respective speakers to allow
the viewer to follow the visual content while simultaneously reading the
subtitles. We use novel identification algorithms to detect the speakers based
on audio and visual information. Then the placement of the subtitles is
determined using global optimization. A comprehensive usability study indicated
that our subtitle placement method outperformed both conventional
fixed-position subtitling and another previous dynamic subtitling method in
terms of enhancing the overall viewing experience and reducing eyestrain.
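A toy sketch of the placement idea: for each subtitle, choose a candidate position close to the detected speaker while avoiding faces. The cost terms and the per-subtitle greedy choice below are assumptions for illustration; the abstract describes determining placement with a global optimization rather than the independent choice shown here.

```python
# Minimal sketch: score candidate subtitle positions by distance to the speaker
# and by face occlusion, then pick the cheapest one. Weights are illustrative.
import numpy as np

def place_subtitle(speaker_box, candidate_boxes, face_boxes, w_dist=1.0, w_occ=10.0):
    """Pick the candidate box closest to the speaker that avoids covering faces."""
    def center(box):                      # box = (x0, y0, x1, y1)
        return np.array([(box[0] + box[2]) / 2, (box[1] + box[3]) / 2])

    def overlap(a, b):                    # intersection area of two boxes
        w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        return w * h

    costs = []
    for cand in candidate_boxes:
        dist = np.linalg.norm(center(cand) - center(speaker_box))
        occ = sum(overlap(cand, f) for f in face_boxes)
        costs.append(w_dist * dist + w_occ * occ)
    return candidate_boxes[int(np.argmin(costs))]
```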