71 research outputs found
Pooling-Invariant Image Feature Learning
Unsupervised dictionary learning has been a key component in state-of-the-art
computer vision recognition architectures. While highly effective methods exist
for patch-based dictionary learning, these methods may learn redundant features
after the pooling stage in a given early vision architecture. In this paper, we
offer a novel dictionary learning scheme to efficiently take into account the
invariance of learned features after the spatial pooling stage. The algorithm
is built on simple clustering, and thus enjoys efficiency and scalability. We
discuss the underlying mechanism that justifies the use of clustering
algorithms, and empirically show that the algorithm finds better dictionaries
than patch-based methods with the same dictionary size
Deep Convolutional Ranking for Multilabel Image Annotation
Multilabel image annotation is one of the most important challenges in
computer vision with many real-world applications. While existing work usually
use conventional visual features for multilabel annotation, features based on
Deep Neural Networks have shown potential to significantly boost performance.
In this work, we propose to leverage the advantage of such features and analyze
key components that lead to better performances. Specifically, we show that a
significant performance gain could be obtained by combining convolutional
architectures with approximate top- ranking objectives, as thye naturally
fit the multilabel tagging problem. Our experiments on the NUS-WIDE dataset
outperforms the conventional visual features by about 10%, obtaining the best
reported performance in the literature
Category-Independent Object-Level Saliency Detection
It is known that purely low-level saliency cues such as frequency does not lead to a good salient object detection result, requiring high-level knowledge to be adopted for successful discovery of task-independent salient objects. In this paper, we propose an efficient way to combine such high-level saliency priors and low-level appearance mod-els. We obtain the high-level saliency prior with the object-ness algorithm to find potential object candidates without the need of category information, and then enforce the con-sistency among the salient regions using a Gaussian MRF with the weights scaled by diverse density that emphasizes the influence of potential foreground pixels. Our model ob-tains saliency maps that assign high scores for the whole salient object, and achieves state-of-the-art performance on benchmark datasets covering various foreground statistics. 1
DeepSketch2Face: A Deep Learning Based Sketching System for 3D Face and Caricature Modeling
Face modeling has been paid much attention in the field of visual computing.
There exist many scenarios, including cartoon characters, avatars for social
media, 3D face caricatures as well as face-related art and design, where
low-cost interactive face modeling is a popular approach especially among
amateur users. In this paper, we propose a deep learning based sketching system
for 3D face and caricature modeling. This system has a labor-efficient
sketching interface, that allows the user to draw freehand imprecise yet
expressive 2D lines representing the contours of facial features. A novel CNN
based deep regression network is designed for inferring 3D face models from 2D
sketches. Our network fuses both CNN and shape based features of the input
sketch, and has two independent branches of fully connected layers generating
independent subsets of coefficients for a bilinear face representation. Our
system also supports gesture based interactions for users to further manipulate
initial face models. Both user studies and numerical results indicate that our
sketching system can help users create face models quickly and effectively. A
significantly expanded face database with diverse identities, expressions and
levels of exaggeration is constructed to promote further research and
evaluation of face modeling techniques.Comment: 12 pages, 16 figures, to appear in SIGGRAPH 201
Going Deeper with Convolutions
We propose a deep convolutional neural network architecture codenamed
"Inception", which was responsible for setting the new state of the art for
classification and detection in the ImageNet Large-Scale Visual Recognition
Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the
improved utilization of the computing resources inside the network. This was
achieved by a carefully crafted design that allows for increasing the depth and
width of the network while keeping the computational budget constant. To
optimize quality, the architectural decisions were based on the Hebbian
principle and the intuition of multi-scale processing. One particular
incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22
layers deep network, the quality of which is assessed in the context of
classification and detection
- …