56,922 research outputs found
Compensating for Large In-Plane Rotations in Natural Images
Rotation invariance has been studied in the computer vision community
primarily in the context of small in-plane rotations. This is usually achieved
by building invariant image features. However, the problem of achieving
invariance for large rotation angles remains largely unexplored. In this work,
we tackle this problem by directly compensating for large rotations, as opposed
to building invariant features. This is inspired by the neuro-scientific
concept of mental rotation, which humans use to compare pairs of rotated
objects. Our contributions here are three-fold. First, we train a Convolutional
Neural Network (CNN) to detect image rotations. We find that generic CNN
architectures are not suitable for this purpose. To this end, we introduce a
convolutional template layer, which learns representations for canonical
'unrotated' images. Second, we use Bayesian Optimization to quickly sift
through a large number of candidate images to find the canonical 'unrotated'
image. Third, we use this method to achieve robustness to large angles in an
image retrieval scenario. Our method is task-agnostic, and can be used as a
pre-processing step in any computer vision system.Comment: Accepted at Indian Conference on Computer Vision, Graphics and Image
Processing (ICVGIP) 201
Convolutional Neural Networks over Tree Structures for Programming Language Processing
Programming language processing (similar to natural language processing) is a
hot research topic in the field of software engineering; it has also aroused
growing interest in the artificial intelligence community. However, different
from a natural language sentence, a program contains rich, explicit, and
complicated structural information. Hence, traditional NLP models may be
inappropriate for programs. In this paper, we propose a novel tree-based
convolutional neural network (TBCNN) for programming language processing, in
which a convolution kernel is designed over programs' abstract syntax trees to
capture structural information. TBCNN is a generic architecture for programming
language processing; our experiments show its effectiveness in two different
program analysis tasks: classifying programs according to functionality, and
detecting code snippets of certain patterns. TBCNN outperforms baseline
methods, including several neural models for NLP.Comment: Accepted at AAAI-1
Dynamic Steerable Blocks in Deep Residual Networks
Filters in convolutional networks are typically parameterized in a pixel
basis, that does not take prior knowledge about the visual world into account.
We investigate the generalized notion of frames designed with image properties
in mind, as alternatives to this parametrization. We show that frame-based
ResNets and Densenets can improve performance on Cifar-10+ consistently, while
having additional pleasant properties like steerability. By exploiting these
transformation properties explicitly, we arrive at dynamic steerable blocks.
They are an extension of residual blocks, that are able to seamlessly transform
filters under pre-defined transformations, conditioned on the input at training
and inference time. Dynamic steerable blocks learn the degree of invariance
from data and locally adapt filters, allowing them to apply a different
geometrical variant of the same filter to each location of the feature map.
When evaluated on the Berkeley Segmentation contour detection dataset, our
approach outperforms all competing approaches that do not utilize pre-training.
Our results highlight the benefits of image-based regularization to deep
networks
Context-Dependent Diffusion Network for Visual Relationship Detection
Visual relationship detection can bridge the gap between computer vision and
natural language for scene understanding of images. Different from pure object
recognition tasks, the relation triplets of subject-predicate-object lie on an
extreme diversity space, such as \textit{person-behind-person} and
\textit{car-behind-building}, while suffering from the problem of combinatorial
explosion. In this paper, we propose a context-dependent diffusion network
(CDDN) framework to deal with visual relationship detection. To capture the
interactions of different object instances, two types of graphs, word semantic
graph and visual scene graph, are constructed to encode global context
interdependency. The semantic graph is built through language priors to model
semantic correlations across objects, whilst the visual scene graph defines the
connections of scene objects so as to utilize the surrounding scene
information. For the graph-structured data, we design a diffusion network to
adaptively aggregate information from contexts, which can effectively learn
latent representations of visual relationships and well cater to visual
relationship detection in view of its isomorphic invariance to graphs.
Experiments on two widely-used datasets demonstrate that our proposed method is
more effective and achieves the state-of-the-art performance.Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18
- …