Convolutional Networks with Adaptive Inference Graphs
Do convolutional networks really need a fixed feed-forward structure? What
if, after identifying the high-level concept of an image, a network could move
directly to a layer that can distinguish fine-grained differences? Currently, a
network would first need to execute sometimes hundreds of intermediate layers
that specialize in unrelated aspects. Ideally, the more a network already knows
about an image, the better it should be at deciding which layer to compute
next. In this work, we propose convolutional networks with adaptive inference
graphs (ConvNet-AIG) that adaptively define their network topology conditioned
on the input image. Following a high-level structure similar to residual
networks (ResNets), ConvNet-AIG decides for each input image on the fly which
layers are needed. In experiments on ImageNet we show that ConvNet-AIG learns
distinct inference graphs for different categories. Both ConvNet-AIG with 50
and 101 layers outperform their ResNet counterparts, while using 20% and 38%
less computation, respectively. By grouping parameters into layers for related
classes and only executing relevant layers, ConvNet-AIG improves both
efficiency and overall classification quality. Lastly, we also study the effect
of adaptive inference graphs on the susceptibility towards adversarial
examples. We observe that ConvNet-AIG shows higher robustness than ResNets,
complementing other known defense mechanisms.
Comment: IJCV 201
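The per-layer gating idea can be sketched as follows. Everything here (the toy residual block, the scalar gate weights, the 0.5 threshold) is a hypothetical illustration, not the paper's actual architecture; ConvNet-AIG uses learned gating units trained with a differentiable (Gumbel-softmax) relaxation.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, w):
    """A toy residual block: x + ReLU(x @ w)."""
    return x + np.maximum(x @ w, 0.0)

def gate(x, w_gate, threshold=0.5):
    """Toy gating unit: a sigmoid over pooled features decides whether
    this block is relevant for the current input (hard decision at
    inference time)."""
    score = 1.0 / (1.0 + np.exp(-(x.mean() * w_gate)))
    return score > threshold

def adaptive_forward(x, blocks, gates):
    """Execute only the layers whose gates fire for this input;
    skipped blocks behave as the identity, as in a ResNet."""
    executed = []
    for i, (w, w_g) in enumerate(zip(blocks, gates)):
        if gate(x, w_g):
            x = residual_block(x, w)
            executed.append(i)
    return x, executed

d = 4
blocks = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
gates = [1.0, -1.0, 1.0]                # hypothetical learned gate weights
x = np.abs(rng.standard_normal(d))      # positive features: gates 0 and 2 fire
y, executed = adaptive_forward(x, blocks, gates)
print(executed)
```

The inference graph thus differs per input: an image whose pooled features push a gate below threshold simply never pays for that layer.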
Topology and Prediction Focused Research on Graph Convolutional Neural Networks
Important advances have been made using convolutional neural network (CNN)
approaches to solve complicated problems in areas that rely on grid structured
data such as image processing and object classification. Recently, research on
graph convolutional neural networks (GCNN) has increased dramatically as
researchers try to replicate the success of CNN for graph structured data.
Unfortunately, traditional CNN methods are not readily transferable to GCNN,
given the irregularity and geometric complexity of graphs. The emerging field
of GCNN is further complicated by research papers that differ greatly in
scope, detail, and the level of academic sophistication they demand of the reader.
The present paper provides a review of some basic properties of GCNN. As a
guide to the interested reader, recent examples of GCNN research are then
grouped according to techniques that attempt to uncover the underlying topology
of the graph model and those that seek to generalize traditional CNN methods on
graph data to improve prediction of class membership. Discrete Signal
Processing on Graphs (DSPg) is used as a theoretical framework to better
understand some of the performance gains and limitations of these recent GCNN
approaches. A brief discussion of Topology Adaptive Graph Convolutional
Networks (TAGCN) is presented as an approach motivated by DSPg and future
research directions using this approach are briefly discussed.
Graph Attribute Aggregation Network with Progressive Margin Folding
Graph convolutional neural networks (GCNNs) have been attracting increasing
research attention due to their great potential for inference over graph
structures. However, insufficient effort has been devoted to aggregation
methods between different graph convolution layers. In this paper, we introduce
a graph attribute aggregation network (GAAN) architecture. Different from the
conventional pooling operations, a graph-transformation-based aggregation
strategy, progressive margin folding (PMF), is proposed for integrating graph
features. By distinguishing internal and margin elements, we provide an
approach for implementing the folding iteratively. A mechanism is also
devised for preserving local structures during progressive folding. In
addition, a hypergraph-based representation is introduced for transferring the
aggregated information between different layers. Our experiments applied to the
public molecule datasets demonstrate that the proposed GAAN outperforms the
existing GCNN models by a significant margin.
Representation Learning on Visual-Symbolic Graphs for Video Understanding
Events in natural videos typically arise from spatio-temporal interactions
between actors and objects and involve multiple co-occurring activities and
object classes. To capture this rich visual and semantic context, we propose
using two graphs: (1) an attributed spatio-temporal visual graph whose nodes
correspond to actors and objects and whose edges encode different types of
interactions, and (2) a symbolic graph that models semantic relationships. We
further propose a graph neural network for refining the representations of
actors, objects and their interactions on the resulting hybrid graph. Our model
goes beyond current approaches that assume nodes and edges are of the same
type, operate on graphs with fixed edge weights and do not use a symbolic
graph. In particular, our framework: a) has specialized attention-based message
functions for different node and edge types; b) uses visual edge features; c)
integrates visual evidence with label relationships; and d) performs global
reasoning in the semantic space. Experiments on challenging video understanding
tasks, such as temporal action localization on the Charades dataset, show that
the proposed method leads to state-of-the-art performance.
Comment: ECCV 202
Resolution Adaptive Networks for Efficient Inference
Adaptive inference is an effective mechanism to achieve a dynamic tradeoff
between accuracy and computational cost in deep networks. Existing works mainly
exploit architecture redundancy in network depth or width. In this paper, we
focus on spatial redundancy of input samples and propose a novel Resolution
Adaptive Network (RANet), which is inspired by the intuition that
low-resolution representations are sufficient for classifying "easy" inputs
containing large objects with prototypical features, while only some "hard"
samples need spatially detailed information. In RANet, the input images are
first routed to a lightweight sub-network that efficiently extracts
low-resolution representations, and those samples with high prediction
confidence will exit early from the network without being further processed.
Meanwhile, high-resolution paths in the network maintain the capability to
recognize the "hard" samples. Therefore, RANet can effectively reduce the
spatial redundancy involved in inferring high-resolution inputs. Empirically,
we demonstrate the effectiveness of the proposed RANet on the CIFAR-10,
CIFAR-100 and ImageNet datasets in both the anytime prediction setting and the
budgeted batch classification setting.
Comment: CVPR 202
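The confidence-based early-exit routing can be sketched as below. The two linear "heads", the coarse/fine feature split, and the 0.9 confidence threshold are hypothetical stand-ins for RANet's actual sub-networks, chosen only to make the easy/hard behaviour visible.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ranet_infer(x, lowres_head, highres_head, threshold=0.9):
    """Resolution-adaptive inference sketch: run the cheap
    low-resolution classifier first and exit early when its
    prediction is confident; otherwise fall back to the
    high-resolution path."""
    p = softmax(lowres_head(x))
    if p.max() >= threshold:
        return int(p.argmax()), "early-exit"
    p = softmax(highres_head(x))
    return int(p.argmax()), "full-network"

# Hypothetical heads: linear classifiers on coarse vs. all features.
W_low = np.array([[1.0, -1.0], [-1.0, 1.0]])
W_high = np.array([[1.0, -1.0, 2.0, -2.0],
                   [-1.0, 1.0, -2.0, 2.0]])
lowres = lambda x: W_low @ x[:2]    # sees only 2 coarse features
highres = lambda x: W_high @ x      # sees all 4 features

easy = np.array([4.0, -4.0, 0.0, 0.0])   # "easy": coarse features decisive
hard = np.array([0.1, -0.1, 2.0, -2.0])  # "hard": fine detail needed

print(ranet_infer(easy, lowres, highres))
print(ranet_infer(hard, lowres, highres))
```

The "easy" sample exits after the cheap head; the "hard" one pays for the full high-resolution path, which is exactly the accuracy/cost tradeoff the abstract describes.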
Exploring Visual Relationship for Image Captioning
It is widely believed that modeling relationships between objects would
be helpful for representing and eventually describing an image. Nevertheless,
there has been little evidence supporting this idea for image description
generation. In this paper, we introduce a new design to explore the connections
between objects for image captioning under the umbrella of attention-based
encoder-decoder framework. Specifically, we present a Graph Convolutional
Networks plus Long Short-Term Memory (GCN-LSTM) architecture that
integrates both semantic and spatial object relationships into the image
encoder. Technically, we build graphs over the detected objects in an image
based on their spatial and semantic connections. The representations of each
region proposed on objects are then refined by leveraging graph structure
through GCN. With the learnt region-level features, our GCN-LSTM capitalizes on
an LSTM-based captioning framework with an attention mechanism for sentence
generation. Extensive experiments are conducted on COCO image captioning
dataset, and superior results are reported when comparing to state-of-the-art
approaches. More remarkably, GCN-LSTM increases CIDEr-D performance from 120.1%
to 128.7% on the COCO testing set.
Comment: ECCV 201
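The region-refinement step can be sketched as one standard GCN propagation over the object graph. The adjacency matrix, one-hot features, and identity weights below are toy illustrations; the paper's encoder learns relation-specific parameters for semantic and spatial edges.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step refining region features:
    each node averages its neighbours' features (including
    itself via a self-loop) and applies a projection."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalize
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

# 3 detected regions, e.g. "man", "surfboard", "wave"; edges encode
# hypothetical spatial/semantic relations between them.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)   # one-hot initial region features
W = np.eye(3)   # identity projection, for illustration only
H1 = gcn_layer(A, H, W)
print(H1)
```

After one step each region's representation already mixes in its related regions' features, which is the context the LSTM decoder then attends over.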
Edge-labeling Graph Neural Network for Few-shot Learning
In this paper, we propose a novel edge-labeling graph neural network (EGNN),
which adapts a deep neural network on the edge-labeling graph, for few-shot
learning. The previous graph neural network (GNN) approaches in few-shot
learning have been based on the node-labeling framework, which implicitly
models the intra-cluster similarity and the inter-cluster dissimilarity. In
contrast, the proposed EGNN learns to predict edge-labels rather than
node-labels on the graph, which enables an explicit clustering to evolve
by iteratively updating the edge-labels with direct exploitation of both
intra-cluster similarity and inter-cluster dissimilarity. It is also well
suited for performing on various numbers of classes without retraining, and can
be easily extended to perform a transductive inference. The parameters of the
EGNN are learned by episodic training with an edge-labeling loss to obtain a
well-generalizable model for unseen low-data problems. On both the supervised
and semi-supervised few-shot image classification tasks with two benchmark
datasets, the proposed EGNN significantly improves the performances over the
existing GNNs.
Comment: accepted to CVPR 201
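One round of the edge-label / node-feature alternation can be sketched as follows. The Gaussian similarity kernel and the hand-placed 2-D features are hypothetical; EGNN learns a metric network for the edge update and trains everything episodically with an edge-labeling loss.

```python
import numpy as np

def update_edge_labels(H):
    """Predict edge labels from current node features: a high label
    means 'same class' (intra-cluster similarity), a low label
    'different class'. Here similarity is a simple Gaussian kernel
    on feature distance."""
    n = len(H)
    E = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E[i, j] = np.exp(-np.sum((H[i] - H[j]) ** 2))
    return E

def update_nodes(H, E):
    """Aggregate neighbour features weighted by edge labels, so
    similar nodes pull together into explicit clusters."""
    W = E / E.sum(axis=1, keepdims=True)
    return W @ H

# Two support examples per class plus one query, in 2-D feature space.
H = np.array([[0.0, 0.0], [0.1, 0.0],   # class A supports
              [3.0, 3.0], [3.1, 3.0],   # class B supports
              [0.05, 0.05]])            # query, near class A
E = update_edge_labels(H)
H = update_nodes(H, E)
pred = int(E[4, :4].argmax() // 2)      # query's strongest support edge -> class
print(pred)
```

Classifying the query by its strongest edge (rather than by a node label) is what lets the same trained model handle a different number of classes without retraining.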
Looking back to lower-level information in few-shot learning
Humans are capable of learning new concepts from small numbers of examples.
In contrast, supervised deep learning models usually lack the ability to
extract reliable predictive rules from limited data scenarios when attempting
to classify new examples. This challenging scenario is commonly known as
few-shot learning. Few-shot learning has garnered increased attention in recent
years due to its significance for many real-world problems. Recently, new
methods relying on meta-learning paradigms combined with graph-based
structures, which model the relationship between examples, have shown promising
results on a variety of few-shot classification tasks. However, existing work
on few-shot learning focuses only on the feature embeddings produced by the
last layer of the neural network. In this work, we propose the utilization of
lower-level, supporting information, namely the feature embeddings of the
hidden neural network layers, to improve classifier accuracy. Based on a
graph-based meta-learning framework, we develop a method called Looking-Back,
where such lower-level information is used to construct additional graphs for
label propagation in limited data settings. Our experiments on two popular
few-shot learning datasets, miniImageNet and tieredImageNet, show that our
method can utilize the lower-level information in the network to improve
state-of-the-art classification performance.
Comment: 13 pages, 2 figures; fixed typographic errors and added journal re
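The label-propagation step that such graph-based methods apply per graph can be sketched with the classic diffusion update F &lt;- alpha*S@F + (1-alpha)*Y. The similarity matrix below is a hand-built toy standing in for a graph constructed from one layer's embeddings; Looking-Back builds one such graph per chosen hidden layer and combines them.

```python
import numpy as np

def label_propagation(S, Y, alpha=0.5, iters=50):
    """Classic graph label propagation: repeatedly diffuse label
    scores F along the similarity graph S while anchoring labelled
    nodes to their one-hot labels Y."""
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# 4 examples: nodes 0,1 labelled (classes 0,1), nodes 2,3 unlabelled.
# S encodes hypothetical embedding similarity: 2 is near 0, 3 is near 1.
S = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
F = label_propagation(S, Y)
print(F.argmax(axis=1))
```

Running the same propagation on graphs built from hidden-layer embeddings, and not just the final layer, is the extra signal the abstract argues for.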
Adaptive Hierarchical Down-Sampling for Point Cloud Classification
While several convolution-like operators have recently been proposed for
extracting features out of point clouds, down-sampling an unordered point cloud
in a deep neural network has not been rigorously studied. Existing methods
down-sample the points regardless of their importance for the output. As a
result, some important points in the point cloud may be removed, while less
valuable points may be passed to the next layers. In contrast, adaptive
down-sampling methods sample the points by taking into account the importance
of each point, which varies based on the application, task and training data.
In this paper, we propose a permutation-invariant learning-based adaptive
down-sampling layer, called Critical Points Layer (CPL), which reduces the
number of points in an unordered point cloud while retaining the important
points. Unlike most graph-based point cloud down-sampling methods that use
the k-NN search algorithm to find the neighbouring points, CPL is a global
down-sampling method, rendering it computationally very efficient. The proposed
layer can be used along with any graph-based point cloud convolution layer to
form a convolutional neural network, dubbed CP-Net in this paper. We introduce
a CP-Net for 3D object classification that achieves the best accuracy on the
ModelNet dataset among point cloud-based methods, validating the
effectiveness of the CPL.
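The global, neighbourhood-free down-sampling idea can be sketched as below. Scoring points by how often they win a global max-pool, and the toy point/feature values, are illustrative assumptions; the actual CPL scoring is learned end-to-end, but the key property shown here, needing no k-NN search, is what makes it cheap.

```python
import numpy as np

def critical_points_layer(points, features, k):
    """Global adaptive down-sampling sketch: score each point by how
    many feature channels it dominates under a global max-pool (its
    'critical' channels), then keep the top-k points. The max over
    each channel is permutation-invariant in the input ordering."""
    winners = features.argmax(axis=0)                  # per-channel max contributor
    counts = np.bincount(winners, minlength=len(points))
    keep = np.argsort(-counts, kind="stable")[:k]      # most critical points first
    return points[keep], features[keep]

# 4 points with 3-channel features; point 1 dominates two channels,
# point 2 dominates one, points 0 and 3 dominate none.
points = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
features = np.array([[0.1, 0.2, 0.3],
                     [0.9, 0.8, 0.1],
                     [0.2, 0.1, 0.7],
                     [0.3, 0.3, 0.2]])
kept_pts, kept_feats = critical_points_layer(points, features, k=2)
print(kept_pts)
```

Because the score comes from one global reduction over all points, the layer avoids the per-point neighbourhood queries that dominate the cost of k-NN-based down-sampling.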
Adaptive Neural Networks for Efficient Inference
We present an approach to adaptively utilize deep neural networks in order to
reduce the evaluation time on new examples without loss of accuracy. Rather
than attempting to redesign or approximate existing networks, we propose two
schemes that adaptively utilize networks. We first pose an adaptive network
evaluation scheme, where we learn a system to adaptively choose the components
of a deep network to be evaluated for each example. By allowing examples
correctly classified using early layers of the system to exit, we avoid the
computational time associated with full evaluation of the network. We extend
this to learn a network selection system that adaptively selects the network to
be evaluated for each example. We show that computational time can be
dramatically reduced by exploiting the fact that many examples can be correctly
classified using relatively efficient networks and that complex,
computationally costly networks are only necessary for a small fraction of
examples. We pose a global objective for learning an adaptive early exit or
network selection policy and solve it by reducing the policy learning problem
to a layer-by-layer weighted binary classification problem. Empirically, these
approaches yield dramatic reductions in computational cost, with up to a 2.8x
speedup on state-of-the-art networks from the ImageNet image recognition
challenge with minimal (<1%) loss of top-5 accuracy.