1,546 research outputs found
A Comprehensive Survey on Graph Neural Networks
Deep learning has revolutionized many machine learning tasks in recent years,
ranging from image classification and video processing to speech recognition
and natural language understanding. The data in these tasks are typically
represented in the Euclidean space. However, there is an increasing number of
applications where data are generated from non-Euclidean domains and are
represented as graphs with complex relationships and interdependency between
objects. The complexity of graph data has imposed significant challenges on
existing machine learning algorithms. Recently, many studies on extending deep
learning approaches for graph data have emerged. In this survey, we provide a
comprehensive overview of graph neural networks (GNNs) in data mining and
machine learning fields. We propose a new taxonomy to divide the
state-of-the-art graph neural networks into four categories, namely recurrent
graph neural networks, convolutional graph neural networks, graph autoencoders,
and spatial-temporal graph neural networks. We further discuss the applications
of graph neural networks across various domains and summarize the open source
codes, benchmark data sets, and model evaluation of graph neural networks.
Finally, we propose potential research directions in this rapidly growing
field.
Comment: Minor revision (updated tables and references).
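As a concrete illustration of the convolutional category (a sketch, not code from the survey itself), a minimal GCN-style layer can be written in NumPy; the symmetric degree normalization and ReLU used below are common choices, assumed here rather than prescribed by this abstract:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolutional layer: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                   # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees (>= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: 3 nodes in a path, 2 input features, 4 output features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
print(gcn_layer(A, H, W).shape)  # (3, 4)
```

Stacking such layers (with different weight matrices) propagates information over increasingly large graph neighborhoods.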
Harnessing Deep Neural Networks with Logic Rules
Combining deep neural networks with structured logic rules is desirable to
harness flexibility and reduce uninterpretability of the neural models. We
propose a general framework capable of enhancing various types of neural
networks (e.g., CNNs and RNNs) with declarative first-order logic rules.
Specifically, we develop an iterative distillation method that transfers the
structured information of logic rules into the weights of neural networks. We
deploy the framework on a CNN for sentiment analysis, and an RNN for named
entity recognition. With a few highly intuitive rules, we obtain substantial
improvements and achieve state-of-the-art or comparable results to previous
best-performing systems.
Comment: Fix typos in appendix. ACL 2016.
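The distillation idea can be sketched as a loss that mixes the ground-truth cross-entropy with imitation of a rule-constrained teacher distribution; the mixing weight `pi` and the teacher `q_teacher` below are illustrative placeholders, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(logits, y_true, q_teacher, pi=0.5):
    """Mix ground-truth cross-entropy with cross-entropy against a
    rule-constrained teacher distribution q_teacher (illustrative)."""
    p = softmax(logits)
    ce_true = -np.log(p[np.arange(len(y_true)), y_true] + 1e-12).mean()
    ce_teacher = -(q_teacher * np.log(p + 1e-12)).sum(axis=1).mean()
    return (1 - pi) * ce_true + pi * ce_teacher
```

With `pi = 0` this reduces to ordinary supervised training; larger `pi` pushes the student network's predictions toward the teacher, which is how structured rule information can be transferred into the weights.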
Machine Learning Methods for Data Association in Multi-Object Tracking
Data association is a key step within the multi-object tracking pipeline that
is notoriously challenging due to its combinatorial nature. A popular and
general way to formulate data association is as the NP-hard multidimensional
assignment problem (MDAP). Over the last few years, data-driven approaches to
assignment have become increasingly prevalent as these techniques have started
to mature. We focus this survey solely on learning algorithms for the
assignment step of multi-object tracking, and we attempt to unify various
methods by highlighting their connections to linear assignment as well as to
the MDAP. First, we review probabilistic and end-to-end optimization approaches
to data association, followed by methods that learn association affinities from
data. We then compare the performance of the methods presented in this survey,
and conclude by discussing future research directions.
Comment: Accepted for publication in ACM Computing Surveys.
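The linear assignment problem these methods connect to can be illustrated with a tiny brute-force solver; real trackers use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`), but for a small cost matrix exhaustive search makes the objective explicit:

```python
from itertools import permutations
import numpy as np

def best_assignment(cost):
    """Exhaustive linear assignment: match row i (e.g. a track) to
    column perm[i] (e.g. a detection) minimizing total cost."""
    n = cost.shape[0]
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i, j] for i, j in enumerate(perm))
        if c < best:
            best, best_perm = c, perm
    return best_perm, float(best)

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])
print(best_assignment(cost))  # ((1, 0, 2), 5.0)
```

The learning-based methods surveyed replace hand-crafted entries of `cost` with association affinities predicted from data.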
Semistochastic Quadratic Bound Methods
Partition functions arise in a variety of settings, including conditional
random fields, logistic regression, and latent gaussian models. In this paper,
we consider semistochastic quadratic bound (SQB) methods for maximum likelihood
inference based on partition function optimization. Batch methods based on the
quadratic bound were recently proposed for this class of problems, and
performed favorably in comparison to state-of-the-art techniques.
Semistochastic methods fall in between batch algorithms, which use all the
data, and stochastic gradient type methods, which use small random selections
at each iteration. We build semistochastic quadratic bound-based methods, and
prove both global convergence (to a stationary point) under very weak
assumptions, and linear convergence rate under stronger assumptions on the
objective. To make the proposed methods faster and more stable, we consider
inexact subproblem minimization and batch-size selection schemes. The efficacy
of SQB methods is demonstrated via comparison with several state-of-the-art
techniques on commonly used datasets.
Comment: 11 pages, 1 figure.
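The batch-versus-stochastic interpolation can be sketched with a growing mini-batch schedule for plain logistic regression; this illustrates only the semistochastic sampling idea, not the paper's quadratic-bound machinery:

```python
import numpy as np

def semistochastic_gd(X, y, steps=200, lr=0.5, seed=0):
    """Gradient steps on logistic loss where the random sample size
    grows each iteration, moving from stochastic towards batch
    behavior (an illustrative schedule, not the paper's SQB method)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        m = min(n, 2 + t)                  # growing sample size
        idx = rng.choice(n, size=m, replace=False)
        z = X[idx] @ w
        g = X[idx].T @ (1 / (1 + np.exp(-z)) - y[idx]) / m
        w -= lr * g
    return w

# Separable toy data: label 1 when the single feature is positive
X = np.array([[1.0], [2.0], [-1.0], [-2.0], [0.5], [-0.5]])
y = np.array([1, 1, 0, 0, 1, 0])
w = semistochastic_gd(X, y)
preds = (1 / (1 + np.exp(-(X @ w))) > 0.5).astype(int)
print((preds == y).all())  # True
```

Early iterations are cheap and noisy like SGD; late iterations use (nearly) all the data, like a batch method, which is what enables the stronger convergence guarantees.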
AdaNet: Adaptive Structural Learning of Artificial Neural Networks
We present new algorithms for adaptively learning artificial neural networks.
Our algorithms (AdaNet) adaptively learn both the structure of the network and
its weights. They are based on a solid theoretical analysis, including
data-dependent generalization guarantees that we prove and discuss in detail.
We report the results of large-scale experiments with one of our algorithms on
several binary classification tasks extracted from the CIFAR-10 dataset. The
results demonstrate that our algorithm can automatically learn network
structures that achieve very competitive accuracy compared with neural
networks found by standard approaches.
Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision
A number of results have recently demonstrated the benefits of incorporating
various constraints when training deep architectures in vision and machine
learning. The advantages range from guarantees for statistical generalization
to better accuracy to compression. But support for general constraints within
widely used libraries remains scarce and their broader deployment within many
applications that can benefit from them remains under-explored. Part of the
reason is that stochastic gradient descent (SGD), the workhorse for training
deep neural networks, does not natively deal with constraints with global scope
very well. In this paper, we revisit a classical first-order scheme from
numerical optimization, Conditional Gradients (CG), that has thus far had
limited applicability in training deep models. We show via rigorous analysis
how various constraints can be naturally handled by modifications of this
algorithm. We provide convergence guarantees and show a suite of immediate
benefits that are possible -- from training ResNets with fewer layers but
better accuracy simply by substituting in our version of CG, to faster training
of GANs with 50% fewer epochs in image inpainting applications, to provably
better generalization guarantees using efficiently implementable forms of
recently proposed regularizers.
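A minimal conditional-gradient loop over an ℓ1-ball shows why constraints are natural for this scheme: the linear subproblem has a closed-form vertex solution, so no projection step is needed. This is an illustrative sketch on a toy quadratic, not the paper's training algorithm:

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, steps=500):
    """Conditional-gradient (Frank-Wolfe) iterations over an l1-ball.
    The linear subproblem min <g, s> over ||s||_1 <= r is solved by
    putting all mass on the coordinate with the largest |gradient|."""
    x = x0.copy()
    for t in range(steps):
        g = grad_f(x)
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])   # minimizing vertex of the ball
        gamma = 2.0 / (t + 2.0)          # standard step-size schedule
        x = (1 - gamma) * x + gamma * s  # iterate stays feasible
    return x

# Minimize ||x - b||^2 over the l1-ball of radius 1, with b outside
# the ball; the constrained optimum is the vertex (1, 0).
b = np.array([2.0, 0.5])
x = frank_wolfe_l1(lambda x: 2 * (x - b), np.zeros(2), radius=1.0)
```

Because every iterate is a convex combination of feasible points, the constraint holds by construction at every step, unlike SGD with a post-hoc projection.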
End-to-end representation learning for Correlation Filter based tracking
The Correlation Filter is an algorithm that trains a linear template to
discriminate between images and their translations. It is well suited to object
tracking because its formulation in the Fourier domain provides a fast
solution, enabling the detector to be re-trained once per frame. Previous works
that use the Correlation Filter, however, have adopted features that were
either manually designed or trained for a different task. This work is the
first to overcome this limitation by interpreting the Correlation Filter
learner, which has a closed-form solution, as a differentiable layer in a deep
neural network. This enables learning deep features that are tightly coupled to
the Correlation Filter. Experiments illustrate that our method has the
important practical benefit of allowing lightweight architectures to achieve
state-of-the-art performance at high framerates.
Comment: To appear at CVPR 2017.
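The closed-form solution referred to above can be sketched in one dimension: ridge regression in the Fourier domain yields the filter directly, which is why per-frame re-training is cheap. This is a toy sketch of the classical correlation filter, not the paper's deep version:

```python
import numpy as np

def train_correlation_filter(x, y, lam=1e-2):
    """Closed-form filter in the Fourier domain:
    w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lam)."""
    x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
    return np.conj(x_hat) * y_hat / (np.conj(x_hat) * x_hat + lam)

def detect(w_hat, z):
    """Response map for a new signal z: correlate via the FFT."""
    return np.real(np.fft.ifft(w_hat * np.fft.fft(z)))

# 1-D toy: the desired response is a peak at the target's position
x = np.random.default_rng(0).standard_normal(64)
y = np.zeros(64); y[0] = 1.0          # peak at zero shift
w_hat = train_correlation_filter(x, y)
resp = detect(w_hat, np.roll(x, 5))   # shift the signal by 5 samples
print(int(np.argmax(resp)))           # 5: the detector finds the shift
```

The paper's contribution is to treat this closed-form solve as a differentiable layer, so the features feeding `x` can be learned end-to-end rather than hand-designed.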
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper gives futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Length bias in Encoder Decoder Models and a Case for Global Conditioning
Encoder-decoder networks are popular for modeling sequences probabilistically
in many applications. These models use the power of the Long Short-Term Memory
(LSTM) architecture to capture the full dependence among variables, unlike
earlier models like CRFs that typically assumed conditional independence among
non-adjacent variables. However, in practice encoder-decoder models exhibit a
bias towards short sequences that, surprisingly, gets worse with increasing
beam size.
In this paper we show that this phenomenon is due to a discrepancy between
the full-sequence margin and the per-element margin enforced by the locally
conditioned training objective of an encoder-decoder model. The discrepancy more
adversely impacts long sequences, explaining the bias towards predicting short
sequences.
For the case where the predicted sequences come from a closed set, we show
that a globally conditioned model alleviates the above problems of
encoder-decoder models. From a practical point of view, our proposed model also
eliminates the need for a beam-search during inference, which reduces to an
efficient dot-product based search in a vector space.
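The closed-set decoding idea can be sketched directly: with candidate-sequence embeddings precomputed, inference is a single matrix-vector product followed by an argmax, with no beam search. The embeddings below are toy placeholders, not the paper's learned representations:

```python
import numpy as np

def closed_set_decode(query_vec, candidate_vecs, candidates):
    """Score every candidate embedding against the encoder output by
    dot product and return the best-scoring candidate sequence."""
    scores = candidate_vecs @ query_vec
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(1)
cands = ["alpha", "beta", "gamma"]
C = np.eye(3, 4)                       # toy orthonormal candidate embeddings
q = C[2] + 0.1 * rng.standard_normal(4)  # query near "gamma"'s vector
print(closed_set_decode(q, C, cands))  # prints "gamma"
```

Since the scoring is a plain maximum inner-product search, it can also be accelerated with standard nearest-neighbor indexing when the closed set is large.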
Frank-Wolfe Network: An Interpretable Deep Structure for Non-Sparse Coding
The problem of ℓ_p-norm constrained coding is to convert a signal into code
that lies inside an ℓ_p-ball and most faithfully reconstructs the signal.
Previous works under the name of sparse coding considered the cases of the
ℓ_0 and ℓ_1 norms. The cases with 1 < p < ∞, i.e. the non-sparse coding
studied in this paper, remain a difficulty. We propose an interpretable deep
structure namely Frank-Wolfe Network (F-W Net), whose architecture is inspired
by unrolling and truncating the Frank-Wolfe algorithm for solving an ℓ_p-norm
constrained problem with p ≥ 1. We show that the Frank-Wolfe solver for the
ℓ_p-norm constraint leads to a novel closed-form nonlinear unit, which is
parameterized by p and termed pool_p. The pool_p unit links the
conventional pooling, activation, and normalization operations, making F-W Net
distinct from existing deep networks either heuristically designed or converted
from projected gradient descent algorithms. We further show that the
hyper-parameter p can be made learnable instead of pre-chosen in F-W Net,
which gracefully solves the non-sparse coding problem even with unknown p. We
evaluate the performance of F-W Net on an extensive range of simulations as
well as the task of handwritten digit recognition, where F-W Net exhibits
strong learning capability. We then propose a convolutional version of F-W Net,
and apply the convolutional F-W Net into image denoising and super-resolution
tasks, where F-W Net all demonstrates impressive effectiveness, flexibility,
and robustness.
Comment: Accepted to IEEE Transactions on Circuits and Systems for Video
Technology. Code and pretrained models: https://github.com/sunke123/FW-Net
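The ℓ_p-ball linear subproblem that such unrolled Frank-Wolfe iterations solve has a closed form, which can be sketched as follows; this is an illustrative version derived from Hölder's inequality, not necessarily the exact parameterization of the paper's closed-form unit:

```python
import numpy as np

def lp_ball_lmo(g, p=1.5, radius=1.0):
    """Closed-form linear minimization oracle over an lp-ball:
    argmin_{||s||_p <= r} <g, s> = -r * sign(g) * |g|^(q-1) / ||g||_q^(q-1),
    where 1/p + 1/q = 1 (tightness case of Hoelder's inequality)."""
    q = p / (p - 1.0)
    a = np.abs(g) ** (q - 1.0)
    norm = np.sum(np.abs(g) ** q) ** ((q - 1.0) / q)
    return -radius * np.sign(g) * a / norm

g = np.array([3.0, -4.0])
s = lp_ball_lmo(g, p=1.5, radius=1.0)
# The oracle output lies on the lp-sphere and anti-aligns with g:
print(np.sum(np.abs(s) ** 1.5) ** (1 / 1.5))  # ≈ 1.0
```

Because the oracle output depends smoothly on p, the exponent can be treated as a learnable parameter, which is the mechanism that lets the network handle an unknown p.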