Combined Group and Exclusive Sparsity for Deep Neural Networks
The number of parameters in a deep neural network is usually very large, which helps with its learning capacity but also hinders its scalability and practicality due to memory/time inefficiency and overfitting. To resolve this issue, we propose a sparsity regularization method that exploits both positive and negative correlations among the features to enforce the network to be sparse, and at the same time removes any redundancies among the features to fully utilize the capacity of the network. Specifically, we propose to use an exclusive sparsity regularization based on the (1,2)-norm, which promotes competition for features between different weights, thus enforcing them to fit to disjoint sets of features. We further combine the exclusive sparsity with the group sparsity based on the (2,1)-norm, to promote both sharing and competition for features in the training of a deep neural network. We validate our method on multiple public datasets, and the results show that our method can obtain more compact and efficient networks while also improving the performance over the base networks with full weights, as opposed to existing sparsity regularizations that often obtain efficiency at the expense of prediction accuracy.
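The combined penalty described above can be sketched directly on a weight matrix. In this minimal sketch, rows are treated as the groups, and the mixing weight `mu` is an illustrative hyperparameter, not a value from the paper:

```python
import numpy as np

def combined_sparsity(W, mu=0.5):
    """Sketch of a combined group/exclusive sparsity penalty.

    Group sparsity, the (2,1)-norm: sum over groups (rows here) of
    their l2 norms, which tends to zero out whole groups.
    Exclusive sparsity, half the squared (1,2)-norm: sum over groups
    of their squared l1 norms, which makes weights within a group
    compete for nonzero magnitude.
    """
    group = np.sum(np.sqrt(np.sum(W ** 2, axis=1)))           # (2,1)-norm
    exclusive = 0.5 * np.sum(np.sum(np.abs(W), axis=1) ** 2)  # (1,2)-norm^2 / 2
    return (1.0 - mu) * group + mu * exclusive

W = np.array([[0.5, -0.2],
              [0.0,  0.0],
              [1.0,  0.3]])
reg = combined_sparsity(W, mu=0.5)
```

In practice such a penalty would be added to the task loss and minimized jointly with the network weights, e.g. via a proximal step.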
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
We apply stochastic average gradient (SAG) algorithms for training
conditional random fields (CRFs). We describe a practical implementation that
uses structure in the CRF gradient to reduce the memory requirement of this
linearly-convergent stochastic gradient method, propose a non-uniform sampling
scheme that substantially improves practical performance, and analyze the rate
of convergence of the SAGA variant under non-uniform sampling. Our experimental
results reveal that our method often significantly outperforms existing methods
in terms of the training objective, and performs as well or better than
optimally-tuned stochastic gradient methods in terms of test error.Comment: AI/Stats 2015, 24 page
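The core SAG update with non-uniform sampling can be sketched on a toy least-squares problem. This is an illustrative simplification, not the paper's CRF implementation: sampling probabilities proportional to per-example Lipschitz constants are one simple non-uniform scheme, and least squares stands in for the CRF objective.

```python
import numpy as np

def sag_nonuniform(X, y, steps=5000, seed=0):
    """Toy SAG sketch for least squares with non-uniform sampling.

    Samples example i with probability proportional to its Lipschitz
    constant L_i = ||x_i||^2, keeps a memory of the last gradient seen
    for each example, and steps along the running sum of stored
    gradients (divided by n), so each step uses information from all
    examples at the cost of one fresh gradient.
    """
    n, d = X.shape
    L = np.sum(X ** 2, axis=1) + 1e-12
    p = L / L.sum()                       # non-uniform sampling distribution
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    mem = np.zeros((n, d))                # last stored gradient per example
    total = np.zeros(d)                   # running sum of stored gradients
    step = 1.0 / L.max()
    for _ in range(steps):
        i = rng.choice(n, p=p)
        g = (X[i] @ w - y[i]) * X[i]      # grad of 0.5 * (x_i w - y_i)^2
        total += g - mem[i]
        mem[i] = g
        w -= step * total / n
    return w
```

At a fixed point all stored gradients equal the true per-example gradients, so the running sum equals the full gradient and the method stops only at the minimizer, regardless of the sampling distribution.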
Semantically Consistent Regularization for Zero-Shot Recognition
The role of semantics in zero-shot learning is considered. The effectiveness
of previous approaches is analyzed according to the form of supervision
provided. While some learn semantics independently, others only supervise the
semantic subspace explained by training classes. Thus, the former is able to
constrain the whole space but lacks the ability to model semantic correlations.
The latter addresses this issue but leaves part of the semantic space
unsupervised. This complementarity is exploited in a new convolutional neural
network (CNN) framework, which proposes the use of semantics as constraints for
recognition. Although a CNN trained for classification has no transfer ability,
this can be encouraged by learning a hidden semantic layer together with a
semantic code for classification. Two forms of semantic constraints are then
introduced. The first is a loss-based regularizer that introduces a
generalization constraint on each semantic predictor. The second is a codeword
regularizer that favors semantic-to-class mappings consistent with prior
semantic knowledge while allowing these to be learned from data. Significant
improvements over the state-of-the-art are achieved on several datasets.
Comment: Accepted to CVPR 201
Graph Clustering with Graph Neural Networks
Graph Neural Networks (GNNs) have achieved state-of-the-art results on many
graph analysis tasks such as node classification and link prediction. However,
important unsupervised problems on graphs, such as graph clustering, have
proved more resistant to advances in GNNs. In this paper, we study unsupervised
training of GNN pooling methods in terms of their clustering capabilities.
We start by drawing a connection between graph clustering and graph pooling:
intuitively, a good graph clustering is what one would expect from a GNN
pooling layer. Counterintuitively, we show that this is not true for
state-of-the-art pooling methods, such as MinCut pooling. To address these
deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised
pooling method inspired by the modularity measure of clustering quality, and
show how it tackles recovery of the challenging clustering structure of
real-world graphs. In order to clarify the regimes where existing methods fail,
we carefully design a set of experiments on synthetic data which show that DMoN
is able to jointly leverage the signal from the graph structure and node
attributes. Similarly, on real-world data, we show that DMoN produces high
quality clusters which correlate strongly with ground truth labels, achieving
state-of-the-art results.
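The clustering-quality measure that inspires DMoN can be computed directly. This sketch implements Newman's classic modularity, Q = (1/2m) * sum_ij (A_ij - d_i*d_j/2m) * delta(c_i, c_j), for an unweighted undirected graph; it is the underlying metric, not DMoN's differentiable training loss:

```python
import numpy as np

def modularity(A, labels):
    """Newman's modularity Q for a symmetric 0/1 adjacency matrix.

    Compares the fraction of edges falling within clusters against
    the fraction expected under a random graph with the same degree
    sequence; higher Q means stronger community structure.
    """
    m = A.sum() / 2.0                     # number of edges
    d = A.sum(axis=1)                     # node degrees
    Q = 0.0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] == labels[j]:    # same cluster
                Q += A[i, j] - d[i] * d[j] / (2.0 * m)
    return Q / (2.0 * m)

# Two triangles joined by a single bridge edge: the natural
# two-cluster partition scores well above zero.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Q = modularity(A, [0, 0, 0, 1, 1, 1])  # 5/14, about 0.357
```

DMoN replaces the hard cluster assignments above with a soft, learnable assignment matrix so that this objective can be optimized by gradient descent.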