A Hybrid Differential Evolution Approach to Designing Deep Convolutional Neural Networks for Image Classification
Convolutional Neural Networks (CNNs) have demonstrated their superiority in
image classification, and evolutionary computation (EC) methods have recently
surged as a way to automatically design CNN architectures and thus save the
tedious work of manually designing CNNs. In this paper, a new hybrid
differential evolution (DE) algorithm with a newly added crossover operator is
proposed to evolve CNN architectures of any length; it is named DECNN. There
are three new ideas in the proposed DECNN method. Firstly, an existing
effective encoding scheme is refined to cater for variable-length CNN
architectures; secondly, new mutation and crossover operators are developed
for variable-length DE to optimise the hyperparameters of CNNs; finally, a
second crossover operator is introduced to evolve the depth of the CNN
architectures. The proposed algorithm is tested on six widely used benchmark
datasets and compared to 12 state-of-the-art methods; the results show that
the proposed method is highly competitive with the state-of-the-art
algorithms. Furthermore, the proposed method is also compared with IPPSO, a
particle swarm optimisation method with a similar encoding strategy, and
DECNN outperforms IPPSO in terms of accuracy.
Comment: Accepted by The Australasian Joint Conference on Artificial Intelligence 2018
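To make the variable-length DE operators concrete, here is a minimal Python sketch of how DE mutation, crossover, and a depth-evolving second crossover could operate on architectures encoded as lists of per-layer hyperparameter vectors. The encoding, the truncation rule, and all parameter values are illustrative assumptions, not the authors' exact design.

```python
import random
import numpy as np

# A sketch of variable-length DE on CNN architectures in the spirit of DECNN.
# Each individual is a list of per-layer hyperparameter vectors.

F = 0.5  # DE scale factor

def mutate(x1, x2, x3):
    """DE/rand/1 mutation, truncated to the shortest parent."""
    n = min(len(x1), len(x2), len(x3))
    return [x1[i] + F * (x2[i] - x3[i]) for i in range(n)]

def crossover(parent, donor, cr=0.9):
    """Binomial crossover over overlapping layers; extra parent layers kept."""
    n = min(len(parent), len(donor))
    child = [donor[i] if random.random() < cr else parent[i] for i in range(n)]
    return child + parent[n:]

def depth_crossover(a, b):
    """Second crossover: swap tails at random cut points, so offspring
    depth can differ from both parents (this evolves CNN depth)."""
    ia, ib = random.randint(1, len(a)), random.randint(1, len(b))
    return a[:ia] + b[ib:], b[:ib] + a[ia:]

# toy usage: a 3-layer and a 5-layer individual, 2 hyperparameters per layer
a = [np.random.rand(2) for _ in range(3)]
b = [np.random.rand(2) for _ in range(5)]
child = crossover(a, mutate(b, a, b))
print(len(child), [len(c) for c in depth_crossover(a, b)])
```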
L0-ARM: Network Sparsification via Stochastic Binary Optimization
We consider network sparsification as an L0-norm regularized binary
optimization problem, where each unit of a neural network (e.g., a weight,
neuron, or channel) is attached to a stochastic binary gate whose parameters
are jointly optimized with the original network parameters.
Augment-Reinforce-Merge (ARM), a recently proposed unbiased gradient estimator,
is investigated for this binary optimization problem. Compared to the hard
concrete gradient estimator from Louizos et al., ARM demonstrates superior
performance of pruning network architectures while retaining almost the same
accuracies of baseline methods. Similar to the hard concrete estimator, ARM
also enables conditional computation during model training but with improved
effectiveness due to the exact binary stochasticity. Thanks to the flexibility
of ARM, many smooth or non-smooth parametric functions, such as scaled sigmoid
or hard sigmoid, can be used to parameterize this binary optimization problem
and the unbiasness of the ARM estimator is retained, while the hard concrete
estimator has to rely on the hard sigmoid function to achieve conditional
computation and thus accelerated training. Extensive experiments on multiple
public datasets demonstrate state-of-the-art pruning rates with accuracies
almost identical to those of the baseline methods. The resulting algorithm,
L0-ARM, sparsifies the Wide-ResNet models on CIFAR-10 and CIFAR-100, whereas
the hard concrete estimator cannot. The code is publicly available at
https://github.com/leo-yangli/l0-arm.
Comment: Published as a conference paper at ECML 2019
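For intuition, the following is a minimal NumPy sketch of the ARM estimator for stochastic binary gates, the building block the paper investigates. The toy loss and the check against the analytic gradient are illustrative; the real method attaches one gate per weight, neuron, or channel and optimizes the gates jointly with the network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def arm_gradient(phi, f, rng):
    """Unbiased estimate of d E_{z ~ Bernoulli(sigmoid(phi))}[f(z)] / d phi.

    ARM identity: grad = E_{u ~ Uniform(0,1)}[(f(1[u > sigmoid(-phi)])
                                             - f(1[u < sigmoid(phi)])) * (u - 0.5)]
    """
    u = rng.uniform(size=phi.shape)
    z_true = (u > sigmoid(-phi)).astype(float)   # antithetic sample 1
    z_false = (u < sigmoid(phi)).astype(float)   # antithetic sample 2
    return (f(z_true) - f(z_false)) * (u - 0.5)

# toy check with f(z) = sum(z): the true gradient is sigmoid'(phi)
rng = np.random.default_rng(0)
phi = np.array([0.3, -1.0])
grads = np.mean([arm_gradient(phi, np.sum, rng) for _ in range(20000)], axis=0)
print(grads, sigmoid(phi) * (1 - sigmoid(phi)))  # should roughly agree
```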
Compression of Deep Neural Networks on the Fly
Thanks to their state-of-the-art performance, deep neural networks are
increasingly used for object recognition. To achieve these results, they rely
on millions of trainable parameters. However, when targeting embedded
applications, the size of these models becomes problematic. As a consequence,
their use on smartphones or other resource-limited devices is prohibitive. In
this paper we introduce a novel compression method for deep neural networks
that is performed during the learning phase. It consists of adding an extra
regularization term to the cost function of fully-connected layers. We combine
this method with Product Quantization (PQ) of the trained weights for higher
savings in storage consumption. We evaluate our method on two datasets (MNIST
and CIFAR-10), on which we achieve significantly larger compression rates than
state-of-the-art methods.
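As an illustration of the storage-saving step, here is a small NumPy sketch of Product Quantization applied to a trained weight matrix. The block size, codebook size, and k-means details are illustrative assumptions, and the paper's learning-phase regularization term is not reproduced here.

```python
import numpy as np

def product_quantize(W, sub=4, k=16, iters=20, seed=0):
    """Split each row of W into subvectors of length `sub` and replace each
    subvector with its nearest of `k` centroids (plain k-means per block)."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    assert cols % sub == 0
    W_hat = np.empty_like(W)
    for b in range(cols // sub):
        X = W[:, b * sub:(b + 1) * sub]                 # all subvectors of block b
        C = X[rng.choice(rows, size=k, replace=False)]  # init centroids
        for _ in range(iters):
            d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)                        # nearest centroid per row
            for j in range(k):
                if np.any(assign == j):
                    C[j] = X[assign == j].mean(0)
        W_hat[:, b * sub:(b + 1) * sub] = C[assign]
    return W_hat  # storage: k centroids + log2(k)-bit codes per block

W = np.random.randn(256, 64).astype(np.float32)
print(np.mean((W - product_quantize(W)) ** 2))  # quantization error
```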
On the Equivalence Between Deep NADE and Generative Stochastic Networks
Neural Autoregressive Distribution Estimators (NADEs) have recently been
shown to be successful alternatives for modeling high-dimensional multimodal
distributions. One issue associated with NADEs is that they rely on a
particular order of factorization for p(x). This issue has recently been
addressed by a variant of NADE called Orderless NADE and its deeper version,
Deep Orderless NADE. Orderless NADEs are trained based on a criterion that
stochastically maximizes p(x) under all possible orders of factorization.
Unfortunately, ancestral sampling from deep NADE is very
expensive, corresponding to running through a neural net separately predicting
each of the visible variables given some others. This work makes a connection
between this criterion and the training criterion for Generative Stochastic
Networks (GSNs). It shows that training NADEs in this way also trains a GSN,
which defines a Markov chain associated with the NADE model. Based on this
connection, we show an alternative way to sample from a trained Orderless NADE
that allows trading off computing time against sample quality: a 3- to 10-fold
speedup (taking into account the waste due to correlations between consecutive
samples of the chain) can be obtained without noticeably reducing the quality
of the samples. This is achieved using a novel sampling procedure for GSNs
called annealed GSN sampling, similar to tempering methods, which combines
fast mixing (obtained thanks to steps at high noise levels) with accurate
samples (obtained thanks to steps at low noise levels).
Comment: ECML/PKDD 2014
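A rough sketch of the annealed GSN sampling idea, under the assumption that the trained model exposes a conditional resampler: the chain masks and resamples many variables early (high noise, fast mixing) and few variables late (low noise, accurate samples). The schedule and the stand-in conditional are illustrative, not the paper's exact procedure.

```python
import numpy as np

def annealed_gsn_sample(x, sample_conditional, steps=50, rng=None):
    """Run a GSN-style chain whose corruption level is annealed downward.
    `sample_conditional(x, mask, rng)` stands in for the trained NADE
    conditional p(x_masked | x_observed)."""
    rng = rng or np.random.default_rng()
    d = x.size
    for t in range(steps):
        # anneal: resample a large fraction of variables early, a small one late
        frac = 0.9 * (1 - t / steps) + 0.05
        mask = rng.uniform(size=d) < frac           # True = to be resampled
        x = np.where(mask, sample_conditional(x, mask, rng), x)
    return x

# stand-in conditional for demonstration only: independent coin flips
demo = lambda x, mask, rng: (rng.uniform(size=x.size) < 0.5).astype(x.dtype)
print(annealed_gsn_sample(np.zeros(784), demo))
```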
Symmetry constrained machine learning
Symmetry, a central concept in understanding the laws of nature, has been
used for centuries in physics, mathematics, and chemistry to help make
mathematical models tractable. Yet, despite its power, symmetry has not been
used extensively in machine learning until rather recently. In this article we
show a general way to incorporate symmetries into machine learning models. We
demonstrate this with a detailed analysis of a rather simple real-world machine
learning system: a neural network for classifying handwritten digits, lacking
bias terms for every neuron. We demonstrate that ignoring symmetries can have
dire over-fitting consequences, and that incorporating symmetry into the model
reduces over-fitting while at the same time reducing complexity, ultimately
requiring less training data and less time and resources to train.
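As a concrete, deliberately simple illustration of the general principle, one standard way to build a symmetry into a model is to average its output over a finite symmetry group acting on the input, which makes the predictor exactly invariant. The flip group and linear scorer below are illustrative assumptions, not the paper's digit-classification analysis.

```python
import numpy as np

def symmetrize(model, group):
    """Return a g-invariant model: f_sym(x) = mean over g in group of f(g(x))."""
    def f_sym(x):
        return np.mean([model(g(x)) for g in group], axis=0)
    return f_sym

# toy example: make a linear scorer invariant under horizontal flips
group = [lambda img: img, lambda img: img[:, ::-1]]
w = np.random.randn(8, 8)
model = lambda img: float((w * img).sum())

f = symmetrize(model, group)
img = np.random.randn(8, 8)
print(f(img), f(img[:, ::-1]))  # identical by construction
```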
Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media
When crises hit, many flock to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations, and volunteers) contained in these posts is vital for their efficient handling and consumption by affected communities and concerned organisations. In this paper, we introduce Sem-CNN, a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics, representing the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms baselines consisting of statistical and non-semantic deep learning models.
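To make the "additional layer of semantics" concrete, here is a minimal sketch of the kind of input representation such a model could consume: each token position concatenates a word embedding with an embedding of its named-entity type before the convolutional layers. The vocabulary, entity tags, and dimensions are illustrative assumptions, not the Sem-CNN specification.

```python
import numpy as np

WORD_DIM, ENT_DIM = 300, 50  # illustrative embedding sizes
word_emb = {"brisbane": np.random.randn(WORD_DIM),
            "floods": np.random.randn(WORD_DIM)}
ent_emb = {"O": np.zeros(ENT_DIM),                 # not an entity
           "LOCATION": np.random.randn(ENT_DIM)}   # named-entity type

def encode(tokens, entity_tags):
    """Stack [word ; entity-type] vectors into a (len, WORD_DIM + ENT_DIM)
    matrix that a 1-D convolutional classifier would consume."""
    return np.stack([
        np.concatenate([word_emb[w], ent_emb[t]])
        for w, t in zip(tokens, entity_tags)
    ])

X = encode(["brisbane", "floods"], ["LOCATION", "O"])
print(X.shape)  # (2, 350)
```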
Input Fast-Forwarding for Better Deep Learning
This paper introduces a new architectural framework, known as input
fast-forwarding, that can enhance the performance of deep networks. The main
idea is to incorporate a parallel path that sends representations of input
values forward to deeper network layers. This scheme is substantially different
from "deep supervision" in which the loss layer is re-introduced to earlier
layers. The parallel path provided by fast-forwarding enhances the training
process in two ways. First, it enables the individual layers to combine
higher-level information (from the standard processing path) with lower-level
information (from the fast-forward path). Second, this new architecture reduces
the problem of vanishing gradients substantially because the fast-forwarding
path provides a shorter route for gradient backpropagation. In order to
evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet),
with 20 convolutional layers along with parallel fast-forward paths, has been
created and tested. The paper presents empirical results that demonstrate
improved learning capacity of FFNet due to fast-forwarding, as compared to
GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in
size, respectively. All of the source code and deep learning models described
in this paper will be made available to the entire research community.
Comment: Accepted at the 14th International Conference on Image Analysis and Recognition (ICIAR) 2017, Montreal, Canada
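A minimal NumPy sketch of the fast-forwarding scheme described above: a cheap parallel path carries a representation of the raw input to every depth and is combined with the main path, so gradients also reach early parameters through a short route. Layer sizes and the combination by concatenation are illustrative assumptions, not FFNet's actual architecture.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0)

def ffnet_forward(x, main_weights, ff_weights):
    """At each depth, combine the main path with a fast-forward
    representation computed directly from the input x."""
    h = x
    for W_main, W_ff in zip(main_weights, ff_weights):
        fast = relu(x @ W_ff)                         # fast-forward path
        h = relu(np.concatenate([h, fast]) @ W_main)  # combine with main path
    return h

d = 16
main_weights = [np.random.randn(2 * d, d) * 0.1 for _ in range(4)]
ff_weights = [np.random.randn(d, d) * 0.1 for _ in range(4)]
print(ffnet_forward(np.random.randn(d), main_weights, ff_weights).shape)
```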
Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction
Visual media are powerful means of expressing emotions and sentiments. The
constant generation of new content in social networks highlights the need for
automated visual sentiment analysis tools. While Convolutional Neural Networks
(CNNs) have established a new state-of-the-art in several vision problems,
their application to the task of sentiment analysis is mostly unexplored and
there are few studies regarding how to design CNNs for this purpose. In this
work, we study the suitability of fine-tuning a CNN for visual sentiment
prediction as well as explore performance boosting techniques within this deep
learning setting. Finally, we provide a deep-dive analysis into a benchmark,
state-of-the-art network architecture to gain insight into how to design CNNs
for the task of visual sentiment prediction.
Comment: Preprint of the paper accepted at the 1st Workshop on Affect and Sentiment in Multimedia (ASM), at ACM Multimedia 2015, Brisbane, Australia
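For readers unfamiliar with the recipe being studied, the following PyTorch sketch shows a generic fine-tuning setup: load a CNN pre-trained for object recognition, swap the classifier for a two-way sentiment head, and train the backbone with a smaller learning rate than the new head. The choice of AlexNet and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torchvision

# Pre-trained object-recognition CNN; replace the classifier's last layer
# with a positive/negative sentiment head.
model = torchvision.models.alexnet(weights="DEFAULT")
model.classifier[-1] = torch.nn.Linear(4096, 2)

# Fine-tune: small learning rate on the backbone, larger on the new head.
optimizer = torch.optim.SGD([
    {"params": model.features.parameters(), "lr": 1e-4},
    {"params": model.classifier.parameters(), "lr": 1e-3},
], momentum=0.9)

x = torch.randn(8, 3, 224, 224)              # a dummy batch of images
loss = torch.nn.functional.cross_entropy(model(x), torch.randint(0, 2, (8,)))
loss.backward()
optimizer.step()
```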
The evolution of signal form: Effects of learned versus inherited recognition
Organisms can learn by individual experience to recognize relevant stimuli
in the environment or they can genetically inherit this ability from their
parents. Here, we ask how these two modes of acquisition affect signal evolution, focusing in particular on the exaggeration and cost of signals. We argue, first, that faster learning by individual receivers cannot be a driving force for the evolution of exaggerated and costly signals unless signal senders are related or the same receiver and sender meet repeatedly. We argue instead that biases in receivers’ recognition mechanisms can promote the evolution of costly exaggeration in signals. We provide support for this hypothesis by simulating coevolution between senders and receivers, using artificial neural networks as a model of receivers’ recognition mechanisms. We analyse the joint effects of receiver biases, signal cost, and mode of acquisition, investigating the circumstances under which learned recognition gives rise to more exaggerated signals than inherited recognition. We conclude the paper by discussing the relevance of our results to a number of biological scenarios.
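To illustrate the core hypothesis with a toy, the sketch below evolves a population of senders against a fixed biased receiver: because the receiver's response keeps increasing with signal strength, signals stabilise well above zero despite a linear production cost. This is a deliberately simplified stand-in for the paper's neural-network coevolution simulations; every number is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
POP, GENS, COST = 200, 300, 0.3

signals = rng.uniform(0, 0.1, POP)           # start with cheap, weak signals

def receiver_response(s, bias=4.0):
    """Biased recognition: response keeps rising with signal strength."""
    return 1 / (1 + np.exp(-bias * (s - 0.5)))

for _ in range(GENS):
    fitness = receiver_response(signals) - COST * signals  # benefit minus cost
    survivors = signals[np.argsort(fitness)[POP // 2:]]    # truncation selection
    offspring = np.repeat(survivors, 2) + rng.normal(0, 0.01, POP)  # mutate
    signals = np.clip(offspring, 0, None)

# mean signal strength stabilises well above zero despite its cost
print(round(signals.mean(), 2))
```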
SNE: Signed Network Embedding
Several network embedding models have been developed for unsigned networks.
However, these skip-gram-based models cannot be applied to signed networks
because they can only deal with one type of link. In this paper, we present our
signed network embedding model, called SNE. SNE adopts the log-bilinear
model, uses node representations of all nodes along a given path, and further
incorporates two signed-type vectors to capture the positive or negative
relationship of each edge along the path. We conduct two experiments, node
classification and link prediction, on both directed and undirected signed
networks and compare with four baselines including a matrix factorization
method and three state-of-the-art unsigned network embedding models. The
experimental results demonstrate the effectiveness of our signed network
embedding.
Comment: To appear in PAKDD 2017
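A minimal sketch of the scoring idea described above: node representations along a path are modulated element-wise by one of two signed-type vectors (one for positive edges, one for negative) and aggregated into a hidden vector that is scored against a candidate target node. The dimensions, the aggregation by summation, and the toy graph are illustrative assumptions.

```python
import numpy as np

D = 8
rng = np.random.default_rng(0)
node_vec = {n: rng.normal(size=D) for n in "abcd"}      # source embeddings
tgt_vec = {n: rng.normal(size=D) for n in "abcd"}       # target embeddings
c_pos, c_neg = rng.normal(size=D), rng.normal(size=D)   # signed-type vectors

def score(path_nodes, edge_signs, target):
    """Log-bilinear score of `target` given a signed path."""
    h = np.zeros(D)
    for node, sign in zip(path_nodes, edge_signs):
        c = c_pos if sign > 0 else c_neg    # pick the signed-type vector
        h += c * node_vec[node]             # element-wise modulation
    return h @ tgt_vec[target]              # higher = more plausible target

# path a -(+)-> b -(-)-> c, scoring d as the next node
print(score(["a", "b", "c"], [+1, -1, +1], "d"))
```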