A Genetic Programming Approach to Designing Convolutional Neural Network Architectures
The convolutional neural network (CNN), which is one of the deep learning
models, has seen much success in a variety of computer vision tasks. However,
designing CNN architectures still requires expert knowledge and a lot of trial
and error. In this paper, we attempt to automatically construct CNN
architectures for an image classification task based on Cartesian genetic
programming (CGP). In our method, we adopt highly functional modules, such as
convolutional blocks and tensor concatenation, as the node functions in CGP.
The CNN structure and connectivity represented by the CGP encoding method are
optimized to maximize the validation accuracy. To evaluate the proposed method,
we constructed a CNN architecture for the image classification task with the
CIFAR-10 dataset. The experimental results show that the proposed method can
automatically find a CNN architecture that is competitive with
state-of-the-art models. Comment: This is the revised version of the GECCO 2017 paper. The code of our
method is available at https://github.com/sg-nm/cgp-cn
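The following is a minimal sketch of how a Cartesian genetic programming genotype of the kind described above could be decoded, assuming a simplified encoding in which every node stores a function name and two input indices. The `Gene` layout, the function names `conv_block` and `concat`, and the helper `active_nodes` are hypothetical illustrations, not the authors' implementation. Only nodes reachable backwards from the output contribute to the resulting CNN.

```python
from typing import List, Tuple

# Hypothetical genotype: each node is (function_name, input_a, input_b).
# Index 0 denotes the network input; node i may only reference indices < i.
Gene = Tuple[str, int, int]


def active_nodes(genotype: List[Gene], output_index: int) -> List[int]:
    """Return the sorted indices of nodes reachable backwards from the output."""
    active = set()
    stack = [output_index]
    while stack:
        idx = stack.pop()
        if idx == 0 or idx in active:
            continue  # index 0 is the input image, not a node
        active.add(idx)
        _, a, b = genotype[idx - 1]  # node i is stored at list position i - 1
        stack.extend((a, b))
    return sorted(active)


genotype = [
    ("conv_block", 0, 0),  # node 1
    ("conv_block", 1, 1),  # node 2
    ("conv_block", 0, 0),  # node 3: never referenced, so it stays inactive
    ("concat", 1, 2),      # node 4: concatenates the outputs of nodes 1 and 2
]
print(active_nodes(genotype, output_index=4))
```

In this toy genotype, node 3 is not referenced by any node on the path to the output, so it would not appear in the decoded architecture.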
Assessing hyper parameter optimization and speedup for convolutional neural networks
The increased processing power of graphical processing units (GPUs) and the availability of large image datasets have fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks composed of many layers. Convolutional neural networks (CNNs) are one such architecture, and they provide more opportunities for image classification. Advances in CNNs enable models to be trained on large labelled image datasets, but the hyper parameters need to be specified, which is challenging and complex because there are so many of them. A substantial amount of computational power and processing time is required to determine the hyper parameters that define a model yielding good results. This article provides a survey of the hyper parameter search and optimization methods for CNN architectures.
A Particle Swarm Optimization-based Flexible Convolutional Auto-Encoder for Image Classification
Convolutional auto-encoders have shown remarkable performance when stacked
into deep convolutional neural networks for classifying image data
over the past several years. However, they are unable to construct the
state-of-the-art convolutional neural networks due to their intrinsic
architectures. In this regard, we propose a flexible convolutional auto-encoder
by eliminating the constraints on the numbers of convolutional layers and
pooling layers from the traditional convolutional auto-encoder. We also design
an architecture discovery method by using particle swarm optimization, which is
capable of automatically searching for the optimal architectures of the
proposed flexible convolutional auto-encoder with far fewer computational
resources and without any manual intervention. We use the designed architecture
optimization algorithm to test the proposed flexible convolutional auto-encoder
on four widely used image classification datasets, using a single graphics
processing unit card. Experimental results show that the proposed method
significantly outperforms its peer competitors, including the
state-of-the-art algorithm. Comment: Accepted by IEEE Transactions on Neural Networks and Learning
Systems, 201
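For orientation, the sketch below shows the canonical particle swarm optimization update for a fixed-length, real-valued particle. It is not the architecture discovery method of the paper above, whose particle encoding is not given in the abstract; the function name `pso_step` and the coefficient defaults `w`, `c1`, and `c2` are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple


def pso_step(
    x: List[float],
    v: List[float],
    pbest: List[float],
    gbest: List[float],
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # pull towards the particle's own best (assumed value)
    c2: float = 1.5,  # pull towards the swarm's best (assumed value)
) -> Tuple[List[float], List[float]]:
    """Apply one canonical PSO velocity and position update to a particle."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = random.random(), random.random()
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vel)
        new_x.append(xi + vel)
    return new_x, new_v


# A particle at rest on both best positions does not move.
print(pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0]))
```

Searching over architectures would additionally require mapping each particle position to a sequence of layers, which this sketch leaves out.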
Finding Competitive Network Architectures Within a Day Using UCT
The design of neural network architectures for a new data set is a laborious
task which requires human deep learning expertise. In order to make deep
learning available for a broader audience, automated methods for finding a
neural network architecture are vital. Recently proposed methods can already
achieve human-expert-level performance. However, these methods have run times
of months or even years of GPU computing time, ignoring the hardware constraints
faced by many researchers and companies. We propose the use of Monte Carlo
planning in combination with two different UCT (upper confidence bound applied
to trees) derivations to search for network architectures. We adapt the UCT
algorithm to the needs of network architecture search by proposing two ways of
sharing information between different branches of the search tree. In an
empirical study, we demonstrate that this method can find
competitive networks for MNIST, SVHN and CIFAR-10 in just a single GPU day.
Extending the search time to five GPU days, we outperform human-designed
architectures and our competitors which consider the same types of layers
- …
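As background for the abstract above, here is a minimal sketch of the standard UCT selection rule (UCB1 applied to the children of a tree node). It does not reproduce the paper's two UCT derivations or its information sharing between branches; the names `uct_score` and `select_child` and the exploration constant `c` are assumptions made for this example.

```python
import math
from typing import List, Tuple


def uct_score(total_reward: float, visits: int, parent_visits: int,
              c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children: List[Tuple[float, int]], parent_visits: int) -> int:
    """Return the index of the child (total_reward, visits) with the best score."""
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i][0], children[i][1], parent_visits),
    )


# The second child has never been visited, so it is selected.
print(select_child([(5.0, 10), (0.0, 0)], parent_visits=10))
```

In an architecture search, each child would correspond to one possible next design decision (for example, a layer type), and the reward would be a validation accuracy obtained after training.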