EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching from Scratch
Designing the structure of neural networks is considered one of the most
challenging tasks in deep learning, especially when there is little prior
knowledge about the task domain. In this paper, we propose an
Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of
succession, extinction, mimicry, and gene duplication to search for neural
network structures from scratch, starting from a poorly initialized simple
network and imposing few constraints during the evolution, as we assume no
prior knowledge about the task domain. Specifically, we first use primary
succession to rapidly
the task domain. Specifically, we first use primary succession to rapidly
evolve a population of poorly initialized neural network structures into a more
diverse population, followed by a secondary succession stage for fine-grained
searching based on the networks from the primary succession. Extinction is
applied in both stages to reduce computational cost. Mimicry is employed during
the entire evolution process to help inferior networks imitate the behavior
of a superior network, and gene duplication is utilized to duplicate the
learned blocks of novel structures; both help to find better network
structures. Experimental results show that our proposed approach can achieve
similar or better performance compared to the existing genetic approaches with
dramatically reduced computation cost. For example, the network discovered by
our approach on the CIFAR-100 dataset achieves 78.1% test accuracy in 120 GPU
hours, compared to 77.0% test accuracy in more than 65,536 GPU hours in [35].
Comment: CVPR 2019
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern
computing applications. Accelerating their training is a major challenge and
techniques range from distributed algorithms to low-level circuit design. In
this survey, we describe the problem from a theoretical perspective, followed
by approaches for its parallelization. We present trends in DNN architectures
and the resulting implications on parallelization strategies. We then review
and model the different types of concurrency in DNNs: from the single operator,
through parallelism in network inference and training, to distributed deep
learning. We discuss asynchronous stochastic optimization, distributed system
architectures, communication schemes, and neural architecture search. Based on
those approaches, we extrapolate potential directions for parallelism in deep
learning.
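One of the concurrency forms such a survey covers, synchronous data parallelism, can be sketched in a few lines (my own toy illustration, not taken from the survey): each worker computes a gradient on its shard of the data, the gradients are averaged (the role of an all-reduce in a real system), and the shared weights are updated once per step.

```python
# Toy 1-D linear model y = w * x trained with synchronous data-parallel SGD.

def grad(w, shard):
    # Gradient of mean squared error over one worker's data shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def sync_data_parallel_step(w, shards, lr):
    # "All-reduce" step: average the per-worker gradients, then update
    # the replicated weights identically on every worker.
    g = sum(grad(w, s) for s in shards) / len(shards)
    return w - lr * g

# Data generated from y = 3x, striped across 4 workers.
data = [(x, 3.0 * x) for x in range(1, 17)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = sync_data_parallel_step(w, shards, lr=0.001)
print(round(w, 3))  # converges toward 3.0
```

Asynchronous variants drop the barrier implied by the averaging step: workers push gradients computed against stale weights, trading statistical efficiency for hardware utilization.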
Multi-Objective Simulated Annealing for Hyper-Parameter Optimization in Convolutional Neural Networks
In this study, we model CNN hyper-parameter optimization as a
bi-criteria optimization problem, where the first objective is classification
accuracy and the second is computational complexity, measured as the number
of floating-point operations. For this bi-criteria
optimization problem, we develop a Multi-Objective Simulated Annealing (MOSA)
algorithm for obtaining high-quality solutions in terms of both objectives. CIFAR-10
is selected as the benchmark dataset, and the MOSA trade-off fronts obtained for
this dataset are compared to the fronts generated by a single-objective Simulated
Annealing (SA) algorithm with respect to several front evaluation metrics such
as generational distance, spacing, and spread. The comparison results suggest that the
MOSA algorithm is able to search the objective space more effectively than the SA
method. For each of these methods, some front solutions are selected for longer
training in order to see their actual performance on the original test set. Again, the
results indicate that MOSA performs better than SA in the multi-objective
setting. The performance of the MOSA configurations is also compared to other
search-generated and human-designed state-of-the-art architectures. It is shown that
the network configurations generated by MOSA are not dominated by those
architectures, and the proposed method can be of great use when computational
complexity is as important as test accuracy.
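The dominance-based acceptance at the heart of a MOSA-style search can be sketched as follows (a toy illustration: the analytic objectives stand in for classification error and FLOP count, which in the paper require training and profiling a CNN configuration; this is not the authors' exact algorithm).

```python
import math
import random

random.seed(1)

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (both objectives are minimized here).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def objectives(x):
    # Hypothetical stand-ins: f1 ~ classification error, f2 ~ FLOP count.
    # Their Pareto-optimal set is the interval [0, 2].
    return (x ** 2, (x - 2) ** 2)

def mosa(steps=2000, temp=1.0, cooling=0.995):
    x = random.uniform(-3, 5)
    cur = objectives(x)
    archive = [(x, cur)]  # running approximation of the trade-off front
    for _ in range(steps):
        cand = x + random.gauss(0, 0.3)
        obj = objectives(cand)
        # Accept non-dominated moves; accept dominated ones with a
        # temperature-dependent probability, as in single-objective SA.
        if not dominates(cur, obj) or random.random() < math.exp(-1 / temp):
            x, cur = cand, obj
            # Prune archive members dominated by the new point...
            archive = [(s, o) for s, o in archive if not dominates(obj, o)]
            # ...and archive the new point if nothing dominates it.
            if not any(dominates(o, obj) for _, o in archive):
                archive.append((x, obj))
        temp *= cooling
    return archive

front = mosa()
```

By construction every pair of archived solutions is mutually non-dominated, which is what front metrics such as generational distance, spacing, and spread are computed over.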