Transfer Learning to Learn with Multitask Neural Model Search
Deep learning models require extensive architecture design exploration and
hyperparameter optimization to perform well on a given task. The exploration of
the model design space is often performed by a human expert and optimized using a
combination of grid search and search heuristics over a large space of possible
choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach
that has been proposed to automate architecture design. NAS has been
successfully applied to generate Neural Networks that rival the best
human-designed architectures. However, NAS requires sampling, constructing, and
training hundreds to thousands of models to achieve well-performing
architectures. This procedure needs to be executed from scratch for each new
task. The application of NAS to a wide set of tasks currently lacks a way to
transfer generalizable knowledge across tasks. In this paper, we present the
Multitask Neural Model Search (MNMS) controller. Our goal is to learn a
generalizable framework that can condition model construction on successful
model searches for previously seen tasks, thus significantly speeding up the
search for new tasks. We demonstrate that MNMS can conduct an automated
architecture search for multiple tasks simultaneously while still learning
well-performing, specialized models for each task. We then show that
pre-trained MNMS controllers can transfer learning to new tasks. By leveraging
knowledge from previous searches, we find that pre-trained MNMS models start
from a better location in the search space and reduce search time on unseen
tasks, while still discovering models that outperform published human-designed
models.
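A minimal PyTorch sketch of the central idea, an RNN controller whose sampling is conditioned on a learned task embedding so that searches for earlier tasks can inform new ones, is given below; the layer sizes, decision vocabulary, and sampling loop are illustrative assumptions, not the paper's exact controller.

```python
import torch
import torch.nn as nn

class MultitaskController(nn.Module):
    """Sketch of a multitask NAS controller: an LSTM samples one design
    decision per step, conditioned on a learned task embedding so that
    knowledge gathered while searching for one task can transfer to others."""

    def __init__(self, num_tasks, num_choices, hidden=64, embed=32):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, embed)
        self.choice_embed = nn.Embedding(num_choices, embed)
        self.rnn = nn.LSTMCell(2 * embed, hidden)
        self.head = nn.Linear(hidden, num_choices)

    def sample(self, task_id, num_decisions):
        h = torch.zeros(1, self.rnn.hidden_size)
        c = torch.zeros(1, self.rnn.hidden_size)
        prev = torch.zeros(1, self.choice_embed.embedding_dim)
        task = self.task_embed(torch.tensor([task_id]))
        decisions, log_probs = [], []
        for _ in range(num_decisions):
            h, c = self.rnn(torch.cat([task, prev], dim=-1), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            choice = dist.sample()
            decisions.append(int(choice))
            log_probs.append(dist.log_prob(choice))
            prev = self.choice_embed(choice)
        return decisions, torch.stack(log_probs).sum()

controller = MultitaskController(num_tasks=3, num_choices=8)
arch, logp = controller.sample(task_id=0, num_decisions=5)
# In a REINFORCE update, `logp` would be scaled by the sampled child model's reward.
```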
Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure
Model compression is an effective technique for deploying neural network
models on devices with limited computation and low power. However,
conventional compression techniques rely on hand-crafted features [2,3,12]
and require domain experts to explore a large design space trading off
size, speed, and accuracy, which usually yields sub-optimal results and is
time-consuming. This paper proposes Auto Deep Compression (ADC), which
leverages reinforcement learning for sample-efficient search of the design
space and improves the compression quality of the model. State-of-the-art
compression results are obtained without any human effort, in a completely
automated way. With a 4-fold reduction in FLOPs, accuracy is 2.8% higher
than the manually compressed model for VGG-16 on ImageNet.
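As a rough illustration of the actor-critic ingredient named in the title, the following sketch performs a TD(0) actor-critic update for a continuous per-layer compression action; the feature map, stand-in environment, and reward shape are hypothetical and only meant to show the update structure, not ADC's actual agent.

```python
import numpy as np

# TD(0) actor-critic sketch with linear function approximation over a toy
# 1-D "layer state"; `step` is a stand-in environment that would, in a real
# system, prune one layer by `action` and report an accuracy/FLOPs signal.
rng = np.random.default_rng(0)
w_actor, w_critic = rng.normal(size=3), rng.normal(size=3)
lr_a, lr_c, gamma, sigma = 1e-3, 1e-2, 0.99, 0.1

def features(state):
    return np.array([1.0, state, state ** 2])   # simple polynomial features

def step(state, action):
    reward = -abs(action - 0.5)                  # fake reward, peaks at 50% pruning
    return reward, (state + 1.0) % 10.0

state = 0.0
for _ in range(1000):
    phi = features(state)
    mu = float(np.tanh(w_actor @ phi))           # deterministic mean action
    action = float(np.clip(mu + sigma * rng.normal(), 0.0, 1.0))
    reward, next_state = step(state, action)
    # Critic: TD error of the state-value estimate.
    td_error = reward + gamma * (w_critic @ features(next_state)) - w_critic @ phi
    w_critic += lr_c * td_error * phi
    # Actor: Gaussian-policy score-function update, with the TD error as advantage.
    w_actor += lr_a * td_error * (action - mu) / sigma ** 2 * (1 - mu ** 2) * phi
    state = next_state
```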
Searching for Activation Functions
The choice of activation functions in deep networks has a significant effect
on the training dynamics and task performance. Currently, the most successful
and widely-used activation function is the Rectified Linear Unit (ReLU).
Although various hand-designed alternatives to ReLU have been proposed, none
have managed to replace it due to inconsistent gains. In this work, we propose
to leverage automatic search techniques to discover new activation functions.
Using a combination of exhaustive and reinforcement learning-based search, we
discover multiple novel activation functions. We verify the effectiveness of
the searches by conducting an empirical evaluation with the best discovered
activation function. Our experiments show that the best discovered activation
function, f(x) = x · sigmoid(βx), which we name Swish, tends
to work better than ReLU on deeper models across a number of challenging
datasets. For example, simply replacing ReLUs with Swish units improves top-1
classification accuracy on ImageNet by 0.9\% for Mobile NASNet-A and 0.6\% for
Inception-ResNet-v2. The simplicity of Swish and its similarity to ReLU make it
easy for practitioners to replace ReLUs with Swish units in any neural network.
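Because Swish is a one-line function, a drop-in module is straightforward; the sketch below assumes a single trainable scalar β (a fixed β = 1 is another option the abstract's formulation allows), and the surrounding layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish activation: f(x) = x * sigmoid(beta * x).

    beta can be a fixed constant or a trainable parameter; a single
    trainable scalar is used here as one simple choice."""

    def __init__(self, trainable_beta: bool = True):
        super().__init__()
        if trainable_beta:
            self.beta = nn.Parameter(torch.ones(1))
        else:
            self.register_buffer("beta", torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)

# Drop-in replacement for ReLU in an existing block:
block = nn.Sequential(nn.Linear(64, 64), Swish(), nn.Linear(64, 10))
out = block(torch.randn(8, 64))
```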
Neural Architecture Search Over a Graph Search Space
Neural Architecture Search (NAS) enabled the discovery of state-of-the-art
architectures in many domains. However, the success of NAS depends on the
definition of the search space. Current search spaces are defined as a static
sequence of decisions and a set of available actions for each decision. Each
possible sequence of actions defines an architecture. We propose a more
expressive class of search space: directed graphs. In our formalism, each
decision is a vertex and each action is an edge. This allows us to model
iterative and branching architecture design decisions. We demonstrate in
simulation, and on image classification experiments, basic iterative and
branching search structures, and show that the graph representation improves
sample efficiency.
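The sketch below illustrates the graph formalism on a toy search space: vertices are decision states, edges are the actions available from each state, a self-loop models iterative "add another layer" decisions, and sampling an architecture is a walk from the start vertex to a terminal one. The concrete graph and the random walk are illustrative assumptions, not the paper's search procedure.

```python
import random

# Each vertex maps to a list of outgoing edges (action_label, next_vertex).
# Loops model iterative decisions; multiple outgoing edges model branching.
GRAPH = {
    "start":     [("conv3x3", "add_layer"), ("conv5x5", "add_layer")],
    "add_layer": [("conv3x3", "add_layer"), ("pool", "add_layer"), ("stop", "end")],
    "end":       [],
}

def sample_architecture(max_steps=10):
    """Sample one architecture as a sequence of edge labels along a walk."""
    vertex, actions = "start", []
    for _ in range(max_steps):
        if not GRAPH[vertex]:          # terminal vertex reached
            break
        action, vertex = random.choice(GRAPH[vertex])
        actions.append(action)
    return actions

print(sample_architecture())
```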
Neural Architecture Search in Embedding Space
Neural architecture search (NAS) with reinforcement learning is a powerful
framework for the automatic discovery of neural architectures. However, its
application is restricted by discrete and high-dimensional search spaces,
which make optimization difficult. To resolve these problems, we propose NAS
in embedding space (NASES), a novel framework. Unlike other
reinforcement-learning-based NAS approaches that search over a discrete and
high-dimensional architecture space, this approach enables reinforcement
learning to search in an embedding space by using architecture encoders and
decoders. Our experiments demonstrate that the performance of the final
architecture found by NASES is comparable with that of other popular NAS
approaches on the CIFAR-10 image classification task. The experiments also
show that NASES is highly efficient, discovering the final architecture in
only 3.5 GPU hours, and that its performance and efficiency are especially
notable when architecture-embedding search and weight initialization are
applied.
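A minimal sketch of searching in an embedding space follows: an encoder maps a discrete architecture sequence to a low-dimensional continuous vector, a decoder maps vectors back to architectures, and candidates are generated by perturbing the embedding (NASES would drive this with a reinforcement-learning agent rather than random noise). The dimensions and the linear autoencoder layout are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_OPS, SEQ_LEN, EMB = 8, 6, 16   # illustrative sizes

class ArchAutoencoder(nn.Module):
    """Maps a one-hot architecture sequence to a continuous embedding and back."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(SEQ_LEN * NUM_OPS, EMB))
        self.decoder = nn.Linear(EMB, SEQ_LEN * NUM_OPS)

    def encode(self, one_hot):   # (B, SEQ_LEN, NUM_OPS) -> (B, EMB)
        return self.encoder(one_hot)

    def decode(self, z):         # (B, EMB) -> discrete op indices (B, SEQ_LEN)
        logits = self.decoder(z).view(-1, SEQ_LEN, NUM_OPS)
        return logits.argmax(dim=-1)

ae = ArchAutoencoder()
arch = F.one_hot(torch.randint(NUM_OPS, (1, SEQ_LEN)), NUM_OPS).float()
z = ae.encode(arch)
candidate = ae.decode(z + 0.1 * torch.randn_like(z))   # perturb in embedding space
print(candidate)
```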
Depth Self-Optimized Learning Toward Data Science
We propose a two-stage model called Depth Self-Optimized Learning (DSOL),
which aims to realize ANN depth self-configuration and self-optimization, as
well as ANN training, without manual intervention. In the first stage, DSOL
configures an ANN of a specific depth for a given dataset. In the second
stage, DSOL continuously optimizes the ANN based on Reinforcement Learning
(RL). Finally, the optimal depth is returned to the first stage for training,
so that DSOL can configure an appropriate ANN depth and perform more
reasonable optimization when processing similar datasets again. In our
experiments, we ran DSOL on the Iris and Boston housing datasets, and the
results showed that DSOL performed well. We have uploaded the experiment
records and code to our GitHub.
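The two-stage structure can be sketched as below: a heuristic initial depth (stage one) is refined from reward feedback (stage two). The evaluation function is a stand-in for training an ANN of the given depth and measuring validation accuracy, and the simple hill-climbing update is used in place of the paper's RL component.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_depth(depth):
    """Stand-in for training an ANN of `depth` layers and returning accuracy."""
    best = 4                                   # pretend accuracy peaks at depth 4
    return 0.95 - 0.03 * abs(depth - best) + 0.01 * rng.normal()

def initial_depth(num_features, num_classes):
    """Stage 1: crude depth configuration from simple dataset statistics."""
    return max(2, int(np.log2(num_features * num_classes)))

def optimize_depth(depth, steps=30):
    """Stage 2: adjust the depth from reward feedback (hill climbing here)."""
    best_depth, best_reward = depth, evaluate_depth(depth)
    for _ in range(steps):
        cand = max(1, best_depth + int(rng.choice([-1, 1])))
        reward = evaluate_depth(cand)
        if reward > best_reward:
            best_depth, best_reward = cand, reward
    return best_depth

print(optimize_depth(initial_depth(num_features=4, num_classes=3)))
```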
NASIB: Neural Architecture Search withIn Budget
Neural Architecture Search (NAS) represents a class of methods that generate
an optimal neural network architecture, typically by iterating over candidate
architectures until convergence on some particular metric such as validation
loss. These methods are constrained by the available computation resources,
especially in enterprise environments. In this paper, we propose a new
approach for NAS, called NASIB, which adapts and attunes to the available
computation resources (budget) by varying the exploration vs. exploitation
trade-off. We reduce expert bias by searching over an augmented search space
induced by Superkernels. The proposed method makes architecture search useful
under different computation budgets and in domains beyond the image
classification of natural images, where we lack bespoke architecture motifs
and domain expertise. We show, on CIFAR10, that it is possible to search over
a space that comprises 12x more candidate operations than the traditional
prior art in just 1.5 GPU days, while reaching close to state-of-the-art
accuracy. Because our method searches over an exponentially larger space, it
could lead to novel architectures that require less domain expertise than the
majority of existing methods.
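One way to picture a budget-adaptive exploration vs. exploitation trade-off is an exploration probability tied to the fraction of the compute budget already spent, as in the sketch below; the candidate operations, running scores, and schedule are illustrative assumptions, not NASIB's actual mechanism.

```python
import random

# Placeholder candidate operations and running quality estimates.
CANDIDATE_OPS = [f"op_{i}" for i in range(12)]
scores = {op: 0.5 for op in CANDIDATE_OPS}

def exploration_prob(spent_gpu_hours, budget_gpu_hours, floor=0.05):
    """More budget remaining -> more exploration; near exhaustion -> exploit."""
    remaining = max(0.0, 1.0 - spent_gpu_hours / budget_gpu_hours)
    return floor + (1.0 - floor) * remaining

def pick_op(spent, budget):
    if random.random() < exploration_prob(spent, budget):
        return random.choice(CANDIDATE_OPS)        # explore
    return max(scores, key=scores.get)             # exploit current best

budget = 36.0                                      # e.g. roughly 1.5 GPU days
for spent in range(0, 36, 6):
    print(spent, round(exploration_prob(spent, budget), 2), pick_op(float(spent), budget))
```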
Graph Pruning for Model Compression
Previous AutoML pruning works utilized individual layer features to
automatically prune filters. We analyze the correlation between two layers
from different blocks that are connected by a shortcut structure. We find
that, within a block, the deeper layer has many redundant filters that can be
represented by filters in the earlier layer, so it is necessary to take
information from other layers into consideration when pruning. In this paper,
a graph pruning approach is proposed that views any deep model as a topology
graph. A Graph PruningNet based on a graph convolution network is designed to
automatically extract neighboring information for each node. To extract
features from various topologies, the Graph PruningNet is connected to the
Pruned Network by an individual fully connected layer for each node and
jointly trained on a training dataset from scratch. Thus, we can obtain
reasonable weights for sub-networks of any size. We then search for the best
configuration of the Pruned Network by reinforcement learning. Different from
previous work, we use node features from the well-trained Graph PruningNet,
instead of hand-crafted features, as the states in reinforcement learning.
Compared with other AutoML pruning works, our method achieves the state of
the art under the same conditions on ImageNet-2012. The code will be released
on GitHub.
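A minimal sketch of the graph view follows: layers are nodes, shortcut connections add extra edges, and one graph-convolution step aggregates neighboring information into per-node features that could serve as RL states in place of hand-crafted per-layer statistics. The adjacency, feature sizes, and normalization below are illustrative, not the paper's Graph PruningNet.

```python
import numpy as np

num_layers, feat_dim, out_dim = 4, 3, 8
rng = np.random.default_rng(0)

# Chain adjacency between consecutive layers, plus a shortcut from layer 0 to
# layer 3 and self-loops.
A = np.eye(num_layers)
for i in range(num_layers - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A[0, 3] = A[3, 0] = 1.0

X = rng.normal(size=(num_layers, feat_dim))   # raw per-layer features
W = rng.normal(size=(feat_dim, out_dim))      # learnable projection

# One GCN layer: symmetrically normalized neighborhood aggregation + ReLU.
deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))
node_states = np.maximum(A_norm @ X @ W, 0.0)

# Each row is one layer's state; an RL agent would map it to a pruning ratio.
print(node_states.shape)
```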
Rethinking the Number of Channels for the Convolutional Neural Network
The latest algorithms for automatic neural architecture search perform
remarkably well, but few of them can effectively design the number of
channels for convolutional neural networks while consuming little
computational effort. In this paper, we propose a method for efficient
automatic architecture search that focuses on the widths of networks rather
than the connections of the neural architecture. Our method, a functionally
incremental search based on function-preserving transformations, explores the
number of channels rapidly while controlling the number of parameters of the
target network. On CIFAR-10 and CIFAR-100 classification, our method uses
minimal computational resources (0.4~1.3 GPU-days) to discover more efficient
rules for network widths, improving accuracy by about 0.5% on CIFAR-10 and
about 2.33% on CIFAR-100 with fewer parameters. In particular, our method is
suitable for rapidly exploring the number of channels of almost any
convolutional neural network.
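The function-preserving ingredient can be illustrated with a Net2Wider-style widening step, in which a hidden unit is duplicated and its outgoing weights are split so the widened network computes exactly the same function and can be trained further without starting from scratch; the layer sizes below are arbitrary and the sketch is not the paper's full incremental search.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))    # input -> hidden weights
W2 = rng.normal(size=(6, 3))    # hidden -> output weights

def widen(W_in, W_out, unit):
    """Duplicate hidden `unit` and halve its outgoing weights (function-preserving)."""
    W_in_new = np.concatenate([W_in, W_in[:, unit:unit + 1]], axis=1)  # copy the unit
    W_out_new = W_out.copy()
    W_out_new[unit] /= 2.0                                             # split its contribution
    W_out_new = np.concatenate([W_out_new, W_out_new[unit:unit + 1]], axis=0)
    return W_in_new, W_out_new

x = rng.normal(size=(5, 4))
relu = lambda z: np.maximum(z, 0.0)
before = relu(x @ W1) @ W2
W1w, W2w = widen(W1, W2, unit=2)
after = relu(x @ W1w) @ W2w
print(np.allclose(before, after))   # True: the widened network is equivalent
```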
AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Model compression is a critical technique to efficiently deploy neural
network models on mobile devices which have limited computation resources and
tight power budgets. Conventional model compression techniques rely on
hand-crafted heuristics and rule-based policies that require domain experts to
explore the large design space trading off among model size, speed, and
accuracy, which is usually sub-optimal and time-consuming. In this paper, we
propose AutoML for Model Compression (AMC), which leverages reinforcement
learning to provide the model compression policy. This learning-based
compression policy outperforms conventional rule-based compression policies
by achieving a higher compression ratio, better preserving accuracy, and
freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better
accuracy than the handcrafted model compression policy for VGG-16 on
ImageNet. We applied this automated, push-the-button compression pipeline to
MobileNet and achieved a 1.81x speedup of measured inference latency on an
Android phone and a 1.43x speedup on the Titan XP GPU, with only 0.1% loss of
ImageNet Top-1 accuracy.
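A schematic of one search episode in this style is sketched below: the agent visits layers in order, emits a continuous pruning ratio per layer, and only receives a reward once the whole pruned model is evaluated. The layer statistics, placeholder actor, and fake evaluation function are stand-ins; the actual AMC agent is a DDPG actor-critic trained from this reward signal.

```python
import numpy as np

# Hypothetical per-layer statistics standing in for a real network.
LAYERS = [
    {"name": "conv1", "channels": 64,  "flops": 1.2e8},
    {"name": "conv2", "channels": 128, "flops": 2.4e8},
    {"name": "conv3", "channels": 256, "flops": 1.8e8},
]

def actor(state, noise_scale=0.1):
    """Stand-in for the trained actor: maps the layer state to a pruning
    ratio in [0, 0.8] and adds exploration noise."""
    base = 0.5  # untrained placeholder policy
    return float(np.clip(base + noise_scale * np.random.randn(), 0.0, 0.8))

def evaluate_pruned(ratios):
    """Stand-in for pruning + validation; the reward would be the pruned
    model's accuracy, optionally after a short fine-tune."""
    return 0.9 - 0.2 * float(np.mean(ratios))   # fake accuracy curve

def run_episode():
    ratios = []
    for i, layer in enumerate(LAYERS):
        # State: layer index, size, FLOPs, and how much has been pruned so far.
        state = np.array([i, layer["channels"], layer["flops"], sum(ratios)])
        ratios.append(actor(state))
    reward = evaluate_pruned(ratios)   # fed back to train the actor and critic
    return ratios, reward

print(run_episode())
```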