7 research outputs found
FireNet: Real-time Segmentation of Fire Perimeter from Aerial Video
In this paper, we share our approach to real-time segmentation of fire
perimeter from aerial full-motion infrared video. We start by describing the
problem from a humanitarian aid and disaster response perspective.
Specifically, we explain the importance of the problem, how it is currently
addressed, and how our machine learning approach improves on it. To test our
models, we annotate a large-scale dataset of 400,000 frames with guidance from
domain experts. Finally, we share the approach currently deployed in
production, which runs inference at 20 frames per second with an F1 score of 92.
Comment: Published at NeurIPS 2019 Workshop on Artificial Intelligence for
Humanitarian Assistance and Disaster Response (AI+HADR 2019).
Rethinking the Value of Network Pruning
Network pruning is widely used for reducing the heavy inference cost of deep
models in low-resource settings. A typical pruning algorithm is a three-stage
pipeline, i.e., training (a large model), pruning and fine-tuning. During
pruning, according to a certain criterion, redundant weights are pruned and
important weights are kept to best preserve the accuracy. In this work, we make
several surprising observations that contradict common beliefs. For all
state-of-the-art structured pruning algorithms we examined, fine-tuning a
pruned model gives performance at best comparable to training that model from
randomly initialized weights. For pruning algorithms that assume a predefined
target network architecture, one can skip the full pipeline and directly train
the target network from scratch. Our observations are consistent across
multiple network architectures, datasets, and tasks, implying
that: 1) training a large, over-parameterized model is often not necessary to
obtain an efficient final model, 2) learned "important" weights of the large
model are typically not useful for the small pruned model, 3) the pruned
architecture itself, rather than a set of inherited "important" weights, is
more crucial to the final model's efficiency, which suggests that in some
cases pruning can be useful as an architecture search paradigm. Our results
suggest the need for more careful baseline evaluations in future research on
structured pruning methods. We also compare with the "Lottery Ticket
Hypothesis" (Frankle & Carbin, 2019) and find that, with an optimal learning
rate, the "winning ticket" initialization used in Frankle & Carbin (2019) does
not bring improvement over random initialization.
Comment: ICLR 2019. Significant revisions from the previous version.
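To make the comparison concrete, here is a minimal sketch in PyTorch of an
L1-norm filter pruning step of the kind the paper evaluates; the helper names
and the keep ratio are assumptions for the example, not the authors' code.
Pipeline A inherits the surviving weights and fine-tunes, while pipeline B
simply re-initializes the same smaller architecture and trains it from scratch.

import torch
import torch.nn as nn

def l1_filter_scores(conv):
    # Score each output filter by the L1 norm of its weights (a common criterion).
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_filters(conv, keep_ratio):
    # Build a smaller Conv2d that keeps only the highest-scoring filters,
    # copying ("inheriting") their trained weights.
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(l1_filter_scores(conv), n_keep).indices
    small = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      bias=conv.bias is not None)
    with torch.no_grad():
        small.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            small.bias.copy_(conv.bias[keep])
    return small

large = nn.Conv2d(64, 128, 3)        # stand-in for a layer of a trained large model
pruned = prune_filters(large, 0.5)   # pipeline A: inherit weights, then fine-tune
scratch = nn.Conv2d(64, 64, 3)       # pipeline B: same shape, random init, train from scratch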
Filter Pruning using Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks
Since convolutional neural networks are often trained with redundant
parameters, it is possible to remove redundant kernels or filters and obtain a
compact network without dropping classification accuracy. In this paper, we
propose a filter pruning method using hierarchical group sparse
regularization. Our previous work showed that hierarchical group sparse
regularization is effective in obtaining sparse networks in which filters
connected to unnecessary channels are automatically driven close to zero.
After training the convolutional neural network with this regularization,
unnecessary filters are selected based on the increase in classification loss
on randomly selected training samples, yielding a compact network. The
proposed method can remove more than 50% of the parameters of ResNet for
CIFAR-10 with only a 0.3% decrease in test accuracy, and 34% of the parameters
of ResNet for TinyImageNet-200 while achieving higher accuracy than the
baseline network.
Comment: Accepted to ICPR 2020.
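As a rough illustration of the kind of penalty involved, the sketch below adds
a group-lasso term over both the output filters and the input channels of a
convolutional layer. The exact hierarchical grouping and the coefficient `lam`
are assumptions for the example, not the paper's precise formulation.

import torch
import torch.nn as nn

def group_sparse_penalty(conv):
    # Weight tensor shape: (out_channels, in_channels, kH, kW).
    w = conv.weight
    eps = 1e-12  # numerical stability for the square root at zero
    filter_groups = (w.pow(2).sum(dim=(1, 2, 3)) + eps).sqrt().sum()   # one group per output filter
    channel_groups = (w.pow(2).sum(dim=(0, 2, 3)) + eps).sqrt().sum()  # one group per input channel
    return filter_groups + channel_groups

conv = nn.Conv2d(16, 32, 3)
lam = 1e-4                      # assumed regularization strength
task_loss = torch.tensor(0.0)   # stand-in for the classification loss
loss = task_loss + lam * group_sparse_penalty(conv)
loss.backward()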
Filter Grafting for Deep Neural Networks: Reason, Method, and Cultivation
Filters are the key components of modern convolutional neural networks (CNNs).
However, since CNNs are usually over-parameterized, a pre-trained network
always contains some invalid (unimportant) filters. These filters have
relatively small norms and contribute little to the output
(\textbf{Reason}). While filter pruning removes these invalid filters for
efficiency, we instead aim to reactivate them to improve the representation
capability of CNNs. In this paper, we introduce filter grafting
(\textbf{Method}) to achieve this goal. Reactivation is performed by grafting
external information (weights) into the invalid filters. To better perform the
grafting, we develop a novel criterion to measure the information of filters
and an adaptive weighting strategy to balance the grafted information among
networks. After the grafting operation, the network has fewer invalid filters
than in its initial state, giving the model more representation capacity.
Meanwhile, since grafting operates reciprocally on all networks involved, it
may lose information from valid filters while improving invalid ones. To gain
a universal improvement on both valid and invalid filters, we complement
grafting with distillation (\textbf{Cultivation}) to overcome this drawback.
Extensive experiments on classification and recognition tasks show the
superiority of our method. Code is available at
https://github.com/fxmeng/filter-grafting.
Comment: arXiv admin note: substantial text overlap with arXiv:2001.0586
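A rough sketch of a grafting step follows; the information measure and the way
the mixing coefficient `alpha` is computed are illustrative stand-ins (the
paper develops its own criterion and adaptive weighting), and the layer shapes
are assumed to match across the networks involved.

import torch
import torch.nn as nn

def layer_information(conv):
    # Crude information proxy: mean absolute weight of the layer.
    return conv.weight.detach().abs().mean()

def graft(target, donor):
    # Blend the donor's weights into the target layer in place; the layer with
    # more "information" contributes more to the mixture.
    info_t, info_d = layer_information(target), layer_information(donor)
    alpha = info_t / (info_t + info_d + 1e-12)
    with torch.no_grad():
        target.weight.mul_(alpha).add_((1.0 - alpha) * donor.weight)

# Two networks trained in parallel would graft into each other's layers; here a
# single pair of identically shaped layers stands in for that.
net_a_layer = nn.Conv2d(16, 32, 3)
net_b_layer = nn.Conv2d(16, 32, 3)
graft(net_a_layer, net_b_layer)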
Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators
This paper studies structured sparse training of CNNs with a gradual pruning
technique that leads to fixed, sparse weight matrices after a set number of
epochs. We simplify the structure of the enforced sparsity so that it reduces
overhead caused by regularization. The proposed training methodology,
Campfire, explores pruning at granularities within a convolutional kernel and filter.
We study various tradeoffs with respect to pruning duration, level of
sparsity, and learning rate configuration. We show that our method creates a
sparse version of ResNet-50 and ResNet-50 v1.5 on full ImageNet while remaining
within a negligible <1% margin of accuracy loss. To ensure that this type of
sparse training does not harm the robustness of the network, we also
demonstrate how the network behaves in the presence of adversarial attacks. Our
results show that with a 70% target sparsity, over 75% top-1 accuracy is
achievable.
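As a rough sketch of gradual pruning that ends in a fixed sparse mask, the loop
below ramps up magnitude-based sparsity and stops updating the mask after a set
epoch; the polynomial schedule, the 70% target, and the freeze point are
illustrative assumptions rather than the paper's exact recipe (which also
constrains the sparsity structure for hardware accelerators).

import torch
import torch.nn as nn

def sparsity_at(epoch, start, end, target):
    # Ramp sparsity from 0 to `target` between `start` and `end` epochs.
    if epoch <= start:
        return 0.0
    if epoch >= end:
        return target
    frac = (epoch - start) / (end - start)
    return target * (1.0 - (1.0 - frac) ** 3)

def magnitude_mask(weight, sparsity):
    # Zero out the smallest-magnitude weights to reach (at least) the requested sparsity.
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

conv = nn.Conv2d(16, 32, 3)
freeze_epoch = 30
mask = torch.ones_like(conv.weight)
for epoch in range(40):
    if epoch < freeze_epoch:
        # The mask keeps growing until it is frozen at `freeze_epoch`.
        mask = magnitude_mask(conv.weight.detach(), sparsity_at(epoch, 5, freeze_epoch, 0.7))
    with torch.no_grad():
        conv.weight.mul_(mask)  # enforce the (eventually fixed) sparse pattern
    # ... one epoch of ordinary training would run here ...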
Out-of-the-box channel pruned networks
In the last decade, convolutional neural networks have become gargantuan.
Pre-trained models, when used as initializers, make it possible to fine-tune
ever larger networks on small datasets. Consequently, not all the
convolutional features that these fine-tuned models detect are requisite for
the end task. Several channel pruning methods have been proposed to prune away
compute and memory from already-trained models. Typically, these involve
policies that decide which and how many channels to remove from each layer,
leading to channel-wise and/or layer-wise pruning profiles, respectively. In
this paper,
we conduct several baseline experiments and establish that profiles from random
channel-wise pruning policies are as good as metric-based ones. We also
establish that there may exist profiles from some layer-wise pruning policies
that are measurably better than common baselines. We then demonstrate that the
top layer-wise pruning profiles found using an exhaustive random search from
one dataset are also among the top profiles for other datasets. This implies
that we could identify out-of-the-box layer-wise pruning profiles using
benchmark datasets and use these directly for new datasets. Furthermore, we
develop a Reinforcement Learning (RL) policy-based search algorithm with a
direct objective of finding transferable layer-wise pruning profiles using many
models for the same architecture. We use a novel reward formulation that drives
this RL search towards an expected compression while maximizing accuracy. Our
results show that our transferred RL-based profiles are as good as or better
than the best profiles found on the original dataset via exhaustive search. We
then demonstrate that profiles found using a mid-sized dataset such as
CIFAR-10/100 transfer even to a large dataset such as ImageNet.
Comment: Under review at ECCV 2020.
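The sketch below shows what applying a layer-wise pruning profile looks like
when the specific channels are chosen at random, as in the baseline above; the
toy two-layer model and the profile values are assumptions for the example.

import torch
import torch.nn as nn

def random_channel_mask(conv, keep_ratio):
    # Randomly pick which output channels survive; only *how many* is dictated
    # by the layer-wise profile.
    n_keep = max(1, int(round(keep_ratio * conv.out_channels)))
    keep = torch.randperm(conv.out_channels)[:n_keep]
    mask = torch.zeros(conv.out_channels)
    mask[keep] = 1.0
    return mask

model = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3))
profile = {0: 0.75, 2: 0.5}  # layer index -> fraction of channels kept
for idx, keep_ratio in profile.items():
    conv = model[idx]
    mask = random_channel_mask(conv, keep_ratio)
    with torch.no_grad():
        conv.weight.mul_(mask.view(-1, 1, 1, 1))  # zero out the pruned channels
        if conv.bias is not None:
            conv.bias.mul_(mask)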
An end-to-end approach for speeding up neural network inference
Important applications such as mobile computing require reducing the
computational costs of neural network inference. Ideally, applications would
specify their preferred tradeoff between accuracy and speed, and the network
would optimize this end-to-end, using classification error to remove parts of
the network \cite{lecun1990optimal,mozer1989skeletonization,BMVC2016_104}.
Increasing speed can be done either during training -- e.g., pruning filters
\cite{li2016pruning} -- or during inference -- e.g., conditionally executing a
subset of the layers \cite{aig}. We propose a single end-to-end framework that
can improve inference efficiency in both settings. We introduce a batch
activation loss and use Gumbel reparameterization to learn network structure
\cite{aig,jang2016categorical}. We train end-to-end against batch activation
loss combined with classification loss, and the same technique supports pruning
as well as conditional computation. We obtain promising experimental results
for ImageNet classification with ResNet \cite{he2016resnet} (45-52\% less
computation) and MobileNetV2 \cite{sandler2018mobilenetv2} (19-37\% less
computation).
Comment: New version of the paper, including filter-level gating and new math
that explains gate polarization.
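A rough sketch of a learnable on/off gate using the Gumbel-softmax
reparameterization, penalized toward a target activation rate, is given below;
the gate parameterization, the target rate, and the loss form are illustrative
assumptions rather than the paper's exact batch activation loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelGate(nn.Module):
    # A stochastic on/off gate whose open probability is learned end-to-end.
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(2))  # [off, on] logits

    def forward(self, temperature=1.0):
        # Differentiable (straight-through) sample of a one-hot vector;
        # index 1 means the gated block is executed.
        sample = F.gumbel_softmax(self.logits, tau=temperature, hard=True)
        return sample[1]

gate = GumbelGate()
target_rate = 0.5                                     # assumed desired execution rate
decisions = torch.stack([gate() for _ in range(32)])  # gate decisions over a batch
activation_loss = (decisions.mean() - target_rate).abs()
task_loss = torch.tensor(0.0)                         # stand-in for the classification loss
(task_loss + activation_loss).backward()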