Frequency Dropout: Feature-Level Regularization via Randomized Filtering
Deep convolutional neural networks have shown remarkable performance on
various computer vision tasks, and yet, they are susceptible to picking up
spurious correlations from the training signal. So-called 'shortcuts' can
arise during learning, for example, when specific frequencies present in the
image data correlate with the output predictions. Both high and low
frequencies can be characteristic of the underlying noise distribution
introduced by image acquisition rather than of the task-relevant information
about the image content. Models that learn features tied to this
characteristic noise will not generalize well to new data.
In this work, we propose a simple yet effective training strategy, Frequency
Dropout, to prevent convolutional neural networks from learning
frequency-specific imaging features. We employ randomized filtering of feature
maps during training which acts as a feature-level regularization. In this
study, we consider common image processing filters such as Gaussian smoothing,
Laplacian of Gaussian, and Gabor filtering. Our training strategy is
model-agnostic and can be used for any computer vision task. We demonstrate the
effectiveness of Frequency Dropout on a range of popular architectures and
multiple tasks including image classification, domain adaptation, and semantic
segmentation using both computer vision and medical imaging datasets. Our
results suggest that the proposed approach not only improves predictive
accuracy but also robustness against domain shift.
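As a rough illustration of the training strategy described above, here is a
minimal, hypothetical PyTorch sketch that randomly low-pass filters feature
maps with a Gaussian kernel during training. The paper's actual filter bank
(Gaussian smoothing, Laplacian of Gaussian, and Gabor filters), application
probability, and parameter ranges are not given in this abstract, so every
value below is an assumption.

```python
# Minimal sketch of the Frequency Dropout idea (assumed details, not the
# paper's exact method): randomly apply a Gaussian low-pass filter to feature
# maps during training; act as the identity at evaluation time.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(sigma: float, size: int = 5) -> torch.Tensor:
    """Return a normalized 2-D Gaussian kernel of shape (size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)


class FrequencyDropout(nn.Module):
    """Randomly filters feature maps during training (assumed parameters)."""

    def __init__(self, p: float = 0.5, sigma_range=(0.5, 2.0), kernel_size: int = 5):
        super().__init__()
        self.p = p
        self.sigma_range = sigma_range
        self.kernel_size = kernel_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity at evaluation time or when the random filter is not triggered.
        if not self.training or torch.rand(1).item() > self.p:
            return x
        sigma = float(torch.empty(1).uniform_(*self.sigma_range))
        k = gaussian_kernel(sigma, self.kernel_size).to(x.device, x.dtype)
        c = x.shape[1]
        # Depthwise convolution: the same kernel filters every channel.
        weight = k.expand(c, 1, self.kernel_size, self.kernel_size).contiguous()
        return F.conv2d(x, weight, padding=self.kernel_size // 2, groups=c)
```

Under these assumptions, the module would simply be inserted after
convolutional blocks, e.g. nn.Sequential(conv, nn.ReLU(), FrequencyDropout()),
and left inactive at inference time.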
HCM: Hardware-Aware Complexity Metric for Neural Network Architectures
Convolutional Neural Networks (CNNs) have become common in many fields
including computer vision, speech recognition, and natural language processing.
Although CNN hardware accelerators are already included as part of many SoC
architectures, the task of achieving high accuracy on resource-restricted
devices is still considered challenging, mainly due to the vast number of
design parameters that need to be balanced to achieve an efficient solution.
Quantization techniques, when applied to the network parameters, lead to a
reduction of power and area and may also change the ratio between communication
and computation. As a result, some algorithmic solutions may suffer from a lack
of memory bandwidth or computational resources and fail to achieve the expected
performance due to hardware constraints. Thus, the system designer and the
micro-architect need to understand at early development stages the impact of
their high-level decisions (e.g., the architecture of the CNN and the amount of
bits used to represent its parameters) on the final product (e.g., the expected
power saving, area, and accuracy). Unfortunately, existing tools fall short of
supporting such decisions.
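To make the communication-versus-computation point above concrete, the toy
calculation below (not taken from the paper; all layer dimensions are made up)
estimates how the weight-memory footprint and the arithmetic intensity of a
single convolutional layer change with the bit-width chosen for its parameters.

```python
# Back-of-envelope sketch (illustrative, not the paper's metric): how weight
# bit-width shifts the balance between memory traffic and computation for one
# hypothetical convolutional layer.

def conv_layer_stats(c_in, c_out, k, h_out, w_out, weight_bits):
    macs = c_in * c_out * k * k * h_out * w_out            # multiply-accumulates
    weight_bytes = c_in * c_out * k * k * weight_bits / 8  # parameter traffic
    # Arithmetic intensity: MACs performed per byte of weights moved.
    return macs, weight_bytes, macs / weight_bytes


for bits in (32, 8, 4):
    macs, mem, intensity = conv_layer_stats(
        c_in=256, c_out=256, k=3, h_out=14, w_out=14, weight_bits=bits)
    print(f"{bits:>2}-bit weights: {mem / 1e6:.2f} MB, "
          f"{intensity:.0f} MACs per weight byte")
```

The computation performed is the same in every case, but the bytes fetched per
MAC shrink with the bit-width, which is one way a design can move between being
bandwidth-bound and compute-bound.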
This paper introduces a hardware-aware complexity metric that aims to assist
the system designer of neural network architectures throughout the entire
project lifetime (especially at its early stages) by predicting the impact of
architectural and micro-architectural decisions on the final product. We
demonstrate how the proposed metric can help evaluate different design
alternatives of neural network models on resource-restricted devices such as
real-time embedded systems, and avoid design mistakes at early stages.
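The abstract does not define the metric itself, so the following is only a
hypothetical sketch of an early-stage screening pass in the same spirit:
candidate architectures and bit-widths are compared against an assumed device
budget using a simple roofline-style lower bound on latency. The scoring
formula, names, and device numbers are all illustrative assumptions, not the
HCM metric.

```python
# Hypothetical early-stage screening of (architecture, bit-width) candidates
# against an assumed device budget; every number below is made up.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    macs: float        # multiply-accumulates per inference
    params: float      # parameter count
    weight_bits: int   # quantization chosen for the parameters


def roofline_latency(c: Candidate, peak_macs_per_s: float, mem_bytes_per_s: float) -> float:
    """Lower-bound latency: the larger of compute time and weight-traffic time."""
    compute_s = c.macs / peak_macs_per_s
    traffic_s = (c.params * c.weight_bits / 8) / mem_bytes_per_s
    return max(compute_s, traffic_s)


# Assumed device budget: 1 TMAC/s of compute, 10 GB/s of memory bandwidth.
PEAK, BW = 1e12, 10e9
candidates = [
    Candidate("cnn-A fp32", macs=600e6, params=4.2e6, weight_bits=32),
    Candidate("cnn-A int8", macs=600e6, params=4.2e6, weight_bits=8),
    Candidate("cnn-B int8", macs=300e6, params=11e6, weight_bits=8),
]
for c in candidates:
    print(f"{c.name}: >= {1e3 * roofline_latency(c, PEAK, BW):.2f} ms per inference")
```

Even this crude estimate exposes the kind of early trade-off the paper targets:
quantizing cnn-A moves it from bandwidth-bound to compute-bound, while cnn-B,
despite needing half the computation, is limited by its larger parameter
traffic.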
Evaluation of Parameter-Scaling for Efficient Deep Learning on Small Satellites
Parameter-scaling techniques change the number of parameters in a
machine-learning model in an effort to make the network more amenable to
different device types or accuracy requirements. This research compares the
performance of two such techniques. NeuralScale is a neural architecture
search method which claims to generate deep neural networks suited to
resource-constrained devices. It shrinks a network to a target number of
parameters by adjusting the width of each layer independently, aiming for
higher accuracy than previous methods. The NeuralScale algorithm is compared
to a baseline of uniform scaling of MobileNet-style models, in which the width
of every layer is scaled by the same factor across the network. Measurements
of the latency and runtime memory required for inference were gathered on the
NVIDIA Jetson TX2 and Jetson AGX Xavier embedded GPUs using NVIDIA TensorRT.
Measurements were also gathered on the Raspberry Pi 4 embedded CPU, which
features ARM Cortex-A72 cores, using ONNX Runtime. VGG-11, MobileNetV2,
Pre-Activation ResNet-18, and ResNet-50 were all scaled to 0.25×, 0.50×,
0.75×, and 1.00× the original number of parameters. On embedded GPUs, this
research finds that NeuralScale models do offer higher accuracy, but they run
slower and consume much more runtime memory during inference than their
uniformly scaled counterparts. On average, NeuralScale is 40% as efficient as
uniform scaling in terms of accuracy per megabyte of runtime memory, and it
uses 2.7× the runtime memory per parameter. On the embedded CPU, NeuralScale
is slightly more efficient than uniform scaling in terms of accuracy per
megabyte of memory and uses essentially the same amount of memory per
parameter; however, inference latency increases by more than 2.5× on average.
Importantly, an equal parameter count does not guarantee equal performance:
runtime-memory usage differs substantially between the scaling methods on
embedded GPUs, while latency grows significantly on embedded CPUs.
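As an illustration of the uniform-scaling baseline described above, the sketch
below applies one global width multiplier to a toy CNN and bisects that
multiplier until the model reaches a target fraction of its original parameter
count (0.25×, 0.50×, 0.75×, 1.00×). The toy network is a stand-in, not one of
the paper's models, and NeuralScale's per-layer width search is not reproduced.

```python
# Sketch of the uniform-scaling baseline: one global width multiplier, found
# by bisection so the scaled model matches a target parameter budget.
# The architecture here is an illustrative toy CNN, not a model from the paper.
import torch.nn as nn


def make_cnn(width_mult: float = 1.0, base_widths=(64, 128, 256), num_classes=10):
    widths = [max(8, int(round(w * width_mult))) for w in base_widths]
    layers, c_in = [], 3
    for c_out in widths:
        layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        c_in = c_out
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c_in, num_classes)]
    return nn.Sequential(*layers)


def param_count(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())


def width_mult_for_param_target(target_fraction: float, lo=0.05, hi=1.0, iters=25):
    """Bisect the width multiplier until the parameter count hits the target."""
    target = target_fraction * param_count(make_cnn(1.0))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if param_count(make_cnn(mid)) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


for frac in (0.25, 0.50, 0.75, 1.00):
    mult = width_mult_for_param_target(frac)
    print(f"{frac:.2f}x params -> width multiplier {mult:.2f}, "
          f"{param_count(make_cnn(mult)) / 1e6:.3f} M parameters")
```

Because a convolutional layer's parameter count grows roughly quadratically
with layer width, hitting a 0.25× parameter budget requires a width multiplier
near 0.5, which is why the multiplier is found by bisection rather than set to
the parameter fraction directly.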
- …