HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation
Neural radiance fields (NeRF) have garnered significant attention, with
recent works such as Instant-NGP accelerating NeRF training and evaluation
through a combination of hashgrid-based positional encoding and neural
networks. However, effectively leveraging the spatial sparsity of 3D scenes
remains a challenge. To cull away unnecessary regions of the feature grid,
existing solutions rely on prior knowledge of object shape or periodically
estimate object shape during training by repeated model evaluations, which are
costly and wasteful.
To address this issue, we propose HollowNeRF, a novel compression solution
for hashgrid-based NeRF which automatically sparsifies the feature grid during
the training phase. Instead of directly compressing dense features, HollowNeRF
trains a coarse 3D saliency mask that guides efficient feature pruning, and
employs an alternating direction method of multipliers (ADMM) pruner to
sparsify the 3D saliency mask during training. By exploiting the sparsity in
the 3D scene to redistribute hash collisions, HollowNeRF improves rendering
quality while using a fraction of the parameters of comparable state-of-the-art
solutions, leading to a better cost-accuracy trade-off. Our method delivers
comparable rendering quality to Instant-NGP, while utilizing just 31% of the
parameters. In addition, our solution can achieve a PSNR accuracy gain of up to
1 dB using only 56% of the parameters.
Comment: Accepted to ICCV 2023.
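A minimal sketch of the core mechanism may help make this concrete: a coarse trainable saliency grid gates the features fetched from the hash grid, and an ADMM-style augmented term pulls the mask toward a sparse auxiliary variable. The names, grid resolution, and penalty form below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SaliencyGatedFeatures(nn.Module):
    """Coarse trainable 3D saliency mask gating hashgrid features (illustrative)."""
    def __init__(self, res=32):
        super().__init__()
        # Coarse saliency logits over a res^3 grid; sigmoid keeps gates in [0, 1].
        self.saliency = nn.Parameter(torch.zeros(res, res, res))
        self.res = res

    def forward(self, xyz, features):
        # xyz in [0, 1]^3; gate each point's hashgrid feature by its nearest cell.
        idx = (xyz.clamp(0, 1) * (self.res - 1)).long()
        gate = torch.sigmoid(self.saliency[idx[:, 0], idx[:, 1], idx[:, 2]])
        return features * gate.unsqueeze(-1)

def admm_sparsity_penalty(saliency, z, u, rho=1e-2):
    # Augmented-Lagrangian coupling ||m - z + u||^2 that pulls the mask m
    # toward the sparse auxiliary variable z (updated by hard-thresholding).
    m = torch.sigmoid(saliency)
    return 0.5 * rho * ((m - z + u) ** 2).sum()
```

In the full method, z would be refreshed periodically by projecting the mask (plus the dual variable u) onto the target sparsity, and pruned cells free their hash slots so collisions are redistributed toward occupied regions.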
Ada-QPacknet -- adaptive pruning with bit width reduction as an efficient continual learning method without forgetting
Continual Learning (CL) is a setting in which a huge gap remains between
human and deep learning model efficiency. Many CL algorithms have been
designed recently, but most struggle to learn in dynamic and complex
environments. This work describes Ada-QPacknet, a new architecture-based
approach that uses pruning to extract a sub-network for each task. Since
capacity is the crucial constraint in architecture-based CL methods, the
presented method reduces model size with an efficient linear and nonlinear
quantisation approach that lowers the bit width of the weight format. The
results show that hybrid 8- and 4-bit quantisation achieves accuracy similar
to the floating-point sub-networks on well-known CL scenarios. To our
knowledge, this is the first CL strategy that combines both compression
techniques, pruning and quantisation, to generate task sub-networks. The
algorithm was tested on well-known episode combinations and compared with
the most popular algorithms; the results show that the proposed approach
outperforms most CL strategies in task- and class-incremental scenarios.
Comment: Paper accepted at ECAI 2023.
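For concreteness, here is a minimal sketch of the linear (uniform) weight quantization the abstract refers to. The helper name and the symmetric-range choice are assumptions of this sketch; which layer gets which bit width is the adaptive part of the actual method.

```python
import torch

def linear_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform (linear) quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                      # symmetric signed range
    scale = w.abs().max().clamp(min=1e-8) / qmax    # guard against all-zero w
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                                # dequantized representation

# Hybrid scheme in the spirit of the abstract: some task sub-networks kept
# at 8 bits, others at 4 bits.
w8 = linear_quantize(torch.randn(256, 256), bits=8)
w4 = linear_quantize(torch.randn(256, 256), bits=4)
```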
Novel neural architectures & algorithms for efficient inference
In the last decade, the machine learning universe embraced deep neural networks (DNNs) wholeheartedly with the advent of neural architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), transformers, etc. These models have empowered many applications, such as ChatGPT, Imagen, etc., and have achieved state-of-the-art (SOTA) performance on many vision, speech, and language modeling tasks. However, SOTA performance comes with various issues, such as large model size, compute-intensive training, increased inference latency, higher working memory, etc. This thesis aims at improving the resource efficiency of neural architectures, i.e., significantly reducing the computational, storage, and energy consumption of a DNN without any significant loss in performance.
Towards this goal, we explore novel neural architectures as well as training algorithms that allow low-capacity models to achieve near SOTA performance. We divide this thesis into two dimensions: \textit{Efficient Low Complexity Models}, and \textit{Input Hardness Adaptive Models}.
Along the first dimension, i.e., \textit{Efficient Low Complexity Models}, we improve DNN performance by addressing instabilities in the existing architectures and training methods. We propose novel neural architectures inspired by ordinary differential equations (ODEs) to reinforce input signals and attend to salient feature regions. In addition, we show that carefully designed training schemes improve the performance of existing neural networks. We divide this exploration into two parts:
\textsc{(a) Efficient Low Complexity RNNs.} We improve RNN resource efficiency by addressing poor gradients, noise amplification, and BPTT training issues. First, we improve RNNs by solving ODEs that eliminate vanishing and exploding gradients during training. To do so, we present Incremental Recurrent Neural Networks (iRNNs) that keep track of increments in the equilibrium surface. Next, we propose Time Adaptive RNNs that mitigate the noise propagation issue in RNNs by modulating the time constants in the ODE-based transition function. We empirically demonstrate the superiority of ODE-based neural architectures over existing RNNs. Finally, we propose the Forward Propagation Through Time (FPTT) algorithm for training RNNs. We show that FPTT yields significant gains compared to the more conventional Backpropagation Through Time (BPTT) scheme.
\textsc{(b) Efficient Low Complexity CNNs.} Next, we improve CNN architectures by reducing their resource usage. They require greater depth to generate high-level features, resulting in computationally expensive models. We design a novel residual block, the Global layer, that constrains the input and output features by approximately solving partial differential equations (PDEs). It yields better receptive fields than traditional convolutional blocks and thus results in shallower networks. Further, we reduce the model footprint by enforcing a novel inductive bias that formulates the output of a residual block as a spatial interpolation between high-compute anchor pixels and low-compute cheaper pixels. This results in spatially interpolated convolutional blocks (SI-CNNs) that have better compute and performance trade-offs. Finally, we propose an algorithm that enforces various distributional constraints during training in order to achieve better generalization. We refer to this scheme as distributionally constrained learning (DCL).
In the second dimension, i.e., \textit{Input Hardness Adaptive Models}, we introduce the notion of the hardness of any input relative to any architecture. In the first dimension, a neural network allocates the same resources, such as compute, storage, and working memory, for all the inputs. It inherently assumes that all examples are equally hard for a model. In this dimension, we challenge this assumption using input hardness as our reasoning that some inputs are relatively easy for a network to predict compared to others. Input hardness enables us to create selective classifiers wherein a low-capacity network handles simple inputs while abstaining from a prediction on the complex inputs. Next, we create hybrid models that route the hard inputs from the low-capacity abstaining network to a high-capacity expert model. We design various architectures that adhere to this hybrid inference style. Further, input hardness enables us to selectively distill the knowledge of a high-capacity model into a low-capacity model by cleverly discarding hard inputs during the distillation procedure.
Finally, we conclude this thesis by sketching out various interesting future research directions that emerge as extensions of the ideas explored in this work.
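Part (a) lends itself to a toy illustration: an ODE-style recurrent cell whose hidden state is updated incrementally (an explicit Euler step) with a learned per-unit time constant, in the spirit of the iRNN and Time Adaptive RNN ideas summarized above. The class and parameter names below are assumptions of this sketch, not the thesis code.

```python
import torch
import torch.nn as nn

class IncrementalRNNCell(nn.Module):
    """Toy ODE-style cell: h_t = h_{t-1} + tau * (phi(x_t, h_{t-1}) - h_{t-1}).

    The incremental (residual) update is an explicit Euler step of an ODE,
    which keeps gradient norms close to 1; that is the intuition behind the
    vanishing/exploding-gradient fixes sketched in the abstract."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.ih = nn.Linear(input_size, hidden_size)
        self.hh = nn.Linear(hidden_size, hidden_size)
        # Learned per-unit time constant in (0, 1), as in time-adaptive variants.
        self.tau_logit = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, h):
        tau = torch.sigmoid(self.tau_logit)
        target = torch.tanh(self.ih(x) + self.hh(h))
        return h + tau * (target - h)
```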
Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training
Binarization of neural networks is a dominant paradigm in neural network
compression. The pioneering work BinaryConnect uses the Straight Through
Estimator (STE) to mimic the gradients of the sign function, but it also
causes a crucial inconsistency problem. Most previous methods design
different estimators instead of STE to mitigate it. However, they ignore the
fact that when the estimating error is reduced, gradient stability decreases
concomitantly. Such highly divergent gradients harm model training and
increase the risk of gradient vanishing and gradient explosion. To take
gradient stability fully into consideration, we present a new perspective on
BNN training, regarding it as an equilibrium between the estimating error
and gradient stability. In this view, we first design two indicators to
quantitatively demonstrate the equilibrium phenomenon. In addition, to
balance the estimating error and gradient stability well, we revise the
original straight through estimator and propose a power-function-based
estimator, the Rectified Straight Through Estimator (ReSTE for short).
Compared to other estimators, ReSTE is rational and capable of flexibly
balancing the estimating error with gradient stability. Extensive
experiments on CIFAR-10
and ImageNet datasets show that ReSTE has excellent performance and surpasses
the state-of-the-art methods without any auxiliary modules or losses.
Comment: 10 pages, 6 figures. Accepted at ICCV 2023.
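To make the power-function idea concrete, here is a sketch of a sign binarizer whose backward pass follows the power surrogate g(x) = sign(x)|x|^(1/o): o = 1 recovers vanilla STE, while larger o lowers the estimating error at the cost of steeper gradients near zero. The clamp values and the exact rectification differ from ReSTE proper; treat this as an illustration under stated assumptions.

```python
import torch

class PowerSTE(torch.autograd.Function):
    """Binarize forward with sign(); backward follows the power surrogate
    g(x) = sign(x) * |x|^(1/o), so g'(x) = (1/o) * |x|^(1/o - 1)."""

    @staticmethod
    def forward(ctx, x, o, clip):
        ctx.save_for_backward(x)
        ctx.o, ctx.clip = o, clip
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        g = (1.0 / ctx.o) * x.abs().clamp(min=1e-4).pow(1.0 / ctx.o - 1.0)
        # Clamp bounds the surrogate gradient near zero (a safety choice of
        # this sketch; ReSTE's rectification differs in detail).
        return grad_out * g.clamp(max=ctx.clip), None, None

# o = 1.0 reproduces vanilla STE; larger o trades estimating error for
# gradient stability, the equilibrium the abstract describes.
w = torch.randn(8, requires_grad=True)
w_bin = PowerSTE.apply(w, 3.0, 10.0)
```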
OTOV2: Automatic, Generic, User-Friendly
Existing model compression methods via structured pruning typically require
complicated multi-stage procedures, each demanding substantial engineering
effort and domain knowledge from end-users, which prevents their wider
application to broader scenarios. We propose the second generation of
Only-Train-Once (OTOv2), which first automatically trains and compresses a
general DNN only once from scratch to produce a more compact model with
competitive performance, without fine-tuning. OTOv2 is automatic, pluggable
into various deep learning applications, and requires minimal engineering
effort from users. Methodologically, OTOv2 makes two major improvements:
(i) Autonomy: it automatically exploits the dependency structure of general
DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs),
and constructs the compressed model; and (ii) Dual Half-Space Projected
Gradient (DHSPG): a novel optimizer that more reliably solves
structured-sparsity problems. Numerically, we demonstrate the generality and
autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet,
CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be
handled by other methods without extensive handcrafting effort. On benchmark
datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet,
OTOv2 performs competitively with or better than the state of the art. The
source code is available at https://github.com/tianyic/only_train_once.
Comment: Published at ICLR 2023. Note that a few dependency-graph images
could not be included in the arXiv version due to its size limit.
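The ZIG idea can be sketched briefly: parameters are grouped so that zeroing an entire group (a conv filter together with its BatchNorm affine parameters) silences a channel without perturbing any other activation, so the group can later be removed structurally. The magnitude-based projection below is a simplified stand-in for DHSPG, and the function names are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def zig_groups(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """One zero-invariant group per output channel: the conv filter plus the
    matching BatchNorm scale and shift. Zeroing the whole group removes the
    channel's contribution without affecting any other activation."""
    return [[conv.weight[c], bn.weight[c:c + 1], bn.bias[c:c + 1]]
            for c in range(conv.out_channels)]

@torch.no_grad()
def prune_smallest_groups(groups, keep_ratio=0.5):
    # Simplified magnitude projection standing in for DHSPG: zero out the
    # lowest-norm groups.
    norms = torch.tensor([float(sum((p ** 2).sum() for p in g)) for g in groups])
    n_drop = len(groups) - int(len(groups) * keep_ratio)
    for i in norms.argsort()[:n_drop].tolist():
        for p in groups[i]:
            p.zero_()
```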
Compact optimized deep learning model for edge: a review
Most real-time computer vision applications, such as pedestrian detection, augmented reality, and virtual reality, rely heavily on convolutional neural networks (CNNs) for real-time decision support. In addition, edge intelligence is becoming necessary for low-latency real-time applications so that data can be processed at the source device. Processing massive amounts of data affects memory footprint, prediction time, and energy consumption, all essential performance metrics in machine-learning-based internet of things (IoT) edge clusters. However, deploying deep, dense, heavily weighted CNN models on embedded systems with limited edge computing resources, such as memory and battery, poses significant challenges for developing compact optimized models. Energy consumption in edge IoT networks can be reduced by reducing computation and data transmission between IoT devices and gateway devices, so there is high demand for energy-efficient deep learning models deployable on edge devices. Furthermore, recent studies show that smaller compressed models can match the performance of larger deep learning models. This review article surveys state-of-the-art edge intelligence techniques, and we propose a new research framework for designing compact optimized deep learning (DL) models for deployment on edge devices.
Sharing Leaky-Integrate-and-Fire Neurons for Memory-Efficient Spiking Neural Networks
Spiking Neural Networks (SNNs) have gained increasing attention as
energy-efficient neural networks owing to their binary and asynchronous
computation. However, their non-linear activation, the
Leaky-Integrate-and-Fire (LIF) neuron, requires additional memory to store a
membrane voltage that captures the temporal dynamics of spikes. Although the
memory cost of LIF neurons increases significantly as the input dimension
grows, techniques to reduce it have not yet been explored. To address this,
we propose a simple and effective
solution, EfficientLIF-Net, which shares the LIF neurons across different
layers and channels. Our EfficientLIF-Net achieves comparable accuracy with the
standard SNNs while bringing up to ~4.3X forward memory efficiency and ~21.9X
backward memory efficiency for LIF neurons. We conduct experiments on various
datasets including CIFAR10, CIFAR100, TinyImageNet, ImageNet-100, and
N-Caltech101. Furthermore, we show that our approach also offers advantages on
Human Activity Recognition (HAR) datasets, which heavily rely on temporal
information.
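A much-simplified sketch of the sharing idea: one LIF state buffer serves several layers instead of each layer keeping its own membrane tensor. This assumes the sharing layers produce matching feature shapes and omits the surrogate gradients needed to train through the spike threshold; EfficientLIF-Net's actual cross-layer and cross-channel schemes are more involved.

```python
import torch
import torch.nn as nn

class SharedLIF(nn.Module):
    """One LIF membrane buffer reused across several layers (illustrative).

    A standard SNN keeps a separate membrane tensor per layer; reusing one
    buffer is the memory saving the abstract describes, in its simplest form."""
    def __init__(self, leak=0.9, threshold=1.0):
        super().__init__()
        self.leak, self.threshold = leak, threshold
        self.membrane = None  # lazily sized from the first input

    def forward(self, x):
        if self.membrane is None or self.membrane.shape != x.shape:
            self.membrane = torch.zeros_like(x)
        self.membrane = self.leak * self.membrane + x   # leaky integration
        spikes = (self.membrane >= self.threshold).float()
        self.membrane = self.membrane - spikes * self.threshold  # soft reset
        return spikes
```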
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off
Over-parameterized deep neural networks (DNNs) have achieved high prediction
accuracy in many applications. Although effective, their large number of
parameters hinders deployment on resource-limited devices and has an
outsized environmental impact. Sparse training (using a fixed number of
nonzero weights in each iteration) can significantly mitigate training costs
by reducing model size. However, existing sparse training methods mainly use
random or greedy drop-and-grow strategies, which lead to local minima and
low accuracy. In this work, we treat dynamic sparse training as a sparse
connectivity search problem and design an exploitation-and-exploration
acquisition function to escape local optima and saddle points. We further
provide theoretical guarantees for the proposed acquisition function and
clarify its convergence properties.
Experimental results show that sparse models (up to 98\% sparsity) obtained by
our proposed method outperform the SOTA sparse training methods on a wide
variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10,
ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models.
On ResNet-50 / ImageNet, the proposed method has up to 8.2\% accuracy
improvement compared to SOTA sparse training methods.
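As background, the drop-and-grow mechanism such methods build on can be sketched as follows. This shows only the generic magnitude-drop / gradient-grow baseline (RigL-style), not the paper's exploitation-exploration acquisition function; the function name and update fraction are assumptions.

```python
import torch

@torch.no_grad()
def drop_and_grow(weight, grad, mask, update_frac=0.1):
    """Generic drop-and-grow step for dynamic sparse training: drop the
    smallest-magnitude active weights, grow the largest-gradient inactive
    ones, keeping the number of nonzeros fixed."""
    n_update = int(update_frac * mask.sum())
    # Drop: smallest |w| among currently active connections.
    active_mag = weight.abs().masked_fill(~mask, float('inf')).flatten()
    drop_idx = active_mag.argsort()[:n_update]
    mask.view(-1)[drop_idx] = False
    # Grow: largest |grad| among currently inactive connections.
    inactive_grad = grad.abs().masked_fill(mask, 0.0).flatten()
    grow_idx = inactive_grad.argsort(descending=True)[:n_update]
    mask.view(-1)[grow_idx] = True
    weight.view(-1)[grow_idx] = 0.0  # regrown connections start at zero
    weight.mul_(mask)                # keep pruned weights exactly zero
    return mask
```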
Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach
Regularizing Deep Neural Networks (DNNs) is essential for improving
generalizability and preventing overfitting. Fixed penalty methods, though
common, lack adaptability and suffer from hyperparameter sensitivity. In this
paper, we propose a novel approach to DNN regularization by framing the
training process as a constrained optimization problem, where the data
fidelity term is the minimization objective and the regularization terms
serve as constraints. We then employ the Stochastic Augmented Lagrangian
(SAL) method
to achieve a more flexible and efficient regularization mechanism. Our approach
extends beyond black-box regularization, demonstrating significant improvements
in white-box models, where weights are often subject to hard constraints to
ensure interpretability. Experimental results on image-based classification on
MNIST, CIFAR10, and CIFAR100 datasets validate the effectiveness of our
approach. SAL consistently achieves higher accuracy and better constraint
satisfaction, showcasing its potential for optimizing DNNs under constrained
settings.
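The underlying update is the textbook stochastic augmented Lagrangian: minimize the loss plus a multiplier term and a quadratic penalty on the constraint, then ascend on the multiplier. The sketch below uses that classic form with names of my choosing; the paper's SAL recipe may differ in scheduling and stochastic details.

```python
import torch

def augmented_lagrangian_step(loss_fn, constraint_fn, params, lam, rho, opt):
    """One stochastic augmented-Lagrangian step (textbook form): minimize
        L(w) = f(w) + lam * g(w) + (rho / 2) * g(w)^2
    over w, then ascend on the multiplier: lam <- lam + rho * g(w)."""
    opt.zero_grad()
    g = constraint_fn(params)          # constraint violation g(w)
    total = loss_fn(params) + lam * g + 0.5 * rho * g ** 2
    total.backward()
    opt.step()
    with torch.no_grad():
        lam = lam + rho * float(constraint_fn(params))  # dual ascent
    return lam
```

For example, constraint_fn could return ||w||_1 - c to cap the L1 norm of the weights at a budget c, turning a fixed L1 penalty into an adaptively enforced constraint.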
Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
The large computing and memory cost of deep neural networks (DNNs) often
precludes their use in resource-constrained devices. Quantizing the parameters
and operations to lower bit-precision offers substantial memory and energy
savings for neural network inference, facilitating the use of DNNs on edge
computing platforms. Recent efforts at quantizing DNNs have employed a range of
techniques encompassing progressive quantization, step-size adaptation, and
gradient scaling. This paper proposes a new quantization approach for mixed
precision convolutional neural networks (CNNs) targeting edge computing. Our
method establishes a new Pareto frontier in model accuracy and memory
footprint, demonstrating a range of quantized models that deliver
best-in-class accuracy below 4.3 MB of weights (wgts.) and activations
(acts.). Our main contributions
are: (i) hardware-aware heterogeneous differentiable quantization with
tensor-sliced learned precision, (ii) targeted gradient modification for wgts.
and acts. to mitigate quantization errors, and (iii) a multi-phase learning
schedule to address instability in learning arising from updates to the learned
quantizer and model parameters. We demonstrate the effectiveness of our
techniques on the ImageNet dataset across a range of models including
EfficientNet-Lite0 (e.g., 4.14MB of wgts. and acts. at 67.66% accuracy) and
MobileNetV2 (e.g., 3.51 MB of wgts. and acts. at 65.39% accuracy).
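A minimal sketch of the "step-size adaptation" ingredient: a uniform quantizer whose step size is a learnable parameter, with straight-through rounding. This is an LSQ-style toy, not the paper's hardware-aware tensor-sliced scheme, and it omits the gradient scaling LSQ applies to the step size.

```python
import torch
import torch.nn as nn

class LearnedStepQuantizer(nn.Module):
    """Differentiable uniform quantizer with a learned step size."""
    def __init__(self, bits=4, init_step=0.1):
        super().__init__()
        self.qmax = 2 ** (bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(init_step))

    def forward(self, x):
        q = torch.clamp(x / self.step, -self.qmax - 1, self.qmax)
        # Straight-through rounding: forward rounds, backward passes through,
        # so both the model weights and the step size receive gradients.
        q = q + (q.round() - q).detach()
        return q * self.step
```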