Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tuning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines.
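The grow-and-prune cycle described above can be sketched with a binary connectivity mask over a weight matrix: growth activates dormant connections with the largest gradient magnitudes, and pruning deactivates active connections with the smallest weight magnitudes. This is a minimal numpy illustration, not the authors' implementation; the function names and fractions are assumptions:

```python
import numpy as np

def grow_connections(mask, grads, grow_frac=0.1):
    """Activate dormant connections whose gradient magnitudes are largest."""
    dormant = (mask == 0)
    n_grow = int(grow_frac * mask.size)
    scores = np.abs(grads) * dormant              # only dormant positions compete
    top = np.argsort(scores, axis=None)[::-1][:n_grow]
    new_mask = mask.copy().ravel()
    new_mask[top] = 1
    return new_mask.reshape(mask.shape)

def prune_connections(weights, mask, prune_frac=0.1):
    """Deactivate active connections whose weight magnitudes are smallest."""
    active = np.flatnonzero(mask.ravel())
    n_prune = int(prune_frac * active.size)
    order = np.argsort(np.abs(weights.ravel()[active]))  # smallest |w| first
    new_mask = mask.copy().ravel()
    new_mask[active[order[:n_prune]]] = 0
    return new_mask.reshape(mask.shape)
```

In an incremental-learning loop, a growth step on the new data's gradients would be followed by retraining and iterative pruning until the target sparsity is restored.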
Global Context Vision Transformers
We propose global context vision transformer (GC ViT), a novel architecture
that enhances parameter and compute utilization for computer vision tasks. The
core of the model is a set of global context self-attention modules, used
jointly with standard local self-attention to effectively yet efficiently model
both long- and short-range spatial interactions, as an alternative to complex
operations such as attention masking or local window shifting. While the local
self-attention modules are responsible for modeling short-range information,
the global query tokens are shared across all global self-attention modules to
interact with local keys and values. In addition, we address the lack of
inductive bias in ViTs and improve the modeling of inter-channel dependencies
by proposing a novel downsampler which leverages a parameter-efficient fused
inverted residual block. The proposed GC ViT achieves new state-of-the-art
performance across image classification, object detection and semantic
segmentation tasks. On ImageNet-1K dataset for classification, the tiny, small
and base variants of GC ViT with 28M, 51M and 90M parameters achieve 83.4%,
83.9% and 84.4% Top-1 accuracy, respectively, surpassing comparably-sized prior
art such as CNN-based ConvNeXt and ViT-based Swin Transformer. Pre-trained GC
ViT backbones in downstream tasks of object detection, instance segmentation,
and semantic segmentation on MS COCO and ADE20K datasets outperform prior work
consistently, sometimes by large margins. Code and pre-trained models are
available at https://github.com/NVlabs/GCViT.
Comment: 15 pages, 8 figures
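The interaction between shared global query tokens and local keys/values can be sketched as a cross-attention step. This is a simplified single-head numpy sketch under assumed shapes, not the actual GC ViT layer (which operates per window with learned projections and additional components); all names here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_context_attention(local_tokens, global_q, w_k, w_v):
    """Shared global queries attend over local keys and values (single head)."""
    k = local_tokens @ w_k                            # (n_local, d)
    v = local_tokens @ w_v                            # (n_local, d)
    scores = global_q @ k.T / np.sqrt(k.shape[-1])    # (n_global, n_local)
    return softmax(scores) @ v                        # (n_global, d)
```

The key point the sketch captures is that the queries are not derived from the local window: they are precomputed global tokens, so long-range context enters without masking or window shifting.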
Defense Against Model Extraction Attacks on Recommender Systems
The robustness of recommender systems has become a prominent topic within the
research community. Numerous adversarial attacks have been proposed, but most
rely on extensive prior knowledge: white-box attacks assume full access to the
target model, and most black-box attacks assume that certain external knowledge
is available. Among these attacks, the model extraction attack stands out as a
promising and practical method, involving training a surrogate model by
repeatedly querying the target model. However, there is a significant gap in
the existing literature when it comes to defending against model extraction
attacks on recommender systems. In this paper, we introduce Gradient-based
Ranking Optimization (GRO), which is the first defense strategy designed to
counter such attacks. We formalize the defense as an optimization problem,
aiming to minimize the loss of the protected target model while maximizing the
loss of the attacker's surrogate model. Since top-k ranking lists are
non-differentiable, we transform them into swap matrices which are instead
differentiable. These swap matrices serve as input to a student model that
emulates the surrogate model's behavior. By back-propagating the loss of the
student model, we obtain gradients for the swap matrices. These gradients are
used to compute a swap loss, which maximizes the loss of the student model. We
conducted experiments on three benchmark datasets to evaluate the performance
of GRO, and the results demonstrate its superior effectiveness in defending
against model extraction attacks.
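The starting point for making a top-k ranking list consumable by a student model is representing the ranking as a matrix. The sketch below shows only the plain one-hot ranking matrix; GRO's actual swap matrices and the relaxation that makes them differentiable are more involved, and the names here are illustrative:

```python
import numpy as np

def ranking_matrix(scores, k):
    """One-hot matrix P where P[r, i] = 1 iff item i occupies rank r in the top-k."""
    topk = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    P = np.zeros((k, scores.size))
    P[np.arange(k), topk] = 1.0
    return P
```

A matrix of this shape can be fed to a student network that mimics the surrogate; gradients back-propagated to the matrix then drive a loss that degrades the surrogate while preserving the target model's ranking quality.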
DiabDeep: Pervasive Diabetes Diagnosis based on Wearable Medical Sensors and Efficient Neural Networks
Diabetes impacts the quality of life of millions of people. However, diabetes
diagnosis is still an arduous process, given that the disease develops and gets
treated outside the clinic. The emergence of wearable medical sensors (WMSs)
and machine learning points to a way forward to address this challenge. WMSs
enable a continuous mechanism to collect and analyze physiological signals.
However, disease diagnosis based on WMS data and its effective deployment on
resource-constrained edge devices remain challenging due to inefficient feature
extraction and vast computation cost. In this work, we propose a framework
called DiabDeep that combines efficient neural networks (called DiabNNs) with
WMSs for pervasive diabetes diagnosis. DiabDeep bypasses the feature extraction
stage and acts directly on WMS data. It enables both an (i) accurate inference
on the server, e.g., a desktop, and (ii) efficient inference on an edge device,
e.g., a smartphone, based on varying design goals and resource budgets. On the
server, we stack sparsely connected layers to deliver high accuracy. On the
edge, we use a hidden-layer long short-term memory based recurrent layer to cut
down on computation and storage. At the core of DiabDeep lies a grow-and-prune
training flow: it leverages gradient-based growth and magnitude-based pruning
algorithms to learn both weights and connections for DiabNNs. We demonstrate
the effectiveness of DiabDeep through analyzing data from 52 participants. For
server (edge) side inference, we achieve a 96.3% (95.3%) accuracy in
classifying diabetics against healthy individuals, and a 95.7% (94.6%) accuracy
in distinguishing among type-1 diabetic, type-2 diabetic, and healthy
individuals.
Against conventional baselines, DiabNNs achieve higher accuracy, while reducing
the model size (FLOPs) by up to 454.5x (8.9x). Therefore, the system can be
viewed as pervasive and efficient, yet very accurate.
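The sparsely connected layers used on the server side amount to dense layers whose weight matrices are masked by a connectivity pattern. A minimal numpy sketch; the mask here is hand-specified for illustration, whereas in DiabDeep it is learned by the grow-and-prune training flow:

```python
import numpy as np

def sparse_layer(x, w, mask, b):
    """Dense layer under a binary connectivity mask, with ReLU activation."""
    return np.maximum((w * mask) @ x + b, 0.0)  # masked-out weights contribute nothing
```

Because most mask entries are zero after pruning, the layer can be stored and executed in sparse form, which is what yields the reported reductions in model size and FLOPs.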
Fully Dynamic Inference with Deep Neural Networks
Modern deep neural networks are powerful and widely applicable models that
extract task-relevant information through multi-level abstraction. Their
cross-domain success, however, is often achieved at the expense of
computational cost, high memory bandwidth, and long inference latency, which
prevents their deployment in resource-constrained and time-sensitive scenarios,
such as edge-side inference and self-driving cars. While recently developed
methods for creating efficient deep neural networks are making their real-world
deployment more feasible by reducing model size, they do not fully exploit
input properties on a per-instance basis to maximize computational efficiency
and task accuracy. In particular, most existing methods typically use a
one-size-fits-all approach that identically processes all inputs. Motivated by
the fact that different images require different feature embeddings to be
accurately classified, we propose a fully dynamic paradigm that imparts deep
convolutional neural networks with hierarchical inference dynamics at the level
of layers and individual convolutional filters/channels. Two compact networks,
called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance
basis which layers or filters/channels are redundant and therefore should be
skipped. L-Net and C-Net also learn how to scale retained computation outputs
to maximize task accuracy. By integrating L-Net and C-Net into a joint design
framework, called LC-Net, we consistently outperform state-of-the-art dynamic
frameworks with respect to both efficiency and classification accuracy. On the
CIFAR-10 dataset, LC-Net results in up to 11.9x fewer floating-point
operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic
inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4x
fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
A-ViT: Adaptive Tokens for Efficient Vision Transformer
We introduce A-ViT, a method that adaptively adjusts the inference cost of
vision transformer (ViT) for images of different complexity. A-ViT achieves
this by automatically reducing the number of tokens in vision transformers that
are processed in the network as inference proceeds. We reformulate Adaptive
Computation Time (ACT) for this task, extending halting to discard redundant
spatial tokens. The appealing architectural properties of vision transformers
enable our adaptive token reduction mechanism to speed up inference without
modifying the network architecture or inference hardware. We demonstrate that
A-ViT requires no extra parameters or sub-network for halting, as we base the
learning of adaptive halting on the original network parameters. We further
introduce distributional prior regularization that stabilizes training compared
to prior ACT approaches. On the image classification task (ImageNet1K), we show
that our proposed A-ViT yields high efficacy in filtering informative spatial
features and cutting down on the overall compute. The proposed method improves
the throughput of DeiT-Tiny by 62% and DeiT-Small by 38% with only 0.3%
accuracy drop, outperforming prior art by a large margin. Project page at
https://a-vit.github.io/
Comment: CVPR'22 oral
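The ACT-style halting can be sketched as accumulating per-layer halting scores for each token and discarding a token once its cumulative score crosses 1 - eps. A simplified numpy sketch; A-ViT derives the halting probabilities from the network's own activations rather than taking them as inputs, which is omitted here:

```python
import numpy as np

def token_halting_depth(halt_probs, eps=0.01):
    """Per-token layer index at which the cumulative halting score crosses 1 - eps.

    halt_probs: (n_layers, n_tokens) halting probabilities per layer and token.
    """
    cum = np.cumsum(halt_probs, axis=0)           # running halting score per token
    crossed = cum >= 1.0 - eps
    depth = np.argmax(crossed, axis=0)            # first layer where the score crossed
    depth[~crossed.any(axis=0)] = halt_probs.shape[0] - 1  # never halted: full depth
    return depth
```

Easy tokens (e.g. background patches) accumulate halting mass early and drop out after a few layers, while informative tokens survive to the final layer, which is the source of the reported throughput gains.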