SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Recently, efficient Vision Transformers have shown great performance with low
latency on resource-constrained devices. Conventionally, they use 4x4 patch
embeddings and a 4-stage structure at the macro level, while utilizing
sophisticated attention with a multi-head configuration at the micro level. This
paper aims to address computational redundancy at all design levels in a
memory-efficient manner. We discover that using a larger-stride patchify stem not
only reduces memory access costs but also achieves competitive performance by
leveraging token representations with reduced spatial redundancy from the early
stages. Furthermore, our preliminary analyses suggest that attention layers in
the early stages can be substituted with convolutions, and several attention
heads in the latter stages are computationally redundant. To handle this, we
introduce a single-head attention module that inherently prevents head
redundancy and simultaneously boosts accuracy by combining global and local
information in parallel. Building upon our solutions, we introduce SHViT, a
Single-Head Vision Transformer that achieves a state-of-the-art speed-accuracy
tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x
faster than MobileViTv2 x1.0 on GPU, CPU, and iPhone12 mobile device,
respectively, while being 1.3% more accurate. For object detection and instance
segmentation on MS COCO using a Mask R-CNN head, our model achieves performance
comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone
latency on GPU and mobile device, respectively. Comment: CVPR 2024.
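As a rough illustration of the single-head idea described above, the following PyTorch sketch splits channels between one global attention path and a local depthwise-convolution path and processes them in parallel. The split ratio, kernel size, and module names are illustrative assumptions, not the authors' exact SHViT block.

```python
import torch
import torch.nn as nn

class SingleHeadMixer(nn.Module):
    def __init__(self, dim: int, attn_ratio: float = 0.25, qk_dim: int = 16):
        super().__init__()
        self.attn_dim = int(dim * attn_ratio)   # channels sent through global attention
        self.conv_dim = dim - self.attn_dim     # channels sent through a local convolution
        self.qk_dim = qk_dim
        # single-head attention: one set of Q/K/V projections, no head split
        self.qkv = nn.Conv2d(self.attn_dim, 2 * qk_dim + self.attn_dim, 1)
        self.proj = nn.Conv2d(self.attn_dim, self.attn_dim, 1)
        # local branch: depthwise 3x3 convolution over the remaining channels
        self.local = nn.Conv2d(self.conv_dim, self.conv_dim, 3, padding=1,
                               groups=self.conv_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xa, xl = torch.split(x, [self.attn_dim, self.conv_dim], dim=1)
        q, k, v = torch.split(self.qkv(xa),
                              [self.qk_dim, self.qk_dim, self.attn_dim], dim=1)
        q = q.flatten(2).transpose(1, 2)        # (b, hw, qk_dim)
        k = k.flatten(2)                        # (b, qk_dim, hw)
        v = v.flatten(2).transpose(1, 2)        # (b, hw, attn_dim)
        attn = (q @ k * self.qk_dim ** -0.5).softmax(dim=-1)
        xa = self.proj((attn @ v).transpose(1, 2).reshape(b, self.attn_dim, h, w))
        xl = self.local(xl)
        return torch.cat([xa, xl], dim=1)       # global and local branches in parallel
```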
AutoLR: Layer-wise Pruning and Auto-tuning of Learning Rates in Fine-tuning of Deep Networks
Existing fine-tuning methods use a single learning rate over all layers. In
this paper, we first observe that the trends of layer-wise weight variation under
fine-tuning with a single learning rate do not match the well-known notion
that lower-level layers extract general features and higher-level layers
extract specific features. Based on this observation, we propose an algorithm
that improves fine-tuning performance and reduces network complexity through
layer-wise pruning and auto-tuning of layer-wise learning rates. The proposed
algorithm's effectiveness is verified by its state-of-the-art performance on
image retrieval benchmark datasets (CUB-200, Cars-196, Stanford Online
Products, and In-Shop). Code is available at
https://github.com/youngminPIL/AutoLR. Comment: Accepted to AAAI 2021.
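The auto-tuned layer-wise learning rates themselves are the paper's contribution, but the mechanism of applying a different rate to each layer can be sketched with PyTorch parameter groups. The block grouping and the fixed decay factor below are illustrative assumptions; AutoLR adjusts the rates automatically during fine-tuning rather than fixing them by hand.

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# parameter-name prefixes ordered from input-side to output-side layers
prefixes = ["conv1", "bn1", "layer1", "layer2", "layer3", "layer4", "fc"]
base_lr = 1e-3
param_groups = []
for depth, prefix in enumerate(prefixes):
    params = [p for name, p in model.named_parameters() if name.startswith(prefix)]
    # smaller rates near the input (general features), larger near the output
    lr = base_lr * (0.5 ** (len(prefixes) - 1 - depth))
    param_groups.append({"params": params, "lr": lr})

optimizer = torch.optim.SGD(param_groups, momentum=0.9)
```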
Arbitrary-Scale Downscaling of Tidal Current Data Using Implicit Continuous Representation
Numerical models have long been used to understand geoscientific phenomena,
including tidal currents, crucial for renewable energy production and coastal
engineering. However, their computational cost hinders generating data of
varying resolutions. As an alternative, deep learning-based downscaling methods
have gained traction due to their faster inference speeds. However, most of them
are limited to inference at a fixed scale and overlook important characteristics
of the target geoscientific data. In this paper, we propose a novel downscaling
framework for tidal current data, addressing its unique characteristics, which
are dissimilar to those of images: heterogeneity and local dependency. Moreover, our
framework can generate any arbitrary-scale output utilizing a continuous
representation model. Our proposed framework demonstrates significantly
improved flow velocity predictions by 93.21% (MSE) and 63.85% (MAE) compared to
the baseline model, while achieving a remarkable 33.2% reduction in FLOPs.
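The arbitrary-scale output hinges on a continuous (coordinate-based) representation: a decoder is queried at any set of spatial coordinates rather than on a fixed grid. A minimal PyTorch sketch follows; the bilinear feature lookup, layer sizes, and two-channel (u, v) velocity output are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousDecoder(nn.Module):
    def __init__(self, feat_dim: int = 64, out_dim: int = 2):  # out_dim=2: (u, v) velocity
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, feat: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feat: (b, c, h, w) low-resolution feature map from any encoder
        # coords: (b, n, 2) query coordinates in [-1, 1]; n can be any number of points
        sampled = F.grid_sample(feat, coords.unsqueeze(1), align_corners=False)
        sampled = sampled.squeeze(2).transpose(1, 2)            # (b, n, c)
        return self.mlp(torch.cat([sampled, coords], dim=-1))

# querying a denser grid than the input yields an arbitrary-scale output
decoder = ContinuousDecoder()
feat = torch.randn(1, 64, 32, 32)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 96), torch.linspace(-1, 1, 96), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(1, -1, 2)       # 96x96 grid = 3x upscaling
out = decoder(feat, coords)                                    # (1, 9216, 2)
```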
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification
In the person re-identification (ReID) task, because of the shortage of training
data, it is common to fine-tune a classification network pre-trained on a large
dataset. However, it is relatively difficult to
sufficiently fine-tune the low-level layers of the network due to the gradient
vanishing problem. In this work, we propose a novel fine-tuning strategy that
allows low-level layers to be sufficiently trained by rolling back the weights
of high-level layers to their initial pre-trained weights. Our strategy
alleviates the problem of gradient vanishing in low-level layers and robustly
trains the low-level layers to fit the ReID dataset, thereby increasing the
performance of ReID tasks. The improved performance of the proposed strategy is
validated via several experiments. Furthermore, without any add-ons such as
pose estimation or segmentation, our strategy exhibits state-of-the-art
performance using only a vanilla deep convolutional neural network architecture. Comment: Accepted to AAAI 2019.
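A minimal sketch of the rollback idea, assuming a torchvision ResNet-50 backbone: after some fine-tuning, only the high-level blocks are restored to their pre-trained weights, so subsequent training keeps pushing useful gradients into the low-level layers. The layer names and rollback point are illustrative, not the paper's exact schedule.

```python
import copy
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
pretrained_state = copy.deepcopy(model.state_dict())   # snapshot before fine-tuning

def rollback_high_level(model, pretrained_state, prefixes=("layer4", "fc")):
    """Restore only the high-level blocks to their pre-trained weights."""
    restored = {k: v for k, v in pretrained_state.items() if k.startswith(prefixes)}
    model.load_state_dict(restored, strict=False)

# ... fine-tune for some epochs, then:
rollback_high_level(model, pretrained_state)
# ... continue fine-tuning; low-level layers keep their adapted weights
```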
Differentially Private Normalizing Flows for Synthetic Tabular Data Generation
Normalizing flows have been shown to be a promising approach to deep generative modeling due to their ability to evaluate density exactly; other alternatives either model the density implicitly or use an approximate surrogate density. In this work, we present a differentially private normalizing flow model for heterogeneous tabular data. Normalizing flows are in general not amenable to differentially private training because they require complex neural networks with larger depth (compared to other generative models) and use specialized architectures for which per-example gradient computation is difficult (or unknown). To reduce the parameter complexity, the proposed model introduces a conditional spline flow which simulates transformations at different stages depending on additional input and is shared among sub-flows. For privacy, we introduce two fine-grained gradient clipping strategies that provide a better signal-to-noise ratio and derive fast gradient clipping methods for layers with custom parameterization. Our empirical evaluations show that the proposed model preserves the statistical properties of the original dataset better than other baselines.
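For context, generic differentially private training with per-example gradient clipping and Gaussian noise (DP-SGD style) can be sketched as below. The paper's fine-grained, layer-wise clipping strategies and fast clipping for custom parameterizations are not reproduced here, and all names and constants are illustrative assumptions.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):                    # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / len(batch_x)               # noisy averaged gradient
    optimizer.step()
    optimizer.zero_grad()
```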
Rollback Ensemble With Multiple Local Minima in Fine-Tuning Deep Learning Networks
Image retrieval is a challenging problem that requires learning features generalized enough to identify untrained classes, even with very few class-wise training samples. In this article, to obtain more generalized features when learning retrieval data sets, we propose a novel fine-tuning method for pretrained deep networks. In the retrieval task, we discovered a phenomenon in which the loss reduction during fine-tuning of deep networks stagnates even while the weights are largely updated. To escape from this stagnated state, we propose a new fine-tuning strategy that rolls back some of the weights to the pretrained values. The rollback scheme is observed to drive the learning path to a gentle basin that provides more generalized features than a sharp basin. In addition, we propose a multi-head ensemble structure to create synergy among the multiple local minima obtained by our rollback scheme. Experimental results show that the proposed learning method significantly improves generalization performance, achieving state-of-the-art performance on the In-Shop and SOP data sets.
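A minimal sketch of such a multi-head ensemble over a shared backbone, assuming each head is fine-tuned toward a different local minimum (e.g., via the rollback scheme) and their L2-normalized embeddings are concatenated at retrieval time. The head count and dimensions are illustrative assumptions, not the paper's exact structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadEmbedder(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, embed_dim: int, num_heads: int = 3):
        super().__init__()
        self.backbone = backbone
        # each head can be fine-tuned toward a different local minimum
        self.heads = nn.ModuleList(nn.Linear(feat_dim, embed_dim) for _ in range(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)
        embs = [F.normalize(head(feat), dim=-1) for head in self.heads]
        return torch.cat(embs, dim=-1)   # concatenated embedding used for retrieval
```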
T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors
Energy efficiency is considered a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-core processors, there is a growing need for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-cores. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces fragmentation of idle time, both of which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods based on the T-L plane abstraction.
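The core DPM trade-off the abstract alludes to, namely that a mode transition only pays off when the idle interval exceeds its break-even cost, can be sketched as follows. All power and timing constants are illustrative assumptions; this is not the T-L plane algorithm itself.

```python
from dataclasses import dataclass

@dataclass
class SleepState:
    name: str
    power_mw: float            # power drawn while in this state
    transition_cost_mj: float  # energy to enter and leave the state
    transition_time_ms: float

ACTIVE_IDLE_POWER_MW = 30.0    # power of a core left idling in the active state

def best_sleep_state(idle_ms: float, states: list[SleepState]) -> SleepState | None:
    """Pick the sleep state that saves the most energy over the idle interval."""
    best, best_saving = None, 0.0
    for s in states:
        if idle_ms <= s.transition_time_ms:
            continue  # interval too short to even complete the mode transition
        idle_energy = ACTIVE_IDLE_POWER_MW * idle_ms / 1000.0          # stay awake (mJ)
        sleep_energy = (s.power_mw * (idle_ms - s.transition_time_ms) / 1000.0
                        + s.transition_cost_mj)
        saving = idle_energy - sleep_energy
        if saving > best_saving:
            best, best_saving = s, saving
    return best  # None means it is cheaper to stay in active idle

states = [SleepState("nap", 10.0, 0.02, 1.0), SleepState("deep", 1.0, 0.5, 10.0)]
print(best_sleep_state(idle_ms=8.0, states=states))   # short gap -> shallow state wins
```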
Safety and efficacy of mechanical thrombectomy with the Solitaire device in large artery occlusion
Background and Purpose: Intravenous tissue plasminogen activator (TPA) has limited efficacy in proximal large vessel occlusions. This study aimed to assess the safety and efficacy of mechanical thrombectomy with a retrievable Solitaire stent in acute large artery occlusions. Materials and Methods: This is a single-center study enrolling patients treated with Solitaire-assisted thrombectomy between November 2010 and March 2011. Inclusion criteria were severe stroke with a National Institutes of Health Stroke Scale (NIHSS) score ≥10, treatment initiation within 6 hours from onset, and an angiographically verified occlusion of the proximal middle cerebral artery (MCA) or internal carotid artery (ICA). The primary outcome was recanalization, defined as Thrombolysis in Cerebral Infarct (TICI) reperfusion grade 2b/3. Secondary outcomes were good functional outcome at 3 months (modified Rankin Scale [mRS] ≤2), early substantial neurological improvement (NIHSS score improvement ≥8 at 24 hours), and symptomatic hemorrhagic transformation (SHT). Results: Ten patients were consecutively enrolled: age 72.4 ± 5.7 years; female 70%; baseline median NIHSS score 19.5; ICA occlusion in 50% and M1 portion of MCA occlusion in 50%. Six patients received intravenous TPA before intra-arterial treatment, and five patients were treated with adjuvant intra-arterial urokinase. Successful recanalization was achieved in 7 (70%) patients. Four (40%) patients had a good functional outcome at 3 months, and three (30%) patients had early substantial neurological improvement. SHT occurred in two patients (20%), and the 3-month mortality rate was 30%. There were no procedure-related complications. Conclusions: Mechanical thrombectomy with the Solitaire device can effectively recanalize proximal large vessel occlusions and potentially improves clinical outcomes.