    SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

    Recently, efficient Vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with a multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using a larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and that several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by combining global and local information in parallel. Building upon our solutions, we introduce SHViT, a Single-Head Vision Transformer that obtains a state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on GPU, CPU, and an iPhone 12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using a Mask R-CNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively. Comment: CVPR 2024.
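
    The single-head design can be pictured with a short sketch. Below is a minimal PyTorch illustration, assuming an illustrative channel split in which one attention head processes a slice of the channels while a depthwise convolution keeps local information on the rest; the module and parameter names are hypothetical, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class SingleHeadAttention(nn.Module):
        # One attention head over a slice of the channels; the remaining
        # channels keep local information via a depthwise convolution, and
        # the two paths run in parallel (global ++ local).
        def __init__(self, dim, attn_dim=64):
            super().__init__()
            self.attn_dim = attn_dim
            self.scale = attn_dim ** -0.5
            self.qkv = nn.Conv2d(attn_dim, attn_dim * 3, kernel_size=1)
            self.local = nn.Conv2d(dim - attn_dim, dim - attn_dim,
                                   kernel_size=3, padding=1, groups=dim - attn_dim)
            self.proj = nn.Conv2d(dim, dim, kernel_size=1)

        def forward(self, x):  # x: (B, C, H, W)
            xa, xl = torch.split(x, [self.attn_dim, x.shape[1] - self.attn_dim], dim=1)
            B, C, H, W = xa.shape
            q, k, v = self.qkv(xa).flatten(2).chunk(3, dim=1)  # each (B, C, H*W)
            attn = (q.transpose(1, 2) @ k) * self.scale        # (B, H*W, H*W)
            out = v @ attn.softmax(dim=-1).transpose(1, 2)     # (B, C, H*W)
            out = torch.cat([out.reshape(B, C, H, W), self.local(xl)], dim=1)
            return self.proj(out)

    With a single head, the query-key-value projections and the attention map are computed once per layer, which is where the memory-access savings come from.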

    AutoLR: Layer-wise Pruning and Auto-tuning of Learning Rates in Fine-tuning of Deep Networks

    Existing fine-tuning methods use a single learning rate over all layers. In this paper, we first show that the trends of layer-wise weight variations under fine-tuning with a single learning rate do not match the well-known notion that lower-level layers extract general features and higher-level layers extract specific features. Based on this observation, we propose an algorithm that improves fine-tuning performance and reduces network complexity through layer-wise pruning and auto-tuning of layer-wise learning rates. The effectiveness of the proposed algorithm is verified by state-of-the-art performance on the image retrieval benchmark datasets (CUB-200, Cars-196, Stanford Online Products, and In-Shop). Code is available at https://github.com/youngminPIL/AutoLR. Comment: Accepted to AAAI 2021.
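
    The core mechanism, a different learning rate per layer, maps directly onto optimizer parameter groups in PyTorch. The sketch below uses a hypothetical exponential decay from the top block downward rather than the paper's auto-tuning rule, just to show the plumbing; only the major ResNet blocks are listed.

    import torch
    from torchvision import models

    backbone = models.resnet50(weights="IMAGENET1K_V1")
    blocks = [backbone.conv1, backbone.layer1, backbone.layer2,
              backbone.layer3, backbone.layer4, backbone.fc]

    # Lower layers (general features) get smaller rates; higher layers
    # (specific features) get larger ones. The 0.5 decay is illustrative.
    base_lr, decay = 1e-3, 0.5
    param_groups = [{"params": b.parameters(),
                     "lr": base_lr * decay ** (len(blocks) - 1 - i)}
                    for i, b in enumerate(blocks)]
    optimizer = torch.optim.SGD(param_groups, momentum=0.9)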

    Arbitrary-Scale Downscaling of Tidal Current Data Using Implicit Continuous Representation

    Numerical models have long been used to understand geoscientific phenomena, including tidal currents, which are crucial for renewable energy production and coastal engineering. However, their computational cost hinders generating data at varying resolutions. As an alternative, deep learning-based downscaling methods have gained traction due to their faster inference speeds, but most of them are limited to inference at a fixed scale and overlook important characteristics of the target geoscientific data. In this paper, we propose a novel downscaling framework for tidal current data that addresses its unique characteristics, which are dissimilar to those of images: heterogeneity and local dependency. Moreover, our framework can generate output at any arbitrary scale using a continuous representation model. Our proposed framework improves flow velocity predictions by 93.21% (MSE) and 63.85% (MAE) compared to the baseline model while achieving a 33.2% reduction in FLOPs.
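
    A continuous representation model of this kind can be sketched as a coordinate-conditioned decoder: features are sampled at any query location and an MLP predicts the flow there, so the output grid can be of arbitrary density. The layer sizes and sampling scheme below are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContinuousDecoder(nn.Module):
        # Predicts a flow vector at any continuous query coordinate from a
        # low-resolution feature map, enabling arbitrary-scale output.
        def __init__(self, feat_dim=64, hidden=128, out_dim=2):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(feat_dim + 2, hidden),
                                     nn.ReLU(),
                                     nn.Linear(hidden, out_dim))

        def forward(self, feat, coords):  # feat: (B,C,h,w); coords in [-1,1]: (B,N,2)
            z = F.grid_sample(feat, coords.unsqueeze(2),     # (B, C, N, 1)
                              align_corners=False)
            z = z.squeeze(-1).transpose(1, 2)                # (B, N, C)
            return self.mlp(torch.cat([z, coords], dim=-1))  # (B, N, out_dim)

    Querying a denser coordinate grid then yields a higher-resolution field without retraining.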

    Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification

    In the person re-identification (ReID) task, because of the shortage of trainable data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the vanishing gradient problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained values. Our strategy alleviates the vanishing gradient problem in low-level layers and robustly trains them to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture. Comment: Accepted to AAAI 2019.
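
    The rollback idea itself is only a few lines: snapshot the pre-trained weights, fine-tune, then restore the high-level layers from the snapshot before continuing. A minimal sketch, assuming a ResNet-50 backbone and an illustrative choice of which layers count as high-level:

    import copy
    import torch
    from torchvision import models

    model = models.resnet50(weights="IMAGENET1K_V1")
    pretrained = copy.deepcopy(model.state_dict())

    # ... fine-tune `model` on the ReID dataset for a few epochs ...

    # Roll the high-level layers back to their pre-trained weights while
    # keeping the (now better-trained) low-level layers, then resume training.
    rolled = model.state_dict()
    for name, weight in pretrained.items():
        if name.startswith(("layer4", "fc")):  # "high-level" cut is illustrative
            rolled[name] = weight.clone()
    model.load_state_dict(rolled)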

    Differentially Private Normalizing Flows for Synthetic Tabular Data Generation

    Normalizing flows have been shown to be a promising approach to deep generative modeling due to their ability to evaluate density exactly; other alternatives either model the density implicitly or use an approximate surrogate density. In this work, we present a differentially private normalizing flow model for heterogeneous tabular data. Normalizing flows are in general not amenable to differentially private training because they require complex neural networks of greater depth (compared to other generative models) and use specialized architectures for which per-example gradient computation is difficult (or unknown). To reduce the parameter complexity, the proposed model introduces a conditional spline flow that simulates transformations at different stages depending on an additional input and is shared among sub-flows. For privacy, we introduce two fine-grained gradient clipping strategies that provide a better signal-to-noise ratio, and we derive fast gradient clipping methods for layers with custom parameterization. Our empirical evaluations show that the proposed model preserves the statistical properties of the original dataset better than other baselines.
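
    The privacy mechanism builds on per-example gradient clipping as in DP-SGD. The sketch below shows that generic recipe (clip each example's gradient, accumulate, add Gaussian noise), not the paper's two fine-grained clipping strategies; the model and constants are placeholders.

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    loss_fn = nn.MSELoss()
    clip_norm, noise_mult, lr = 1.0, 1.1, 0.1

    x, y = torch.randn(8, 10), torch.randn(8, 1)
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for xi, yi in zip(x, y):                  # one backward pass per example
        model.zero_grad()
        loss_fn(model(xi), yi).backward()
        grads = [p.grad for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (norm.item() + 1e-6))  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)

    with torch.no_grad():                     # noisy average-gradient step
        for p, g in zip(model.parameters(), summed):
            noise = torch.randn_like(g) * noise_mult * clip_norm
            p -= lr * (g + noise) / len(x)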

    Rollback Ensemble With Multiple Local Minima in Fine-Tuning Deep Learning Networks

    Image retrieval is a challenging problem that requires learning features generalized enough to identify untrained classes, even with very few class-wise training samples. In this article, to obtain more generalized features when learning retrieval datasets, we propose a novel fine-tuning method for pretrained deep networks. In the retrieval task, we discovered a phenomenon in which the loss reduction in fine-tuning deep networks stagnates even while the weights are being largely updated. To escape from this stagnated state, we propose a new fine-tuning strategy that rolls back some of the weights to their pretrained values. The rollback scheme is observed to drive the learning path to a gentle basin that provides more generalized features than a sharp basin. In addition, we propose a multi-head ensemble structure to create synergy among the multiple local minima obtained by our rollback scheme. Experimental results show that the proposed learning method significantly improves generalization performance, achieving state-of-the-art performance on the In-Shop and SOP datasets.
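
    The multi-head ensemble can be pictured as one shared backbone with several embedding heads, each anchored at a different local minimum (e.g. a different rollback snapshot), whose normalized outputs are concatenated at retrieval time. A toy sketch, with all names and sizes assumed:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadEmbedder(nn.Module):
        # Shared backbone plus several embedding heads; each head can be
        # trained from a different local minimum, and the ensemble output
        # is the concatenation of their L2-normalized embeddings.
        def __init__(self, backbone, feat_dim=2048, emb_dim=512, n_heads=3):
            super().__init__()
            self.backbone = backbone          # assumed to output (B, feat_dim)
            self.heads = nn.ModuleList(nn.Linear(feat_dim, emb_dim)
                                       for _ in range(n_heads))

        def forward(self, x):
            f = self.backbone(x)
            return torch.cat([F.normalize(h(f), dim=-1) for h in self.heads],
                             dim=-1)          # (B, n_heads * emb_dim)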

    T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors

    Energy efficiency is considered a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multiple cores, there is an emerging need for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-core processors. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces the fragmentation of idle time, both of which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods based on the T-L plane abstraction.
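
    One concrete piece of the mode-transition overhead is the break-even check in dynamic power management: a core should enter a sleep state only when the idle interval is long enough to amortize the transition cost. A toy illustration with made-up power figures (the paper's algorithm additionally merges fragmented idle intervals on the T-L plane):

    # Decide whether sleeping over an idle interval saves energy, given the
    # cost of the mode transition. All numbers below are illustrative.
    def worth_sleeping(idle_s, active_w, sleep_w, trans_s, trans_w):
        if idle_s <= trans_s:                 # interval too short to transition
            return False
        stay_awake = active_w * idle_s
        sleep = trans_w * trans_s + sleep_w * (idle_s - trans_s)
        return sleep < stay_awake

    # 50 ms idle; 120 mW active, 5 mW sleep; 4 ms transition at 200 mW.
    print(worth_sleeping(0.050, 0.120, 0.005, 0.004, 0.200))  # -> True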

    Safety and efficacy of mechanical thrombectomy with the Solitaire device in large artery occlusion

    Background and Purpose: Intravenous tissue plasminogen activator (TPA) has limited efficacy in proximal large vessel occlusions. This study aimed to assess the safety and efficacy of mechanical thrombectomy with a retrievable Solitaire stent in acute large artery occlusions. Materials and Methods: This is a single-center study enrolling patients treated with Solitaire-assisted thrombectomy between November 2010 and March 2011. Inclusion criteria were severe stroke with a National Institutes of Health Stroke Scale (NIHSS) score ≥10, treatment initiation within 6 hours from onset, and an angiographically verified occlusion of the proximal middle cerebral artery (MCA) or internal carotid artery (ICA). The primary outcome was recanalization, defined as Thrombolysis in Cerebral Infarction (TICI) reperfusion grade 2b/3. Secondary outcomes were good functional outcome at 3 months (modified Rankin Scale [mRS] ≤2), early substantial neurological improvement (NIHSS score improvement ≥8 at 24 hours), and symptomatic hemorrhagic transformation (SHT). Results: Ten patients were consecutively enrolled: age: 72.4 ± 5.7 years; female: 70%; baseline median NIHSS score: 19.5; ICA occlusion in 50% and M1 portion of the MCA occlusion in 50%. Six patients received intravenous TPA before intra-arterial treatment, and five patients were treated with adjuvant intra-arterial urokinase. Successful recanalization was achieved in 7 (70%) patients. Four (40%) patients had a good functional outcome at 3 months, and three (30%) patients had early substantial neurological improvement. SHT occurred in two patients (20%), and the 3-month mortality rate was 30%. There was no procedure-related complication. Conclusions: Mechanical thrombectomy with the Solitaire device can effectively recanalize proximal large vessel occlusions and potentially improves clinical outcomes.