
    ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

    Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. But both attention and multi-layer perceptrons (MLPs) in ViTs are not efficient enough due to dense multiplications, resulting in costly training and inference. To this end, we propose to reparameterize the pre-trained ViT with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims for end-to-end inference speedups on GPUs without the need for training from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized by additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized by shift kernels. We utilize TVM to implement and optimize those customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization of (quadratic or linear) attention maintains model accuracy, while inevitably leading to accuracy drops when applied to MLPs. To marry the best of both worlds, we further propose a new mixture of experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss. Such a loss helps to train a generic router for assigning a dynamic amount of input tokens to different experts according to their latency. In principle, the faster an expert runs, the more input tokens it is assigned. Extensive experiments consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to 5.18× latency reductions on GPUs and 42.9% energy savings, while maintaining accuracy comparable to original or efficient ViTs. Comment: Accepted by NeurIPS 202
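    The idea behind a shift kernel can be illustrated with a minimal sketch. It assumes each weight is rounded to a signed power of two (nearest in the log domain) and that activations are integers; the function names `to_power_of_two` and `shift_multiply` are hypothetical and are not taken from the ShiftAddViT code base, whose actual kernels are implemented in TVM.

```python
import math


def to_power_of_two(w: float) -> tuple[int, int]:
    """Round w to a signed power of two (nearest in the log domain).

    Returns (sign, exponent) so that w is approximated by sign * 2**exponent.
    A zero weight is encoded as sign == 0.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exponent = round(math.log2(abs(w)))
    return sign, exponent


def shift_multiply(x: int, sign: int, exponent: int) -> int:
    """Approximate x * w using a bit shift and a sign flip instead of a multiply."""
    if sign == 0:
        return 0
    # Positive exponents shift left; negative exponents shift right.
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return -shifted if sign < 0 else shifted


# Usage: 0.3 is rounded to 2**-2 = 0.25, so 64 * 0.3 is approximated by 64 >> 2.
sign, exponent = to_power_of_two(0.3)
approx = shift_multiply(64, sign, exponent)
```

    The rounding error visible here (0.25 instead of 0.3) is one intuition for why applying shifts to every MLP weight can cost accuracy, which is the motivation the abstract gives for keeping a multiplication expert alongside the shift expert.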

    NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants

    Tiny deep learning has attracted increasing attention, driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices. However, it is still challenging to unleash tiny deep learning's full potential on both large-scale datasets and downstream tasks due to the under-fitting issues caused by the limited model capacity of tiny neural networks (TNNs). To this end, we propose a framework called NetBooster to empower tiny deep learning by augmenting the architectures of TNNs via an expansion-then-contraction strategy. Extensive experiments show that NetBooster consistently outperforms state-of-the-art tiny deep learning solutions.

    NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation

    Boosting the task accuracy of tiny neural networks (TNNs) has become a fundamental challenge for enabling the deployment of TNNs on edge devices, which are constrained by strict limitations in terms of memory, computation, bandwidth, and power supply. To this end, we propose a framework called NetDistiller to boost the achievable accuracy of TNNs by treating them as sub-networks of a weight-sharing teacher constructed by expanding the number of channels of the TNN. Specifically, the target TNN model is jointly trained with the weight-sharing teacher model via (1) gradient surgery to tackle the gradient conflicts between them and (2) uncertainty-aware distillation to mitigate the overfitting of the teacher model. Extensive experiments across diverse tasks validate NetDistiller's effectiveness in boosting TNNs' achievable accuracy over state-of-the-art methods. Our code is available at https://github.com/GATECH-EIC/NetDistiller
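    The abstract does not spell out how the gradient surgery is performed. A common form of gradient surgery is a PCGrad-style projection, sketched below as an assumption: when two gradients on shared weights conflict (negative dot product), the conflicting component of one is removed. Which gradient is projected, and the name `project_conflicting`, are illustrative choices and not confirmed by the source.

```python
def project_conflicting(g_student: list[float], g_teacher: list[float]) -> list[float]:
    """Remove from g_student its component that opposes g_teacher.

    If the two gradients do not conflict (dot product >= 0), or g_teacher is
    the zero vector, g_student is returned unchanged.
    """
    dot = sum(a * b for a, b in zip(g_student, g_teacher))
    norm_sq = sum(b * b for b in g_teacher)
    if dot >= 0 or norm_sq == 0:
        return list(g_student)
    scale = dot / norm_sq
    # Subtract the projection of g_student onto g_teacher.
    return [a - scale * b for a, b in zip(g_student, g_teacher)]


# Usage: the two gradients conflict, so the student gradient is projected
# onto the plane orthogonal to the teacher gradient.
projected = project_conflicting([1.0, 0.0], [-1.0, 1.0])
```

    After the projection, the adjusted gradient no longer opposes the other one, so a shared-weight update from both objectives does not cancel itself out.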

    Experimental study on the mechanical controlling factors of fracture plugging strength for lost circulation control in shale gas reservoir

    The geological conditions of shale reservoirs present several unique challenges. These include the extensive development of multi-scale fractures, frequent losses during horizontal drilling, low success rates in plugging, and a tendency for the fracture plugging zone to experience repeated failures. Extensive analysis suggests that the weakening of the mechanical properties of shale fracture surfaces is the primary factor responsible for reducing the bearing capacity of the fracture plugging zone. To assess the influence of oil-based environments on the degradation of mechanical properties in shale fracture surfaces, mechanical property tests were conducted on shale samples after their exposure to various substances, including white oil, lye, and the filtrate of oil-based drilling fluid. The experimental results demonstrate that the average values of the elastic modulus and indentation hardness of dry shale are 24.30 GPa and 0.64 GPa, respectively. Upon immersion in white oil, these values decrease to 22.42 GPa and 0.63 GPa, respectively. Additionally, the depth loss rates of dry shale and white oil-soaked shale are determined to be 57.12% and 61.96%, respectively, indicating an increased degree of fracturing on the shale surface. White oil, lye, and the filtrate of oil-based drilling fluid all reduce the friction coefficient of the shale surface. The average friction coefficients measured for white oil, lye, and oil-based drilling fluid are 0.80, 0.72, and 0.76, respectively, reflecting their individual weakening effects. Furthermore, the contact mode between the plugging materials and the fracture surface can also reduce the friction coefficient between them. To enhance the bearing capacity of the plugging zone, a series of plugging experiments were conducted using high-strength materials, high-friction materials, and nanomaterials. The selection of these materials was based on the understanding of the weakened mechanical properties of the fracture surface. The experimental results demonstrate that the reduced mechanical properties of the fracture surface can diminish the pressure-bearing capacity of the plugging zone. However, the use of high-strength materials, high-friction materials, and nanomaterials effectively enhances the pressure-bearing capacity of the plugging zone. The research findings offer valuable insights and guidance towards improving the sealing pressure capacity of shale fractures and increasing the success rate of leakage control measures during shale drilling and completion. © 2023 The Author

    Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design

    Novel view synthesis is an essential functionality for enabling immersive experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for which generalizable Neural Radiance Fields (NeRFs) have gained increasing popularity thanks to their cross-scene generalization capability. Despite their promise, the real-device deployment of generalizable NeRFs is bottlenecked by their prohibitive complexity due to the massive memory accesses required to acquire scene features, causing their ray marching process to be memory-bound. To this end, we propose Gen-NeRF, an algorithm-hardware co-design framework dedicated to generalizable NeRF acceleration, which for the first time enables real-time generalizable NeRFs. On the algorithm side, Gen-NeRF integrates a coarse-then-focus sampling strategy, leveraging the fact that different regions of a 3D scene contribute differently to the rendered pixel, to enable sparse yet effective sampling. On the hardware side, Gen-NeRF highlights an accelerator micro-architecture to maximize the data reuse opportunities among different rays by making use of their epipolar geometric relationship. Furthermore, our Gen-NeRF accelerator features a customized dataflow to enhance data locality during point-to-hardware mapping and an optimized scene feature storage strategy to minimize memory bank conflicts. Extensive experiments validate the effectiveness of our proposed Gen-NeRF framework in enabling real-time and generalizable novel view synthesis. Comment: Accepted by ISCA 202

    Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference

    Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs). One reason is that ViTs' attention measures global similarities and thus has a quadratic complexity with the number of input tokens. Existing efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g., Performer), which sacrifice ViTs' capabilities of capturing either global or local context. In this work, we ask an important research question: Can ViTs learn both global and local context while being more efficient during inference? To this end, we propose a framework called Castling-ViT, which trains ViTs using both linear-angular attention and masked softmax-based quadratic attention, but then switches to having only linear-angular attention during ViT inference. Our Castling-ViT leverages angular kernels to measure the similarities between queries and keys via spectral angles. We further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn both global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during ViT inference. Extensive experiments and ablation studies on three tasks consistently validate the effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher accuracy or 40% MACs reduction on ImageNet classification and 1.2 higher mAP on COCO detection under comparable FLOPs, as compared to ViTs with vanilla softmax-based attention. Comment: CVPR 202
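    Why keeping only the linear terms yields linear complexity can be shown with a small sketch. It assumes the angular similarity between unit-norm vectors is 1 - θ/π, whose expansion around an inner product x is 1/2 + x/π plus higher-order terms; only 1/2 + x/π is kept here. That kernel form, the absence of row normalization, and the helper names are assumptions for illustration, not the paper's implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def quadratic_form(q, k, v):
    """Build the full N x N similarity matrix, then multiply by V."""
    scores = [[0.5 + s / math.pi for s in row] for row in matmul(q, transpose(k))]
    return matmul(scores, v)


def linear_form(q, k, v):
    """Same output without the N x N matrix: 0.5 * colsum(V) + Q (K^T V) / pi."""
    kv = matmul(transpose(k), v)           # d x d_v summary, computed once
    v_sum = [sum(col) for col in zip(*v)]  # column sums of V
    qkv = matmul(q, kv)
    return [[0.5 * s + x / math.pi for s, x in zip(v_sum, row)] for row in qkv]


# Usage with three unit-norm queries/keys of dimension 2.
q = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
k = [[0.0, 1.0], [0.8, 0.6], [1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
full = quadratic_form(q, k, v)
fast = linear_form(q, k, v)
```

    Because the retained kernel is affine in the inner product, the product can be regrouped as Q(KᵀV), so the cost grows linearly with the number of tokens instead of quadratically.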

    Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction

    Neural Radiance Field (NeRF) based 3D reconstruction is highly desirable for immersive Augmented and Virtual Reality (AR/VR) applications, but achieving instant (i.e., < 5 seconds) on-device NeRF training remains a challenge. In this work, we first identify the inefficiency bottleneck: the need to interpolate NeRF embeddings up to 200,000 times from a 3D embedding grid during each training iteration. To alleviate this, we propose Instant-3D, an algorithm-hardware co-design acceleration framework that achieves instant on-device NeRF training. Our algorithm decomposes the embedding grid representation in terms of color and density, enabling computational redundancy to be squeezed out by adopting different (1) grid sizes and (2) update frequencies for the color and density branches. Our hardware accelerator further reduces the dominant memory accesses for embedding grid interpolation by (1) mapping multiple nearby points' memory read requests into one during the feed-forward process, (2) merging embedding grid updates from the same sliding time window during back-propagation, and (3) fusing different computation cores to support the different grid sizes needed by the color and density branches of the Instant-3D algorithm. Extensive experiments validate the effectiveness of Instant-3D, achieving a large training time reduction of 41x - 248x while maintaining the same reconstruction quality. Excitingly, Instant-3D has enabled instant 3D reconstruction for AR/VR, requiring a reconstruction time of only 1.6 seconds per scene and meeting the AR/VR power consumption constraint of 1.9 W. Comment: Accepted by ISCA'2