83 research outputs found
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. However, both the attention and
multi-layer perceptron (MLP) modules in ViTs are inefficient due to dense
multiplications, resulting in costly training and inference. To this end, we
propose to reparameterize the pre-trained ViT with a mixture of multiplication
primitives, e.g., bitwise shifts and additions, towards a new type of
multiplication-reduced model, dubbed ShiftAddViT, which aims for
end-to-end inference speedups on GPUs without the need for training from
scratch. Specifically, all multiplications among queries, keys, and values
are reparameterized by additive kernels, after mapping queries and keys to
binary codes in Hamming space. The remaining MLPs or linear layers are then
reparameterized by shift kernels. We utilize TVM to implement and optimize
those customized kernels for practical hardware deployment on GPUs. We find
that such a reparameterization on (quadratic or linear) attention maintains
model accuracy, while inevitably leading to accuracy drops when being applied
to MLPs. To marry the best of both worlds, we further propose a new mixture of
experts (MoE) framework to reparameterize MLPs by taking multiplication or its
primitives as experts, e.g., multiplication and shift, and designing a new
latency-aware load-balancing loss. Such a loss helps to train a generic router
for assigning a dynamic amount of input tokens to different experts according
to their latency. In principle, the faster experts run, the larger amount of
input tokens are assigned. Extensive experiments consistently validate the
effectiveness of our proposed ShiftAddViT, achieving up to 5.18× latency
reductions on GPUs and 42.9% energy savings, while maintaining accuracy
comparable to original or efficient ViTs.
Comment: Accepted by NeurIPS 202
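The core shift-kernel idea can be illustrated in a few lines: snapping each weight magnitude to the nearest power of two turns every multiplication into a bitwise shift plus a sign flip. The sketch below is a minimal float-domain illustration of that quantization, not the paper's TVM kernels; real speedups require integer shift instructions on hardware.

```python
import numpy as np

def shift_reparameterize(w, eps=1e-12):
    """Snap each weight to sign(w) * 2^round(log2|w|), so multiplying
    by it can be realized as a bitwise shift plus a sign flip."""
    sign = np.sign(w)
    power = np.round(np.log2(np.abs(w) + eps))
    return sign * (2.0 ** power), power.astype(int)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))            # a small linear layer's weights
w_shift, powers = shift_reparameterize(w)

x = rng.normal(size=(2, 8))
y_dense = x @ w.T                      # dense multiplications
y_shift = x @ w_shift.T                # shift-only weights: each |w| is a power of two

rel_err = np.linalg.norm(y_dense - y_shift) / np.linalg.norm(y_dense)
```

Because rounding in log2 space moves each weight by at most a factor of sqrt(2), the per-weight error is bounded by about 41% of the weight's magnitude, which is why the abstract's accuracy-preserving reparameterization targets attention first and mixes in true multiplications (via MoE) for the more sensitive MLPs.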
NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To this end, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
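One way to picture the expansion-then-contraction strategy (a sketch of the stated idea, not the paper's exact procedure): a tiny linear layer is expanded into two wider layers that initially compute the identical function, the higher-capacity expanded network is trained, and the factors are then multiplied back into a single tiny layer.

```python
import numpy as np

rng = np.random.default_rng(1)
out_dim, in_dim, hidden = 4, 6, 12     # hidden > in_dim makes the expansion lossless

w = rng.normal(size=(out_dim, in_dim))  # the original tiny layer

# Expansion: factor w into two layers (w2 @ w1) that compute the same map.
w1 = rng.normal(size=(hidden, in_dim))  # random, full column rank almost surely
w2 = w @ np.linalg.pinv(w1)             # chosen so that w2 @ w1 == w at init

# ... the expanded, higher-capacity network would be trained here ...

# Contraction: collapse the two layers back into one tiny layer.
w_contracted = w2 @ w1
```

With a nonlinearity between the expanded layers the contraction is no longer exact, which is where a more careful strategy (as in NetBooster) is needed; this sketch only shows the linear skeleton of the idea.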
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Boosting the task accuracy of tiny neural networks (TNNs) has become a
fundamental challenge for enabling the deployment of TNNs on edge devices,
which are constrained by strict limits on memory, computation,
bandwidth, and power supply. To this end, we propose a framework called
NetDistiller to boost the achievable accuracy of TNNs by treating them as
sub-networks of a weight-sharing teacher constructed by expanding the number of
channels of the TNN. Specifically, the target TNN model is jointly trained with
the weight-sharing teacher model via (1) gradient surgery to tackle the
gradient conflicts between them and (2) uncertainty-aware distillation to
mitigate the overfitting of the teacher model. Extensive experiments across
diverse tasks validate NetDistiller's effectiveness in boosting TNNs'
achievable accuracy over state-of-the-art methods. Our code is available at
https://github.com/GATECH-EIC/NetDistiller
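The abstract's "gradient surgery" step can be sketched as a PCGrad-style projection (the exact form used by NetDistiller is an assumption here; see the linked repository for the real implementation): when the student's gradient conflicts with the teacher's, the conflicting component is projected out.

```python
import numpy as np

def gradient_surgery(g_student, g_teacher):
    """If the two gradients conflict (negative inner product), remove
    from g_student its component along g_teacher (PCGrad-style)."""
    dot = g_student @ g_teacher
    if dot < 0.0:
        g_student = g_student - (dot / (g_teacher @ g_teacher)) * g_teacher
    return g_student

g_t = np.array([1.0, 0.0])
g_s = np.array([-1.0, 1.0])            # conflicts with g_t (dot product = -1)
g_fixed = gradient_surgery(g_s, g_t)   # conflicting component removed
```

After the projection the student's update no longer opposes the teacher's, which is the property the joint weight-sharing training relies on.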
Experimental study on the mechanical controlling factors of fracture plugging strength for lost circulation control in shale gas reservoir
The geological conditions of shale reservoirs present several unique challenges. These include the extensive development of multi-scale fractures, frequent losses during horizontal drilling, low success rates in plugging, and a tendency for the fracture plugging zone to experience repeated failures. Extensive analysis suggests that the weakening of the mechanical properties of shale fracture surfaces is the primary factor responsible for reducing the bearing capacity of the fracture plugging zone. To assess the influence of oil-based environments on the degradation of mechanical properties in shale fracture surfaces, rigorous mechanical property tests were conducted on shale samples subsequent to their exposure to various substances, including white oil, lye, and the filtrate of oil-based drilling fluid. The experimental results demonstrate that the average values of the elastic modulus and indentation hardness of dry shale are 24.30 GPa and 0.64 GPa, respectively. Upon immersion in white oil, these values decrease to 22.42 GPa and 0.63 GPa, respectively. Additionally, the depth loss rates of dry shale and white oil-soaked shale are determined to be 57.12% and 61.96%, respectively, indicating an increased degree of fracturing on the shale surface. White oil, lye, and the filtrate of oil-based drilling fluid have demonstrated their capacity to reduce the friction coefficient of the shale surface. The average friction coefficients measured for white oil, lye, and oil-based drilling fluid are 0.80, 0.72, and 0.76, respectively, reflecting their individual weakening effects. Furthermore, it should be noted that the contact mode between the plugging materials and the fracture surface can also lead to a reduction in the friction coefficient between them. To enhance the bearing capacity of the plugging zone, a series of plugging experiments were conducted utilizing high-strength materials, high-friction materials, and nanomaterials.
The selection of these materials was based on the understanding of the weakened mechanical properties of the fracture surface. The experimental results demonstrate that the reduced mechanical properties of the fracture surface can diminish the pressure-bearing capacity of the plugging zone. However, the implementation of high-strength materials, high-friction materials, and nanomaterials effectively enhances the pressure-bearing capacity of the plugging zone. The research findings offer valuable insights and guidance towards improving the sealing pressure capacity of shale fractures and effectively increasing the success rate of leakage control measures during shale drilling and completion.
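For scale, the reported averages imply the following relative degradations after white-oil immersion (computed directly from the values stated above):

```python
# Relative reductions implied by the reported averages (dry vs. white-oil-soaked shale).
E_dry, E_oil = 24.30, 22.42            # elastic modulus, GPa
H_dry, H_oil = 0.64, 0.63              # indentation hardness, GPa

drop_E = 100.0 * (E_dry - E_oil) / E_dry   # modulus loss, ~7.7 %
drop_H = 100.0 * (H_dry - H_oil) / H_dry   # hardness loss, ~1.6 %
```

That is, the modulus degrades roughly five times more than the hardness, consistent with the surface-weakening mechanism the study identifies.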
Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design
Novel view synthesis is an essential functionality for enabling immersive
experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for
which generalizable Neural Radiance Fields (NeRFs) have gained increasing
popularity thanks to their cross-scene generalization capability. Despite their
promise, the real-device deployment of generalizable NeRFs is bottlenecked by
their prohibitive complexity due to the required massive memory accesses to
acquire scene features, causing their ray marching process to be
memory-bounded. To this end, we propose Gen-NeRF, an algorithm-hardware
co-design framework dedicated to generalizable NeRF acceleration, which for the
first time enables real-time generalizable NeRFs. On the algorithm side,
Gen-NeRF integrates a coarse-then-focus sampling strategy, leveraging the fact
that different regions of a 3D scene contribute differently to the rendered
pixel, to enable sparse yet effective sampling. On the hardware side, Gen-NeRF
highlights an accelerator micro-architecture to maximize the data reuse
opportunities among different rays by making use of their epipolar geometric
relationship. Furthermore, our Gen-NeRF accelerator features a customized
dataflow to enhance data locality during point-to-hardware mapping and an
optimized scene feature storage strategy to minimize memory bank conflicts.
Extensive experiments validate the effectiveness of our proposed Gen-NeRF
framework in enabling real-time and generalizable novel view synthesis.
Comment: Accepted by ISCA 202
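The coarse-then-focus sampling idea can be sketched generically (the abstract does not give Gen-NeRF's exact sampler, so the inverse-CDF importance sampling below is an assumed stand-in): a sparse uniform coarse pass estimates which ray regions contribute to the pixel, and the fine samples are then concentrated there.

```python
import numpy as np

def coarse_then_focus(density_fn, near, far, n_coarse=16, n_fine=64, rng=None):
    """Coarse pass: uniform samples along the ray. Focus pass: draw fine
    samples from the inverse CDF of the coarse weights, so regions that
    contribute more to the rendered pixel receive more samples."""
    rng = rng or np.random.default_rng(0)
    t_coarse = np.linspace(near, far, n_coarse)
    weights = density_fn(t_coarse)
    weights = weights / weights.sum()          # normalize contributions
    cdf = np.cumsum(weights)
    u = rng.random(n_fine)
    idx = np.searchsorted(cdf, u)              # inverse-CDF sampling
    return t_coarse[np.clip(idx, 0, n_coarse - 1)]

# Toy density concentrated around t = 2.0: fine samples should cluster there.
density = lambda t: np.exp(-((t - 2.0) ** 2) / 0.1)
t_fine = coarse_then_focus(density, near=0.0, far=4.0)
```

The sparsity matters for the hardware side too: fewer, better-placed samples mean fewer scene-feature memory accesses, which is exactly the bottleneck the abstract identifies.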
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
Vision Transformers (ViTs) have shown impressive performance but still
require high computation cost compared to convolutional neural networks
(CNNs); one reason is that ViT attention measures global similarities and
thus has quadratic complexity in the number of input tokens. Existing
efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g.,
Performer), which sacrifice ViTs' capabilities of capturing either global or
local context. In this work, we ask an important research question: Can ViTs
learn both global and local context while being more efficient during
inference? To this end, we propose a framework called Castling-ViT, which
trains ViTs using both linear-angular attention and masked softmax-based
quadratic attention, but then switches to using only linear-angular attention
during ViT inference. Our Castling-ViT leverages angular kernels to measure the
similarities between queries and keys via spectral angles. We further
simplify it with two techniques: (1) a novel linear-angular attention
mechanism: we decompose the angular kernels into linear terms and high-order
residuals, and only keep the linear terms; and (2) we adopt two parameterized
modules to approximate high-order residuals: a depthwise convolution and an
auxiliary masked softmax attention to help learn both global and local
information, where the masks for softmax attention are regularized to gradually
become zeros and thus incur no overhead during ViT inference. Extensive
experiments and ablation studies on three tasks consistently validate the
effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher
accuracy or a 40% MACs reduction on ImageNet classification and a 1.2 higher mAP on
COCO detection under comparable FLOPs, compared to ViTs with vanilla
softmax-based attention.
Comment: CVPR 202
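The linear-term-only angular kernel admits the standard linear-attention reordering. A minimal sketch (the DWConv and auxiliary masked-softmax branches are omitted): for unit-norm q and k, the angular similarity 1 - arccos(q·k)/π expands as 1/2 + (q·k)/π plus higher-order residuals; keeping only the linear term lets associativity avoid the N×N attention matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 32, 16
Q = rng.normal(size=(N, d)); Q /= np.linalg.norm(Q, axis=1, keepdims=True)
K = rng.normal(size=(N, d)); K /= np.linalg.norm(K, axis=1, keepdims=True)
V = rng.normal(size=(N, d))

# Angular similarity 1 - arccos(x)/pi expanded around x = 0:
#   1 - arccos(x)/pi = 1/2 + x/pi + (higher-order residual, dropped here).
S = 0.5 + (Q @ K.T) / np.pi                          # quadratic form: N x N matrix
out_quadratic = (S @ V) / S.sum(axis=1, keepdims=True)

# Same computation reordered by associativity: no N x N matrix is formed.
num = 0.5 * V.sum(axis=0) + (Q @ (K.T @ V)) / np.pi  # cost O(N d^2), not O(N^2 d)
den = 0.5 * N + (Q @ K.sum(axis=0)) / np.pi
out_linear = num / den[:, None]
```

The two forms are algebraically identical, which is why dropping the high-order residual (and absorbing it into trainable modules during training only) is the key design choice.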
Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction
Neural Radiance Field (NeRF) based 3D reconstruction is highly desirable for
immersive Augmented and Virtual Reality (AR/VR) applications, but achieving
instant (i.e., < 5 seconds) on-device NeRF training remains a challenge. In
this work, we first identify the inefficiency bottleneck: the need to
interpolate NeRF embeddings up to 200,000 times from a 3D embedding grid during
each training iteration. To alleviate this, we propose Instant-3D, an
algorithm-hardware co-design acceleration framework that achieves instant
on-device NeRF training. Our algorithm decomposes the embedding grid
representation in terms of color and density, enabling computational redundancy
to be squeezed out by adopting different (1) grid sizes and (2) update
frequencies for the color and density branches. Our hardware accelerator
further reduces the dominant memory accesses for embedding grid interpolation
by (1) mapping multiple nearby points' memory read requests into one during the
feed-forward process, (2) merging embedding grid updates from the same sliding
time window during back-propagation, and (3) fusing different computation cores
to support the different grid sizes needed by the color and density branches of
the Instant-3D algorithm. Extensive experiments validate the effectiveness of
Instant-3D, achieving a large training time reduction of 41x - 248x while
maintaining the same reconstruction quality. Excitingly, Instant-3D has enabled
instant 3D reconstruction for AR/VR, requiring a reconstruction time of only
1.6 seconds per scene and meeting the AR/VR power consumption constraint of 1.9
W.
Comment: Accepted by ISCA'2
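The decomposition of the embedding grid by branch can be sketched as follows. This is a toy skeleton of the stated idea only: the grid resolutions and the update period are illustrative assumptions (real systems use multi-resolution hash grids and the fused kernels described above), but it shows how separate grid sizes and update frequencies for color and density squeeze out redundant work.

```python
import numpy as np

class DecomposedGrids:
    """Separate embedding grids for density and color, with the color grid
    updated less frequently -- illustrative resolutions and period."""
    def __init__(self, density_res=64, color_res=32, color_update_every=3):
        self.density_grid = np.zeros(density_res)
        self.color_grid = np.zeros(color_res)
        self.color_update_every = color_update_every
        self.density_updates = 0
        self.color_updates = 0

    def step(self, it):
        self.density_updates += 1            # density branch: every iteration
        if it % self.color_update_every == 0:
            self.color_updates += 1          # color branch: sparser updates

grids = DecomposedGrids()
for it in range(12):                         # 12 toy training iterations
    grids.step(it)
```

With a period of 3, the color grid is touched only a third as often as the density grid, cutting the grid-interpolation memory traffic the abstract identifies as the bottleneck.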