MixRT: Mixed Neural Representations For Real-Time NeRF Rendering
Neural Radiance Field (NeRF) has emerged as a leading technique for novel
view synthesis, owing to its impressive photorealistic reconstruction and
rendering capability. Nevertheless, achieving real-time NeRF rendering in
large-scale scenes has presented challenges, often leading to the adoption of
either intricate baked mesh representations with a substantial number of
triangles or resource-intensive ray marching in baked representations. We
challenge these conventions, observing that high-quality geometry, represented
by meshes with substantial triangles, is not necessary for achieving
photorealistic rendering quality. Consequently, we propose MixRT, a novel NeRF
representation that includes a low-quality mesh, a view-dependent displacement
map, and a compressed NeRF model. This design effectively harnesses the
capabilities of existing graphics hardware, thus enabling real-time NeRF
rendering on edge devices. Leveraging a highly-optimized WebGL-based rendering
framework, our proposed MixRT attains real-time rendering speeds on edge
devices (over 30 FPS at a resolution of 1280 x 720 on a MacBook M1 Pro laptop),
better rendering quality (0.2 PSNR higher in indoor scenes of the Unbounded-360
datasets), and a smaller storage size (less than 80% compared to
state-of-the-art methods). Comment: Accepted by 3DV'24. Project Page: https://licj15.github.io/MixRT
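The three-component design above can be sketched as a single-ray shading routine. This is a minimal illustration under our own assumptions, not the paper's implementation: `mesh_depth`, `displacement`, and `color_field` are hypothetical stand-ins for the rasterized low-quality mesh, the view-dependent displacement map, and the compressed NeRF model.

```python
import numpy as np

def render_ray(origin, direction, mesh_depth, displacement, color_field):
    """Shade one ray: start from the coarse mesh hit, correct it with a
    view-dependent displacement, then query a compact radiance field."""
    direction = direction / np.linalg.norm(direction)
    # Coarse hit point from rasterizing the low-quality mesh.
    hit = origin + mesh_depth * direction
    # The view-dependent displacement map refines the hit along the ray.
    refined = hit + displacement(hit, direction) * direction
    # The compressed NeRF model supplies the final radiance.
    return color_field(refined, direction)
```

In a real renderer, `mesh_depth` would come from a rasterization pass on graphics hardware, which is what makes the approach fast on edge devices; only the final color query touches the compressed NeRF.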
Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning
Transfer learning leverages feature representations of deep neural networks
(DNNs) pretrained on source tasks with rich data to empower effective
finetuning on downstream tasks. However, the pretrained models are often
prohibitively large for delivering generalizable representations, which limits
their deployment on edge devices with constrained resources. To close this gap,
we propose a new transfer learning pipeline, which leverages our finding that
robust tickets can transfer better, i.e., subnetworks drawn with properly
induced adversarial robustness can achieve better transferability than vanilla
lottery ticket subnetworks. Extensive experiments and ablation studies validate
that our proposed transfer learning pipeline can achieve enhanced
accuracy-sparsity trade-offs across both diverse downstream tasks and sparsity
patterns, further enriching the lottery ticket hypothesis. Comment: Accepted by DAC 202
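The ticket-drawing step can be sketched as global magnitude pruning. This is only an illustrative fragment: the paper's key ingredient, inducing adversarial robustness in the source model before the mask is drawn, happens in a pretraining loop that is omitted here.

```python
import numpy as np

def draw_ticket(weights, sparsity):
    """Return a binary mask that prunes the smallest-magnitude `sparsity`
    fraction of weights, keeping the rest as the subnetwork ('ticket')."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)  # number of weights to prune
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)
```

The finetuning stage would then optimize only the masked weights (`weights * mask`) on the downstream task.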
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Boosting the task accuracy of tiny neural networks (TNNs) has become a
fundamental challenge for enabling the deployment of TNNs on edge devices,
which are constrained by strict limits on memory, computation, bandwidth, and
power supply. To this end, we propose a framework called
NetDistiller to boost the achievable accuracy of TNNs by treating them as
sub-networks of a weight-sharing teacher constructed by expanding the number of
channels of the TNN. Specifically, the target TNN model is jointly trained with
the weight-sharing teacher model via (1) gradient surgery to tackle the
gradient conflicts between them and (2) uncertainty-aware distillation to
mitigate the overfitting of the teacher model. Extensive experiments across
diverse tasks validate NetDistiller's effectiveness in boosting TNNs'
achievable accuracy over state-of-the-art methods. Our code is available at
https://github.com/GATECH-EIC/NetDistiller
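The gradient-surgery step can be illustrated with a projection rule in the style of PCGrad; whether NetDistiller uses exactly this rule is our assumption, so treat it as a sketch:

```python
import numpy as np

def surgery(g_task, g_distill):
    """Deconflict two gradients: if the distillation gradient opposes the
    task gradient (negative dot product), project out its conflicting
    component before summing. A PCGrad-style sketch, not the paper's code."""
    dot = np.dot(g_task, g_distill)
    if dot < 0:
        # Remove the component of g_distill along g_task.
        g_distill = g_distill - dot / np.dot(g_task, g_task) * g_task
    return g_task + g_distill
```

When the two gradients already agree, the update reduces to their plain sum, so the surgery only intervenes on conflicting steps.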
Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification
Short Text Classification (STC) is crucial for processing and comprehending
the brief but substantial content prevalent on contemporary digital platforms.
STC encounters difficulties in grasping semantic and syntactic intricacies,
an issue that is apparent in traditional pre-trained language models. Although
Graph Convolutional Networks enhance performance by integrating external
knowledge bases, these methods are limited by the quality and extent of the
knowledge applied. Recently, the emergence of Large Language Models (LLMs) and
Chain-of-Thought (CoT) has significantly improved the performance of complex
reasoning tasks. However, some studies have highlighted the limitations of
their application in fundamental NLP tasks. Consequently, this study sought to
employ CoT to investigate the capabilities of LLMs in STC tasks, introducing
the Quartet Logic: A Four-Step Reasoning (QLFR) framework. This
framework primarily incorporates Syntactic and Semantic Enrichment CoT,
effectively decomposing the STC task into four distinct steps: (i) essential
concept identification, (ii) common-sense knowledge retrieval, (iii) text
rewriting, and (iv) classification. This elicits the inherent knowledge and
abilities of LLMs to address the challenges in STC. Surprisingly, we found that
QLFR can also improve the performance of smaller models. Therefore, we
developed a CoT-Driven Multi-task learning (QLFR-CML) method to facilitate the
knowledge transfer from LLMs to smaller models. Extensive experimentation
across six short-text benchmarks validated the efficacy of the proposed
methods. Notably, QLFR achieved state-of-the-art performance on all datasets,
with significant improvements, particularly on the Ohsumed and TagMyNews
datasets.
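The four steps above can be sketched as a chained-prompt routine. Here `llm` is a hypothetical callable standing in for any chat-completion API, and the prompt wording is illustrative, not taken from the paper:

```python
def qlfr_classify(text, llm, labels):
    """Run the four QLFR steps as chained prompts.
    `llm(prompt) -> str` is a stand-in for an LLM API call."""
    # (i) essential concept identification
    concepts = llm(f"Identify the essential concepts in: {text}")
    # (ii) common-sense knowledge retrieval
    knowledge = llm(f"Retrieve common-sense knowledge about: {concepts}")
    # (iii) text rewriting, enriched with the retrieved knowledge
    rewritten = llm(
        f"Rewrite the text using this knowledge.\n"
        f"Text: {text}\nKnowledge: {knowledge}"
    )
    # (iv) classification over the enriched text
    return llm(f"Classify into {labels}: {rewritten}")
```

Each step's output feeds the next prompt, which is what lets the final classification draw on knowledge elicited in the earlier steps.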
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
Vision Transformers (ViTs) have shown impressive performance but still
incur a high computation cost compared to convolutional neural networks
(CNNs); one reason is that ViTs' attention measures global similarities and
thus has a quadratic complexity in the number of input tokens. Existing
efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g.,
Performer), which sacrifice ViTs' capabilities of capturing either global or
local context. In this work, we ask an important research question: Can ViTs
learn both global and local context while being more efficient during
inference? To this end, we propose a framework called Castling-ViT, which
trains ViTs using both linear-angular attention and masked softmax-based
quadratic attention, but then switches to having only linear-angular attention
during ViT inference. Our Castling-ViT leverages angular kernels to measure the
similarities between queries and keys via spectral angles. We further
simplify it with two techniques: (1) a novel linear-angular attention
mechanism: we decompose the angular kernels into linear terms and high-order
residuals, and only keep the linear terms; and (2) we adopt two parameterized
modules to approximate high-order residuals: a depthwise convolution and an
auxiliary masked softmax attention to help learn both global and local
information, where the masks for softmax attention are regularized to gradually
become zeros and thus incur no overhead during ViT inference. Extensive
experiments and ablation studies on three tasks consistently validate the
effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher
accuracy or 40% MACs reduction on ImageNet classification and 1.2 higher mAP on
COCO detection under comparable FLOPs, as compared to ViTs with vanilla
softmax-based attentions. Comment: CVPR 202
i-FlatCam: A 253 FPS, 91.49 µJ/Frame Ultra-Compact Intelligent Lensless Camera for Real-Time and Efficient Eye Tracking in VR/AR
We present a first-of-its-kind ultra-compact intelligent camera system,
dubbed i-FlatCam, including a lensless camera with a computational (Comp.)
chip. It highlights (1) a predict-then-focus eye tracking pipeline for boosted
efficiency without compromising the accuracy, (2) a unified compression scheme
for single-chip processing and improved frame rate per second (FPS), and (3)
dedicated intra-channel reuse design for depth-wise convolutional layers
(DW-CONV) to increase utilization. i-FlatCam demonstrates the first eye
tracking pipeline with a lensless camera and achieves 3.16 degrees of accuracy,
253 FPS, 91.49 µJ/Frame, and a 6.7mm x 8.9mm x 1.2mm camera form factor,
paving the way for next-generation Augmented Reality (AR) and Virtual Reality
(VR) devices. Comment: Accepted by VLSI 202