Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design
Novel view synthesis is an essential functionality for enabling immersive
experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for
which generalizable Neural Radiance Fields (NeRFs) have gained increasing
popularity thanks to their cross-scene generalization capability. Despite their
promise, the real-device deployment of generalizable NeRFs is bottlenecked by
their prohibitive complexity due to the required massive memory accesses to
acquire scene features, causing their ray marching process to be
memory-bound. To this end, we propose Gen-NeRF, an algorithm-hardware
co-design framework dedicated to generalizable NeRF acceleration, which for the
first time enables real-time generalizable NeRFs. On the algorithm side,
Gen-NeRF integrates a coarse-then-focus sampling strategy, leveraging the fact
that different regions of a 3D scene contribute differently to the rendered
pixel, to enable sparse yet effective sampling. On the hardware side, Gen-NeRF
features an accelerator micro-architecture to maximize the data reuse
opportunities among different rays by making use of their epipolar geometric
relationship. Furthermore, our Gen-NeRF accelerator features a customized
dataflow to enhance data locality during point-to-hardware mapping and an
optimized scene feature storage strategy to minimize memory bank conflicts.
Extensive experiments validate the effectiveness of our proposed Gen-NeRF
framework in enabling real-time and generalizable novel view synthesis.
Comment: Accepted by ISCA 202
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Self-supervised learning (SSL) for rich speech representations has achieved
empirical success in low-resource Automatic Speech Recognition (ASR) and other
speech processing tasks, which can mitigate the necessity of a large amount of
transcribed speech and thus has driven a growing demand for on-device ASR and
other speech processing. However, advanced speech SSL models have become
increasingly large, which contradicts the limited on-device resources. This gap
could be more severe in multilingual/multitask scenarios requiring
simultaneously recognizing multiple languages or executing multiple speech
processing tasks. Additionally, strongly overparameterized speech SSL models
tend to suffer from overfitting when finetuned on low-resource speech
corpora. This work aims to enhance the practical usage of speech SSL models
towards a win-win in both enhanced efficiency and alleviated overfitting via
our proposed S3-Router framework, which for the first time discovers that
simply discarding no more than 10% of model weights via only finetuning model
connections of speech SSL models can achieve better accuracy than standard
weight finetuning on downstream speech processing tasks. More importantly,
S3-Router can serve as an all-in-one technique to enable (1) a new
finetuning scheme, (2) an efficient multilingual/multitask solution, (3) a
state-of-the-art ASR pruning technique, and (4) a new tool to quantitatively
analyze the learned speech representation. We believe S3-Router has provided
a new perspective for practical deployment of speech SSL models. Our code is
available at: https://github.com/GATECH-EIC/S3-Router.
Comment: Accepted at NeurIPS 202
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
The remarkable capabilities and intricate nature of Artificial Intelligence
(AI) have dramatically escalated the imperative for specialized AI
accelerators. Nonetheless, designing these accelerators for various AI
workloads remains both labor- and time-intensive. While existing design
exploration and automation tools can partially alleviate the need for extensive
human involvement, they still demand substantial hardware expertise, posing a
barrier to non-experts and stifling AI accelerator development. Motivated by
the astonishing potential of large language models (LLMs) for generating
high-quality content in response to human language instructions, we embark on
this work to examine the possibility of harnessing LLMs to automate AI
accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework
intended to democratize AI accelerator design by leveraging human natural
languages instead of domain-specific languages. Specifically, we first perform
an in-depth investigation into LLMs' limitations and capabilities for AI
accelerator design, thus aiding our understanding of our current position and
garnering insights into LLM-powered automated AI accelerator design.
Furthermore, drawing inspiration from the above insights, we develop a
framework called GPT4AIGChip, which features an automated demo-augmented
prompt-generation pipeline utilizing in-context learning to guide LLMs towards
creating high-quality AI accelerator designs. To our knowledge, this work is the
first to demonstrate an effective pipeline for LLM-powered automated AI
accelerator generation. Accordingly, we anticipate that our insights and
framework can serve as a catalyst for innovations in next-generation
LLM-powered design automation tools.
Comment: Accepted by ICCAD 202
New Spectrally Constrained Sequence Sets With Optimal Periodic Cross-Correlation
Spectrally constrained sequences (SCSs) play an important role in modern communication and radar systems operating over non-contiguous spectrum. Despite numerous research attempts over the past years, very few constructions of optimal SCSs with low cross-correlation are known. In this paper, we address this problem by introducing a unifying framework to construct unimodular SCS families using circular Florentine rectangles (CFRs) and interleaving techniques. By leveraging uniform power allocation in the frequency domain across all admissible carriers (a necessary condition for beating the existing periodic correlation lower bound of SCSs), we present a tighter correlation lower bound and show that it is achievable by our proposed SCS families, including multiple SCS sets with zero correlation zone properties.
Low Ambiguity Zone: Theoretical Bounds and Doppler-Resilient Sequence Design in Integrated Sensing and Communication Systems
In radar sensing and communications, designing Doppler-resilient sequences (DRSs) with a low ambiguity function for delays over the entire signal duration and Doppler shifts over the entire signal bandwidth is an extremely difficult task. In practice, however, the Doppler frequency range is normally much smaller than the bandwidth of the transmitted signal, and it is relatively easy to attain quasi-synchronization for delays far less than the entire signal duration. Motivated by this observation, we propose a new concept called the low ambiguity zone (LAZ), which is a small area of interest of the corresponding ambiguity function defined by a certain Doppler frequency and delay. Such an LAZ reduces to a zero ambiguity zone (ZAZ) if the maximum ambiguity values of interest are zero. In this paper, we derive a set of theoretical bounds on the periodic LAZ/ZAZ of unimodular DRSs with and without spectral constraints, which include the existing bounds on the periodic global ambiguity function as special cases. These bounds may be used as theoretical design guidelines to measure the optimality of sequences against the Doppler effect. We then introduce four optimal constructions of DRSs with respect to the derived ambiguity lower bounds, based on algebraic tools such as characters over finite fields and cyclic difference sets.
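To make the low ambiguity zone idea above concrete, here is a minimal Python sketch that evaluates a periodic ambiguity function on integer delay and Doppler bins and reports its largest magnitude inside a small zone around the origin. Everything in it is an assumption for illustration and is not taken from the paper: the function names, the sign and normalization convention of the ambiguity function, the symmetric zone shape, and the use of a Zadoff-Chu sequence as a test input.

```python
import cmath


def zadoff_chu(n, root=1):
    """Odd-length Zadoff-Chu sequence (unimodular). Illustrative test input only."""
    return [cmath.exp(-1j * cmath.pi * root * t * (t + 1) / n) for t in range(n)]


def periodic_ambiguity(seq, delay, doppler):
    """Periodic ambiguity value at an integer delay and an integer Doppler bin.

    Assumed convention: AF(tau, nu) = sum_t a[t] * conj(a[t + tau]) * exp(2j*pi*nu*t/N),
    with the index t + tau taken modulo N.
    """
    n = len(seq)
    return sum(
        seq[t]
        * seq[(t + delay) % n].conjugate()
        * cmath.exp(2j * cmath.pi * doppler * t / n)
        for t in range(n)
    )


def max_ambiguity_in_zone(seq, max_delay, max_doppler):
    """Largest |AF| over |delay| <= max_delay and |doppler| <= max_doppler.

    The origin (0, 0) is excluded because it is always the main peak.
    The symmetric, inclusive zone is an assumed convention.
    """
    worst = 0.0
    for tau in range(-max_delay, max_delay + 1):
        for nu in range(-max_doppler, max_doppler + 1):
            if tau == 0 and nu == 0:
                continue
            worst = max(worst, abs(periodic_ambiguity(seq, tau, nu)))
    return worst


if __name__ == "__main__":
    zc = zadoff_chu(13)
    print("delay-only zone:", max_ambiguity_in_zone(zc, 3, 0))
    print("delay-Doppler zone:", max_ambiguity_in_zone(zc, 2, 2))
```

Under this convention, a root-1 Zadoff-Chu sequence of odd length has essentially zero sidelobes along the zero-Doppler axis, but a full-height ridge once nonzero Doppler bins enter the zone. This is the kind of behavior that a Doppler-resilient design with a low or zero ambiguity zone is meant to avoid.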
CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting
Nuclear detection, segmentation and morphometric profiling are essential in
helping us further understand the relationship between histology and patient
outcome. To drive innovation in this area, we set up a community-wide challenge
using the largest available dataset of its kind to assess nuclear segmentation
and cellular composition. Our challenge, named CoNIC, stimulated the
development of reproducible algorithms for cellular recognition with real-time
result inspection on public leaderboards. We conducted an extensive
post-challenge analysis based on the top-performing models using 1,658
whole-slide images of colon tissue. With around 700 million detected nuclei per
model, associated features were used for dysplasia grading and survival
analysis, where we demonstrated that the challenge's improvement over the
previous state-of-the-art led to significant boosts in downstream performance.
Our findings also suggest that eosinophils and neutrophils play an important
role in the tumour microenvironment. We release challenge models and WSI-level
results to foster the development of further methods for biomarker discovery.