16 research outputs found

    Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design

    Novel view synthesis is an essential functionality for enabling immersive experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for which generalizable Neural Radiance Fields (NeRFs) have gained increasing popularity thanks to their cross-scene generalization capability. Despite their promise, the real-device deployment of generalizable NeRFs is bottlenecked by their prohibitive complexity: acquiring scene features requires massive memory accesses, which makes the ray marching process memory-bound. To this end, we propose Gen-NeRF, an algorithm-hardware co-design framework dedicated to generalizable NeRF acceleration, which for the first time enables real-time generalizable NeRFs. On the algorithm side, Gen-NeRF integrates a coarse-then-focus sampling strategy, leveraging the fact that different regions of a 3D scene contribute differently to the rendered pixel, to enable sparse yet effective sampling. On the hardware side, Gen-NeRF features an accelerator micro-architecture that maximizes data reuse among different rays by exploiting their epipolar geometric relationship. Furthermore, our Gen-NeRF accelerator features a customized dataflow to enhance data locality during point-to-hardware mapping and an optimized scene feature storage strategy to minimize memory bank conflicts. Extensive experiments validate the effectiveness of our proposed Gen-NeRF framework in enabling real-time and generalizable novel view synthesis.
    Comment: Accepted by ISCA 202

    Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

    Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks. It reduces the need for large amounts of transcribed speech and has thus driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly large, which conflicts with the limited on-device resources. This gap can be more severe in multilingual/multitask scenarios that require simultaneously recognizing multiple languages or executing multiple speech processing tasks. Additionally, strongly overparameterized speech SSL models tend to suffer from overfitting when finetuned on low-resource speech corpora. This work aims to enhance the practical usage of speech SSL models towards a win-win in both enhanced efficiency and alleviated overfitting via our proposed S$^3$-Router framework, which for the first time discovers that simply discarding no more than 10% of model weights via only finetuning model connections of speech SSL models can achieve better accuracy than standard weight finetuning on downstream speech processing tasks. More importantly, S$^3$-Router can serve as an all-in-one technique to enable (1) a new finetuning scheme, (2) an efficient multilingual/multitask solution, (3) a state-of-the-art ASR pruning technique, and (4) a new tool to quantitatively analyze the learned speech representation. We believe S$^3$-Router has provided a new perspective for practical deployment of speech SSL models. Our code is available at: https://github.com/GATECH-EIC/S3-Router.
    Comment: Accepted at NeurIPS 202

    GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models

    The remarkable capabilities and intricate nature of Artificial Intelligence (AI) have dramatically escalated the imperative for specialized AI accelerators. Nonetheless, designing these accelerators for various AI workloads remains both labor- and time-intensive. While existing design exploration and automation tools can partially alleviate the need for extensive human involvement, they still demand substantial hardware expertise, posing a barrier to non-experts and stifling AI accelerator development. Motivated by the astonishing potential of large language models (LLMs) for generating high-quality content in response to human language instructions, we embark on this work to examine the possibility of harnessing LLMs to automate AI accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework intended to democratize AI accelerator design by leveraging natural language instead of domain-specific languages. Specifically, we first perform an in-depth investigation into LLMs' limitations and capabilities for AI accelerator design, which clarifies the current state of the art and yields insights into LLM-powered automated AI accelerator design. Drawing on these insights, GPT4AIGChip features an automated demo-augmented prompt-generation pipeline that uses in-context learning to guide LLMs towards creating high-quality AI accelerator designs. To our knowledge, this work is the first to demonstrate an effective pipeline for LLM-powered automated AI accelerator generation. Accordingly, we anticipate that our insights and framework can serve as a catalyst for innovations in next-generation LLM-powered design automation tools.
    Comment: Accepted by ICCAD 202

    New Spectrally Constrained Sequence Sets With Optimal Periodic Cross-Correlation

    Spectrally constrained sequences (SCSs) play an important role in modern communication and radar systems operating over non-contiguous spectrum. Despite numerous research attempts over the past years, very few constructions of optimal SCSs with low cross-correlation are known. In this paper, we address this major problem by introducing a unifying framework to construct unimodular SCS families using circular Florentine rectangles (CFRs) and interleaving techniques. By leveraging uniform power allocation in the frequency domain over all admissible carriers (a necessary condition for beating the existing periodic correlation lower bound of SCSs), we present a tighter correlation lower bound and show that it is achievable by our proposed SCS families, including multiple SCS sets with zero correlation zone properties.
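    As a minimal sketch of the quantity this abstract bounds, the snippet below computes the periodic cross-correlation of two unimodular sequences. It uses Zadoff-Chu sequences of prime length purely as a familiar stand-in: they are not spectrally constrained and are not the CFR-based construction described above. The function names `zadoff_chu` and `periodic_crosscorr`, the length 7, and the roots 1 and 2 are assumptions chosen for illustration.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Unimodular Zadoff-Chu sequence of odd length (illustrative stand-in)."""
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]


def periodic_crosscorr(a, b, shift):
    """Periodic cross-correlation of equal-length sequences a and b at one shift."""
    n_len = len(a)
    return sum(a[n] * b[(n + shift) % n_len].conjugate() for n in range(n_len))


N = 7  # hypothetical prime length
seq_a = zadoff_chu(1, N)
seq_b = zadoff_chu(2, N)

# Largest out-of-phase autocorrelation magnitude of seq_a.
auto_sidelobe = max(abs(periodic_crosscorr(seq_a, seq_a, t)) for t in range(1, N))
# Largest cross-correlation magnitude between seq_a and seq_b over all shifts.
cross_peak = max(abs(periodic_crosscorr(seq_a, seq_b, t)) for t in range(N))

print(f"max autocorrelation sidelobe: {auto_sidelobe:.3e}")
print(f"max cross-correlation: {cross_peak:.4f} (sqrt(N) = {math.sqrt(N):.4f})")
```

    For a prime length and distinct roots, these sequences have vanishing autocorrelation sidelobes and a cross-correlation magnitude of sqrt(N). A spectrally constrained design must additionally null the forbidden carriers, which this sketch does not attempt.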

    Low Ambiguity Zone: Theoretical Bounds and Doppler-Resilient Sequence Design in Integrated Sensing and Communication Systems

    In radar sensing and communications, designing Doppler-resilient sequences (DRSs) with a low ambiguity function for delay over the entire signal duration and Doppler shift over the entire signal bandwidth is an extremely difficult task. However, in practice, the Doppler frequency range is normally much smaller than the bandwidth of the transmitted signal, and it is relatively easy to attain quasi-synchronization for delays far less than the entire signal duration. Motivated by this observation, we propose a new concept called low ambiguity zone (LAZ), which is a small area of interest of the corresponding ambiguity function, defined by a certain range of Doppler frequencies and delays. Such an LAZ reduces to a zero ambiguity zone (ZAZ) if the maximum ambiguity values of interest are zero. In this paper, we derive a set of theoretical bounds on the periodic LAZ/ZAZ of unimodular DRSs with and without spectral constraints, which include the existing bounds on the periodic global ambiguity function as special cases. These bounds may be used as theoretical design guidelines to measure the optimality of sequences against the Doppler effect. We then introduce four constructions of DRSs that are optimal with respect to the derived ambiguity lower bounds, based on algebraic tools such as characters over finite fields and cyclic difference sets.
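    To make the zone idea concrete, the sketch below evaluates a discrete periodic ambiguity function and takes its peak magnitude over a small delay-Doppler rectangle, excluding the origin. The convention used (delay as a cyclic shift, Doppler as an integer frequency bin), the rectangular zone, the helper names, and the Zadoff-Chu test sequences are all assumptions for illustration. They are not the definitions or constructions of the paper.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Unimodular Zadoff-Chu sequence of odd length (illustrative stand-in)."""
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]


def periodic_ambiguity(seq, delay, doppler):
    """Periodic auto-ambiguity value at an integer delay and Doppler bin."""
    n_len = len(seq)
    return sum(seq[n] * seq[(n + delay) % n_len].conjugate()
               * cmath.exp(2j * math.pi * doppler * n / n_len)
               for n in range(n_len))


def zone_peak(seq, max_delay, max_doppler):
    """Peak ambiguity magnitude over |delay| <= max_delay and
    |doppler| <= max_doppler, excluding the origin (assumed zone shape)."""
    return max(abs(periodic_ambiguity(seq, t, f))
               for t in range(-max_delay, max_delay + 1)
               for f in range(-max_doppler, max_doppler + 1)
               if (t, f) != (0, 0))


N = 7  # hypothetical prime length
for root in (1, 3):
    peak = zone_peak(zadoff_chu(root, N), max_delay=1, max_doppler=1)
    print(f"root {root}: peak ambiguity in zone = {peak:.3e}")
```

    With root 1 the chirp-like ridge of the sequence passes through the zone, so the peak equals the sequence length. With root 3 the ridge falls outside this particular zone and the peak vanishes, which is the zero-ambiguity-zone situation under the assumed convention.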

    CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

    Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images (WSIs) of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microenvironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.

    Zero-Difference Balanced Function Derived from Fermat Quotients and Its Applications

    No full text

    Some Notes on Pseudorandom Binary Sequences Derived from Fermat-Euler Quotients

    No full text