NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To address this, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
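As a concrete (and deliberately simplified) illustration of an expansion-then-contraction strategy, the sketch below expands one tiny linear layer into two wider linear layers for training and then folds them back exactly for deployment; the class, its hyperparameters, and the folding rule are our own illustrative assumptions, not NetBooster's actual recipe.

    import torch.nn as nn

    class ExpandedLinear(nn.Module):
        """Training-time expansion of one tiny linear layer into two wider ones,
        kept purely linear so the pair can be contracted back exactly."""
        def __init__(self, in_features, out_features, expand=4):
            super().__init__()
            hidden = expand * max(in_features, out_features)
            self.up = nn.Linear(in_features, hidden, bias=False)
            self.down = nn.Linear(hidden, out_features, bias=True)

        def forward(self, x):
            return self.down(self.up(x))

        def contract(self) -> nn.Linear:
            # Fold the two weight matrices into one: W = W_down @ W_up.
            merged = nn.Linear(self.up.in_features, self.down.out_features, bias=True)
            merged.weight.data = self.down.weight.data @ self.up.weight.data
            merged.bias.data = self.down.bias.data.clone()
            return merged

After training, contract() recovers a layer with the original tiny footprint, so the deployed TNN pays no extra inference cost for the capacity used during training.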
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Self-supervised learning (SSL) for rich speech representations has achieved
empirical success in low-resource Automatic Speech Recognition (ASR) and other
speech processing tasks, reducing the need for large amounts of transcribed
speech and thus driving a growing demand for on-device ASR and
other speech processing. However, advanced speech SSL models have become
increasingly large, which contradicts the limited on-device resources. This gap
could be even more severe in multilingual/multitask scenarios that require
simultaneously recognizing multiple languages or executing multiple speech
processing tasks. Additionally, strongly overparameterized speech SSL models
tend to suffer from overfitting when finetuned on low-resource speech
corpora. This work aims to enhance the practical usage of speech SSL models
towards a win-win in both enhanced efficiency and alleviated overfitting via
our proposed S3-Router framework, which for the first time discovers that
simply discarding no more than 10% of model weights via finetuning only the
connections of speech SSL models can achieve better accuracy than standard
weight finetuning on downstream speech processing tasks. More importantly,
S3-Router can serve as an all-in-one technique to enable (1) a new
finetuning scheme, (2) an efficient multilingual/multitask solution, (3) a
state-of-the-art ASR pruning technique, and (4) a new tool to quantitatively
analyze the learned speech representations. We believe S3-Router provides
a new perspective on the practical deployment of speech SSL models. Our code is
available at: https://github.com/GATECH-EIC/S3-Router
Comment: Accepted at NeurIPS 2022
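To make "finetuning model connections" concrete, here is a minimal sketch in which a pretrained linear layer's weights stay frozen and only a per-connection score is trained, with the lowest-scoring roughly 10% of connections discarded in the forward pass; the module name, score initialization, and straight-through trick are illustrative assumptions rather than the authors' released implementation (see the repository above for that).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConnectionFinetunedLinear(nn.Module):
        """Pretrained weights are frozen; only per-connection scores are trained,
        and the lowest-scoring ~10% of connections are dropped."""
        def __init__(self, pretrained: nn.Linear, discard_ratio: float = 0.1):
            super().__init__()
            self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
            self.bias = (None if pretrained.bias is None
                         else nn.Parameter(pretrained.bias.detach(), requires_grad=False))
            self.score = nn.Parameter(0.01 * torch.randn_like(self.weight))
            self.discard_ratio = discard_ratio

        def forward(self, x):
            k = max(1, int(self.score.numel() * self.discard_ratio))
            threshold = self.score.flatten().kthvalue(k).values
            mask = (self.score > threshold).float()
            # Straight-through estimator so gradients reach the scores, not the mask.
            mask = mask + self.score - self.score.detach()
            return F.linear(x, self.weight * mask, self.bias)

Wrapping each linear layer of the frozen SSL encoder this way and training only the scores on the downstream task is one reading of how discarding a small fraction of connections can replace standard weight finetuning.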
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
The remarkable capabilities and intricate nature of Artificial Intelligence
(AI) have dramatically escalated the imperative for specialized AI
accelerators. Nonetheless, designing these accelerators for various AI
workloads remains both labor- and time-intensive. While existing design
exploration and automation tools can partially alleviate the need for extensive
human involvement, they still demand substantial hardware expertise, posing a
barrier to non-experts and stifling AI accelerator development. Motivated by
the astonishing potential of large language models (LLMs) for generating
high-quality content in response to human language instructions, we embark on
this work to examine the possibility of harnessing LLMs to automate AI
accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework
intended to democratize AI accelerator design by taking natural-language
specifications instead of domain-specific languages as input. Specifically, we first perform
an in-depth investigation into LLMs' limitations and capabilities for AI
accelerator design, thus aiding our understanding of our current position and
garnering insights into LLM-powered automated AI accelerator design.
Building on these insights, GPT4AIGChip features an automated demo-augmented
prompt-generation pipeline that utilizes in-context learning to guide LLMs toward
creating high-quality AI accelerator designs. To our knowledge, this work is the
first to demonstrate an effective pipeline for LLM-powered automated AI
accelerator generation. Accordingly, we anticipate that our insights and
framework can serve as a catalyst for innovations in next-generation
LLM-powered design automation tools.
Comment: Accepted by ICCAD 2023
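The demo-augmented, in-context-learning flavor of such a pipeline can be pictured with the small prompt-assembly sketch below; the function names, toy relevance score, and prompt layout are hypothetical stand-ins for illustration, not GPT4AIGChip's actual pipeline.

    def overlap(a: str, b: str) -> int:
        # Toy relevance score: shared-word count (a stand-in for a real retriever).
        return len(set(a.lower().split()) & set(b.lower().split()))

    def build_accelerator_prompt(spec: str, demo_library: list[dict], top_k: int = 2) -> str:
        """Pick the demonstrations most relevant to the requested design and
        prepend them as in-context examples ahead of the new design request."""
        demos = sorted(demo_library, key=lambda d: overlap(d["spec"], spec), reverse=True)[:top_k]
        parts = ["You are generating synthesizable HDL for an AI accelerator."]
        for d in demos:
            parts.append("### Example spec\n" + d["spec"] + "\n### Example design\n" + d["hdl"])
        parts.append("### New spec\n" + spec + "\n### New design")
        return "\n\n".join(parts)

The assembled prompt would then be sent to an LLM of choice; the point is that curated demonstrations, rather than hand-written domain-specific code, steer the generation toward a usable design.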
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on
various vision tasks. However, ViTs' self-attention module is still arguably a
major bottleneck, limiting their achievable hardware efficiency. Meanwhile,
existing accelerators dedicated to NLP Transformers are not optimal for ViTs.
This is because there is a large difference between ViTs and NLP Transformers:
ViTs have a relatively fixed number of input tokens, and their attention maps can
be pruned by up to 90% even with fixed sparse patterns, whereas NLP Transformers
need to handle input sequences of varying numbers of tokens and rely on
on-the-fly predictions of dynamic sparse attention patterns for each input to
achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated
algorithm and accelerator co-design framework dubbed ViTCoD for accelerating
ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the
attention maps to have either denser or sparser fixed patterns for regularizing
two levels of workloads without hurting the accuracy, largely reducing the
attention computations while leaving room for alleviating the remaining
dominant data movements; on top of that, we further integrate a lightweight and
learnable auto-encoder module to enable trading the dominant high-cost data
movements for lower-cost computations. On the hardware level, we develop a
dedicated accelerator to simultaneously coordinate the enforced denser/sparser
workloads and encoder/decoder engines for boosted hardware utilization.
Extensive experiments and ablation studies validate that ViTCoD largely reduces
the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x,
86.0x, 10.1x, and 6.8x, respectively, over general computing platforms (CPUs,
EdgeGPUs, and GPUs) and prior-art Transformer accelerators (SpAtten and Sanger)
under an attention sparsity of 90%.
Comment: Accepted to HPCA 2023
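One way to picture the prune-and-polarize step is the calibration-time sketch below: attention is averaged over a calibration set, a small group of "denser" rows is kept intact, and the remaining rows are pruned to a fixed sparse pattern; the ratios and the row-selection heuristic are illustrative assumptions rather than ViTCoD's exact algorithm.

    import torch

    def polarize_attention_mask(attn_maps: torch.Tensor,
                                keep_ratio: float = 0.10,
                                dense_row_ratio: float = 0.20) -> torch.Tensor:
        """attn_maps: (num_samples, tokens, tokens) attention collected on a
        calibration set. Returns a fixed boolean mask with polarized workloads."""
        avg = attn_maps.mean(dim=0)                                # (tokens, tokens)
        n_dense = max(1, int(dense_row_ratio * avg.shape[0]))
        dense_rows = torch.topk(avg.sum(dim=-1), n_dense).indices  # rows kept dense
        mask = torch.zeros_like(avg, dtype=torch.bool)
        mask[dense_rows] = True                                    # denser workload: full rows
        k = max(1, int(keep_ratio * avg.shape[1]))
        top_cols = torch.topk(avg, k, dim=-1).indices
        mask.scatter_(1, top_cols, torch.ones_like(top_cols, dtype=torch.bool))  # sparser workload
        return mask

Because the resulting mask is fixed rather than predicted per input, the dense and sparse regions can be mapped statically onto separate hardware engines, which is the property the accelerator side exploits.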
DPHL: A DIA Pan-human Protein Mass Spectrometry Library for Robust Biomarker Discovery
To address the increasing need for detecting and validating protein biomarkers in clinical specimens, mass spectrometry (MS)-based targeted proteomic techniques, including selected reaction monitoring (SRM), parallel reaction monitoring (PRM), and massively parallel data-independent acquisition (DIA), have been developed. For optimal performance, they require the fragment ion spectra of targeted peptides as prior knowledge. In this report, we describe an MS pipeline and spectral resource to support targeted proteomics studies for human tissue samples. To build the spectral resource, we integrated common open-source MS computational tools to assemble a freely accessible computational workflow based on Docker. We then applied the workflow to generate DPHL, a comprehensive DIA pan-human library, from 1096 data-dependent acquisition (DDA) MS raw files for 16 types of cancer samples. This extensive spectral resource was then applied to a proteomic study of 17 prostate cancer (PCa) patients. Thereafter, PRM validation was applied to a larger cohort of 57 PCa patients, validating the differential expression of three proteins in prostate tumors. As a second application, the DPHL spectral resource was applied to a study of plasma samples from 19 diffuse large B cell lymphoma (DLBCL) patients and 18 healthy control subjects. Differentially expressed proteins between DLBCL patients and healthy control subjects were detected by DIA-MS and confirmed by PRM. These data demonstrate that DPHL supports DIA and PRM MS pipelines for robust protein biomarker discovery. DPHL is freely accessible at https://www.iprox.org/page/project.html?id=IPX0001400000