Domain-aware Genetic Algorithms for Hardware and Mapping Optimization for Efficient DNN Acceleration
The proliferation of AI across a variety of domains (vision, language, speech, recommendations, games) has led to the rise of domain-specific accelerators for deep learning. At design-time, these accelerators carefully architect the on-chip dataflow to maximize data reuse (over space and time) and size the hardware resources (PEs and buffers) to maximize performance and energy-efficiency, while meeting the chip’s area and power targets. At compile-time, the target Deep Neural Network (DNN) model is mapped over the accelerator. The mapping refers to tiling the computation and data (i.e., tensors) and scheduling them over the PEs and scratchpad buffers respectively, while honoring the microarchitectural constraints (number of PEs, buffer sizes, and dataflow).
The design space of valid hardware resource assignments for a given dataflow, and of valid mappings for a given hardware configuration, is extremely large (~O(10^24) per layer) for state-of-the-art DNN models today. This makes exhaustive search infeasible. Unfortunately, there can be orders-of-magnitude differences in performance and energy-efficiency between an optimal and a sub-optimal choice, making these decisions a crucial part of the entire design process. Moreover, manual tuning by domain experts becomes increasingly challenging due to the growing irregularity (driven by neural architecture search) and sparsity of DNN models. This necessitates Map Space Exploration (MSE). In this thesis, our goal is to deliver a deep analysis of MSE for DNN accelerators, propose techniques to improve MSE, and generalize the MSE framework to a wider landscape (from mapping to hardware-mapping co-exploration, and from single-accelerator to multi-accelerator scheduling). As part of this, we discuss the correlation between hardware flexibility and the resulting map space, and formalize the map-space representation along four mapping axes: tile, order, parallelism, and shape. Next, we develop dedicated exploration operators for these axes and use a genetic algorithm framework to converge on a solution. We then develop a "sparsity-aware" technique to account for sparsity in MSE, and a "warm-start" technique to address the search-speed challenge common to learning-based search algorithms. Finally, we extend our MSE framework to support hardware/map-space co-exploration and multi-accelerator scheduling. (Ph.D. dissertation)
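As a loose illustration of the genetic-algorithm-driven MSE described above, the sketch below evolves mappings over toy tile, order, and parallelism axes. The dimension names, cost model, and operator choices are all illustrative assumptions, not the thesis's actual framework (which also covers the shape axis and real microarchitectural constraints):

```python
import random

LOOP_DIMS = ["N", "C", "K", "X", "Y"]  # hypothetical DNN loop dimensions

def random_mapping():
    # A "genome" spanning three of the mapping axes named above.
    return {
        "tile": {d: random.choice([1, 2, 4, 8]) for d in LOOP_DIMS},
        "order": random.sample(LOOP_DIMS, len(LOOP_DIMS)),
        "parallel": random.choice(LOOP_DIMS),
    }

def cost(mapping, problem):
    # Toy cost: penalize tiles that overflow the buffer, reward
    # parallelizing a large dimension. A real MSE would call an
    # analytical or simulated accelerator cost model instead.
    buf = 1
    for d in LOOP_DIMS:
        buf *= mapping["tile"][d]
    overflow = max(0, buf - problem["buffer_size"])
    par_gain = problem["dims"][mapping["parallel"]]
    return overflow * 10 + sum(problem["dims"].values()) / par_gain

def mutate(mapping):
    # One dedicated operator per mapping axis.
    m = {"tile": dict(mapping["tile"]),
         "order": list(mapping["order"]),
         "parallel": mapping["parallel"]}
    axis = random.choice(["tile", "order", "parallel"])
    if axis == "tile":        # tile operator: resize one dimension's tile
        d = random.choice(LOOP_DIMS)
        m["tile"][d] = random.choice([1, 2, 4, 8])
    elif axis == "order":     # order operator: swap two loop levels
        i, j = random.sample(range(len(LOOP_DIMS)), 2)
        m["order"][i], m["order"][j] = m["order"][j], m["order"][i]
    else:                     # parallelism operator: re-bind spatial dim
        m["parallel"] = random.choice(LOOP_DIMS)
    return m

def search(problem, pop_size=16, generations=30, seed=0):
    random.seed(seed)
    pop = [random_mapping() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda mp: cost(mp, problem))
        survivors = pop[: pop_size // 2]           # elitist selection
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda mp: cost(mp, problem))

problem = {"dims": {"N": 1, "C": 64, "K": 128, "X": 56, "Y": 56},
           "buffer_size": 256}
best = search(problem)
```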
Demystifying Map Space Exploration for NPUs
Map Space Exploration is the problem of finding optimized mappings of a Deep
Neural Network (DNN) model on an accelerator. It is known to be extremely
computationally expensive, and there has been active research looking at both
heuristics and learning-based methods to make the problem computationally
tractable. However, while there are dozens of mappers out there (all
empirically claiming to find better mappings than others), the research
community lacks systematic insights on how different search techniques navigate
the map-space and how different mapping axes contribute to the accelerator's
performance and efficiency. Such insights are crucial to developing mapping
frameworks for emerging DNNs that are increasingly irregular (due to neural
architecture search) and sparse, making the corresponding map spaces much more
complex. In this work, rather than proposing yet another mapper, we do a
first-of-its-kind apples-to-apples comparison of search techniques leveraged by
different mappers. Next, we extract the learnings from our study and propose
two new techniques that can augment existing mappers -- warm-start and
sparsity-aware -- that demonstrate speedups, scalability, and robustness across
diverse DNN models.
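One of the two proposed techniques, warm-start, can be sketched roughly as follows: rather than searching each layer's map space from scratch, the mapper seeds its search with the best mapping found for a previously optimized, similar layer. Everything below (the one-axis "mapping", the toy cost model, the local search) is a hypothetical stand-in, not the paper's actual mapper:

```python
import random

def evaluate(mapping, layer):
    # Toy cost model: distance of the chosen tile size from the layer's
    # "ideal" tile. A real mapper would query an accelerator cost model.
    return abs(mapping["tile"] - layer["ideal_tile"])

def local_search(layer, start, steps=50, seed=0):
    # Simple hill climb from `start`; only improving moves are accepted.
    rng = random.Random(seed)
    best = dict(start)
    for _ in range(steps):
        cand = {"tile": max(1, best["tile"] + rng.choice([-2, -1, 1, 2]))}
        if evaluate(cand, layer) < evaluate(best, layer):
            best = cand
    return best

def warm_start_search(layers, cold_start):
    results, prev_best = [], None
    for layer in layers:
        # Warm-start: reuse the previous layer's best mapping as the seed.
        start = prev_best if prev_best is not None else cold_start
        best = local_search(layer, start)
        results.append(best)
        prev_best = best
    return results

# Consecutive layers with similar shapes, so seeds transfer well.
layers = [{"ideal_tile": 16}, {"ideal_tile": 18}, {"ideal_tile": 20}]
maps = warm_start_search(layers, cold_start={"tile": 1})
```

Because consecutive DNN layers often have similar shapes, the warm seed starts the search near a good region and far fewer steps are spent recovering ground a previous search already covered.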
FLAT: An Optimized Dataflow for Mitigating Attention Performance Bottlenecks
Attention mechanisms form the backbone of state-of-the-art machine learning
models for a variety of tasks. Deploying them on deep neural network (DNN)
accelerators, however, is prohibitively challenging especially under long
sequences, as this work identifies. This is due to operators in attention
layers exhibiting limited reuse opportunities and quadratic growth in memory
footprint, leading to severe memory-boundedness. To address this, we introduce
a new attention-tailored dataflow, termed FLAT, which identifies fusion
opportunities within the attention layer, and implements an on-chip
memory-aware interleaved execution and tiling mechanism. FLAT increases the
effective memory bandwidth by efficiently utilizing the high-bandwidth,
low-capacity on-chip buffer and thus achieves better run time and compute
resource utilization. In our evaluation, FLAT achieves 1.94x and 1.76x speedup
and 49% and 42% energy reduction compared to baseline execution on
state-of-the-art edge and cloud accelerators.
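The memory argument can be illustrated with a row-tiled attention sketch: by fusing the softmax and the V product per tile of query rows, only a tile × L slice of the score matrix is ever live, instead of the full quadratic L × L footprint. This NumPy sketch shows the general fusion-and-tiling idea only; it is not FLAT's actual interleaved on-chip dataflow:

```python
import numpy as np

def attention_reference(Q, K, V):
    # Unfused baseline: materializes the full L x L score matrix.
    S = Q @ K.T
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V

def attention_tiled(Q, K, V, tile=32):
    # Fused/tiled variant: per tile of query rows, scores -> softmax -> V
    # product are computed together, so only a (tile x L) slice of the
    # score matrix is live at any time.
    L = Q.shape[0]
    out = np.empty((L, V.shape[1]))
    for i in range(0, L, tile):
        S = Q[i:i + tile] @ K.T                       # tile x L scores only
        P = np.exp(S - S.max(axis=1, keepdims=True))  # numerically stable softmax
        P /= P.sum(axis=1, keepdims=True)
        out[i:i + tile] = P @ V                       # consumed immediately
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(64, 16)) for _ in range(3))
out = attention_tiled(Q, K, V, tile=16)
```

For a sequence length L with tile size t, live score storage drops from O(L^2) to O(tL), which is the kind of footprint reduction that lets the working set fit in a high-bandwidth, low-capacity on-chip buffer.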
Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
Sparsity has become one of the promising methods to compress and accelerate
Deep Neural Networks (DNNs). Among different categories of sparsity, structured
sparsity has gained more attention due to its efficient execution on modern
accelerators. Particularly, N:M sparsity is attractive because there are
already hardware accelerator architectures that can leverage certain forms of
N:M structured sparsity to yield higher compute-efficiency. In this work, we
focus on N:M sparsity and extensively study and evaluate various training
recipes for N:M sparsity in terms of the trade-off between model accuracy and
compute cost (FLOPs). Building upon this study, we propose two new decay-based
pruning methods, namely "pruning mask decay" and "sparse structure decay". Our
evaluations indicate that these proposed methods consistently deliver
state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on
a Transformer-based model for a translation task. The increase in the accuracy
of the sparse model using the new training recipes comes at the cost of
marginal increase in the total training compute (FLOPs).
Comment: 11 pages, 2 figures, and 9 tables. Published at the ICML Workshop on
Sparsity in Neural Networks: Advancing Understanding and Practice, 2022. First
two authors contributed equally.
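A minimal sketch of what N:M structured sparsity looks like, plus a mask-decay-style softening in the spirit of the decay-based methods named above. The linear schedule and group layout are illustrative assumptions, not the paper's exact recipes:

```python
import numpy as np

def nm_mask(w, n=2, m=4):
    # N:M structure: within every group of m consecutive weights,
    # keep the n largest by magnitude.
    groups = w.reshape(-1, m)
    idx = np.argsort(np.abs(groups), axis=1)[:, -n:]  # top-n per group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return mask.reshape(w.shape)

def apply_decaying_mask(w, step, total_steps, n=2, m=4):
    # Instead of hard-zeroing pruned weights at once, scale them by a
    # factor that decays from 1 to 0 over training, so the pruned
    # structure is imposed gradually.
    mask = nm_mask(w, n, m)
    decay = max(0.0, 1.0 - step / total_steps)  # 1 -> 0 over training
    soft_mask = mask + (1.0 - mask) * decay     # pruned weights fade out
    return w * soft_mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
w_final = apply_decaying_mask(w, step=100, total_steps=100)  # fully 2:4 sparse
```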
The Liquid Sensor Using Thin Film Bulk Acoustic Resonator with C-Axis Tilted AlN Films
Dual-mode thin film bulk acoustic resonator (TFBAR) devices are fabricated with c-axis tilted AlN films. To fabricate dual-mode TFBAR devices, the off-axis RF magnetron sputtering method is adopted for the growth of tilted piezoelectric AlN thin films. In this report, the AlN thin films are deposited with tilting angles of 15° and 23°. The frequency response of the TFBAR device with the 23° tilted AlN thin film is measured to reveal its ability to provide dual-mode resonance. The sensitivities of the longitudinal and shear modes to mass loading are calculated to be 2295 Hz·cm²/ng and 1363 Hz·cm²/ng, with mechanical quality factors of 480 and 287, respectively. For liquid loading, the sensitivities of the longitudinal and shear modes are calculated to be 0 and 15 Hz·cm²/μg.
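For intuition about the quoted figures: a mass-loading sensitivity S in Hz·cm²/ng maps an areal mass density σ (in ng/cm²) to a resonance-frequency shift Δf = S·σ. The sensitivities below come from the abstract, but the example loading is a hypothetical value chosen for illustration:

```python
def frequency_shift(sensitivity_hz_cm2_per_ng, areal_mass_ng_per_cm2):
    # Delta-f = S * sigma: sensitivity times areal mass density.
    return sensitivity_hz_cm2_per_ng * areal_mass_ng_per_cm2

S_LONG, S_SHEAR = 2295.0, 1363.0   # Hz*cm^2/ng, from the abstract
sigma = 10.0                        # ng/cm^2, hypothetical mass loading

df_long = frequency_shift(S_LONG, sigma)    # longitudinal-mode shift, Hz
df_shear = frequency_shift(S_SHEAR, sigma)  # shear-mode shift, Hz
```

A 10 ng/cm² loading would thus shift the longitudinal mode by roughly 23 kHz and the shear mode by roughly 14 kHz, which is why the shear mode (with its nonzero liquid-loading sensitivity) is the one used for sensing in liquid.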
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
N:M structured sparsity has garnered significant interest owing to its
relatively modest overhead and improved efficiency. Additionally, this form of
sparsity holds considerable appeal for reducing the memory footprint thanks to
its modest representation overhead. While there have been efforts to develop
training recipes for N:M structured sparsity, they primarily focus on
low-sparsity regions (~50%). Nonetheless, the performance of models trained
using these approaches tends to decline when confronted with high-sparsity
regions (>80%). In this work, we study the effectiveness of existing sparse
training recipes at \textit{high-sparsity regions} and argue that these methods
fail to sustain the model quality on par with low-sparsity regions. We
demonstrate that the significant factor contributing to this disparity is the
presence of elevated levels of induced noise in the gradient magnitudes. To
mitigate this undesirable effect, we employ decay mechanisms to progressively
restrict the flow of gradients towards pruned elements. Our approach improves
the model quality by up to 2% and 5% in vision and language models,
respectively, in the high-sparsity regime. We also evaluate the trade-off between
model accuracy and training compute cost in terms of FLOPs. At iso-training
FLOPs, our method yields better performance compared to conventional sparse
training recipes, exhibiting an accuracy improvement of up to 2%. The source
code is available at
https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity.
Comment: 18 pages, 8 figures, 17 tables.
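The decay mechanism described in the abstract can be sketched as a gate on the gradient reaching pruned weights: early in training the gradient flows freely, and the flow toward pruned elements is progressively restricted. The linear schedule and names here are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def gated_gradient(grad, mask, step, total_steps):
    # mask == 1 for kept weights, 0 for pruned ones.
    # Early in training (gate ~ 1) gradients flow to all weights;
    # late in training (gate ~ 0) pruned weights receive no gradient,
    # suppressing the induced gradient noise the abstract describes.
    gate = max(0.0, 1.0 - step / total_steps)   # 1 -> 0 over training
    return grad * (mask + (1.0 - mask) * gate)

mask = np.array([1.0, 0.0, 1.0, 0.0])   # toy 2:4-style keep/prune pattern
grad = np.array([0.5, 0.5, -0.5, -0.5])

g_early = gated_gradient(grad, mask, step=0, total_steps=100)   # full flow
g_late = gated_gradient(grad, mask, step=100, total_steps=100)  # pruned blocked
```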