13 research outputs found
TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
Automated co-design of machine learning models and evaluation hardware is
critical for efficiently deploying such models at scale. Despite the
state-of-the-art performance of transformer models, they are not yet ready for
execution on resource-constrained hardware platforms. High memory requirements
and low parallelizability of the transformer architecture exacerbate this
problem. Recently proposed accelerators attempt to optimize the throughput and
energy consumption of transformer models. However, such works are either
limited to a one-sided search of the model architecture or a restricted set of
off-the-shelf devices. Furthermore, previous works accelerate only model inference, not training, which demands substantially more memory and compute resources and makes the problem even more challenging. To address these
limitations, this work proposes a dynamic training framework, called DynaProp,
that speeds up the training process and reduces memory consumption. DynaProp is
a low-overhead pruning method that prunes activations and gradients at runtime.
To effectively execute this method on hardware for a diverse set of transformer
architectures, we propose ELECTOR, a framework that simulates transformer
inference and training on a design space of accelerators. We use this simulator
in conjunction with the proposed co-design technique, called TransCODE, to
obtain the best-performing models with high accuracy on the given task and
minimize latency, energy consumption, and chip area. The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair while incurring 5.2× lower latency and 3.0× lower energy consumption.
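The abstract does not spell out DynaProp's pruning criterion; as a rough illustration of the general idea of runtime magnitude pruning of activations and gradients, here is a minimal sketch (the function name, keep ratio, and tensor shapes are invented for illustration, not taken from the paper):

```python
import numpy as np

np.random.seed(0)

def magnitude_prune(tensor, keep_ratio):
    """Keep only the largest-magnitude entries; zero out the rest."""
    flat = np.abs(tensor).ravel()
    k = max(1, int(round(flat.size * keep_ratio)))
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return np.where(np.abs(tensor) >= threshold, tensor, 0.0)

# Forward pass: prune activations before caching them for backprop,
# shrinking the training-time memory footprint.
activations = np.random.randn(4, 8)
sparse_acts = magnitude_prune(activations, keep_ratio=0.25)

# Backward pass: prune gradients the same way to cut compute.
gradients = np.random.randn(4, 8)
sparse_grads = magnitude_prune(gradients, keep_ratio=0.25)
```

The sparse tensors can then be stored and multiplied in compressed form, which is where an accelerator such as those simulated by ELECTOR would recover latency and energy.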
EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
Automated design of efficient transformer models has recently attracted
significant attention from industry and academia. However, most works only
focus on certain metrics while searching for the best-performing transformer
architecture. Furthermore, running traditional, complex, and large transformer
models on low-compute edge platforms is a challenging problem. In this work, we
propose a framework, called ProTran, to profile the hardware performance
measures for a design space of transformer architectures and a diverse set of
edge devices. We use this profiler in conjunction with the proposed co-design
technique to obtain the best-performing models that have high accuracy on the
given task and minimize latency, energy consumption, and peak power draw to
enable edge deployment. We refer to our framework for co-optimizing accuracy
and hardware performance measures as EdgeTran. It searches for the best
transformer model and edge device pair. Finally, we propose GPTran, a
multi-stage block-level grow-and-prune post-processing step that further
improves accuracy in a hardware-aware manner. The obtained transformer model is
2.8× smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0× lower energy, and 10.8× lower peak power draw compared to an off-the-shelf GPU.
BREATHE: Second-Order Gradients and Heteroscedastic Emulation based Design Space Exploration
Researchers constantly strive to explore larger and more complex search
spaces in various scientific studies and physical experiments. However, such
investigations often involve sophisticated simulators or time-consuming
experiments that make exploring and observing new design samples challenging.
Previous works that target such applications are typically sample-inefficient
and restricted to vector search spaces. To address these limitations, this work
proposes a constrained multi-objective optimization (MOO) framework, called
BREATHE, that searches not only traditional vector-based design spaces but also
graph-based design spaces to obtain best-performing graphs. It leverages
second-order gradients and actively trains a heteroscedastic surrogate model
for sample-efficient optimization. In a single-objective vector optimization
application, it leads to 64.1% higher performance than the next-best baseline,
random forest regression. In graph-based search, BREATHE outperforms the
next-best baseline, i.e., a graphical version of Gaussian-process-based
Bayesian optimization, with up to 64.9% higher performance. In a MOO task, it
achieves up to 21.9× higher hypervolume than the state-of-the-art method, multi-objective Bayesian optimization (MOBOpt). BREATHE also outperforms the baseline methods on most standard MOO benchmark applications.
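Hypervolume, the MOO metric cited above, is the objective-space volume a Pareto front dominates relative to a reference point. A minimal two-dimensional sketch for minimization problems (this is the standard definition of the metric, not BREATHE's implementation):

```python
def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-D Pareto front under minimization,
    measured against a reference point `ref` that is worse in both objectives."""
    pts = sorted(front)  # ascending in f1; a Pareto front is then descending in f2
    hv = 0.0
    for i, (f1, f2) in enumerate(pts):
        # Each point contributes a vertical strip up to the next point's f1.
        next_f1 = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        hv += (next_f1 - f1) * (ref[1] - f2)
    return hv

# Two non-dominated points against reference (3, 3): union of the two
# dominated boxes has area 5.
hv = hypervolume_2d([(0.0, 2.0), (2.0, 0.0)], (3.0, 3.0))
```

A larger hypervolume means the front pushes closer to the ideal point and covers more trade-offs, which is why it is used to compare MOO methods.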
Anisotropic lattice expansion determined during flash sintering of BiFeO3 by in-situ energy-dispersive X-ray diffraction
BiFeO3 has a Curie temperature (TC) of 825 °C, making it difficult to sinter using conventional methods while maintaining the purity of the material, as secondary phases unavoidably appear at temperatures above TC. Flash sintering is a relatively new technique that saves time and energy compared to other sintering methods. BiFeO3 was flash sintered at 500 °C to achieve 90% densification. In-situ energy-dispersive X-ray diffraction (EDXRD) revealed that the material did not undergo any phase transformation, having been sintered well below TC. Interestingly, anisotropic lattice expansion was observed in the material when the sample was exposed to the electric field.
Funding: U.S. Office of Naval Research (ONR) N00014-10-1-042, N00014-17-1-2087, Sub 4104-78982; U.S. Department of Energy DE-AC02-06CH1135
Field-induced p-n transition in yttria-stabilized zirconia
Oxide-ion-conducting yttria-stabilized zirconia ceramics show the onset of electronic conduction under a small bias voltage. Compositions with a high yttria content undergo a transition from p-type to n-type behavior at voltages in the range of 2.4 to 10 V, which also depends on oxygen partial pressure. Surface reactions have a direct influence on bulk electronic conductivities, with possible implications for voltage-induced flash phenomena and resistive switching.
DINI: Data Imputation Using Neural Inversion for Edge Applications
The edge computing paradigm has recently drawn significant attention from industry and academia. Due to the advantages in quality-of-service metrics, namely, latency, bandwidth, energy efficiency, privacy, and security, deploying artificial intelligence (AI) models at the network edge has attracted widespread interest. Edge-AI has seen applications in diverse domains that involve large amounts of data. However, poor dataset quality plagues this compute regime owing to numerous data corruption sources, including missing data. As such systems are increasingly being deployed in mission-critical applications, mitigating the effects of corrupted data becomes important. In this work, we propose a strategy based on data imputation using neural inversion, DINI. It trains a surrogate model and runs data imputation in an interleaved fashion. Unlike previous works, DINI is a model-agnostic framework applicable to diverse deep learning architectures. DINI outperforms state-of-the-art methods by at least 10.7% in average imputation error. Applying DINI to mission-critical applications can increase prediction accuracy to up to 99% (F1 score of 0.99), resulting in significant gains compared to baseline methods.
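The interleaved train-then-impute loop can be caricatured with a one-parameter linear surrogate on toy data. DINI itself uses deep models and gradient-based neural inversion; the data, the linear surrogate, and the mean initialization below are invented stand-ins that only illustrate the alternation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature x, target y ≈ 2x; roughly 30% of y entries are missing.
x = rng.normal(size=(100,))
y = 2.0 * x + 0.01 * rng.normal(size=100)
missing = rng.random(100) < 0.3
y_imp = np.where(missing, y[~missing].mean(), y)  # start from the observed mean

w = 0.0  # surrogate: one-parameter linear model y ≈ w * x
for _ in range(50):
    # Step 1: fit the surrogate on the current, partially imputed data.
    w = (x @ y_imp) / (x @ x)
    # Step 2: replace only the missing entries with the surrogate's
    # predictions; observed entries stay untouched.
    y_imp = np.where(missing, w * x, y)
```

After a few iterations the imputed entries settle at the surrogate's fixed point, here close to the true slope of 2; the same alternation scales up to neural surrogates.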
FlexiBERT: Are Current Transformer Architectures too Homogeneous and Rigid?
The existence of a plethora of language models makes the problem of selecting the best one for a custom task challenging. Most state-of-the-art methods leverage transformer-based models (e.g., BERT) or their variants. Training such models and exploring their hyperparameter space, however, is computationally expensive. Prior work proposes several neural architecture search (NAS) methods that employ performance predictors (e.g., surrogate models) to address this issue; however, analysis has been limited to homogeneous models that use fixed dimensionality throughout the network. This leads to sub-optimal architectures. To address this limitation, we propose a suite of heterogeneous and flexible models, namely FlexiBERT, that have varied encoder layers with a diverse set of possible operations and different hidden dimensions. For better-posed surrogate modeling in this expanded design space, we propose a new graph-similarity-based embedding scheme. We also propose a novel NAS policy, called BOSHNAS, that leverages this new scheme, Bayesian modeling, and second-order optimization, to quickly train and use a neural surrogate model to converge to the optimal architecture. A comprehensive set of experiments shows that
the proposed policy, when applied to the FlexiBERT design space, pushes the performance frontier upwards compared to traditional models. FlexiBERT-Mini, one of our proposed models, has 3% fewer parameters than BERT-Mini and achieves an 8.9% higher GLUE score. A FlexiBERT model with performance equivalent to the best homogeneous model is 2.6× smaller. FlexiBERT-Large, another proposed model, achieves state-of-the-art results, outperforming the baseline models by at least 5.7% on the GLUE benchmark.
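The surrogate-guided search idea behind a policy like BOSHNAS can be caricatured in a few lines. The real policy trains a neural heteroscedastic surrogate with Bayesian modeling and second-order optimization; the toy design space, scoring function, and nearest-neighbor surrogate below are invented stand-ins that only show the query loop:

```python
import random

random.seed(0)

# Toy design space: (encoder layers, hidden dimension) pairs.
space = [(l, h) for l in (2, 4, 6) for h in (128, 256, 512)]

def evaluate(arch):
    """Stand-in for expensive fine-tuning; returns a noisy benchmark-like score."""
    layers, hidden = arch
    return 0.1 * layers + hidden / 1000 + random.gauss(0, 0.01)

observed = {}

def predict(arch):
    """Crude surrogate: score of the nearest already-evaluated architecture."""
    if not observed:
        return 0.0
    nearest = min(observed,
                  key=lambda a: (a[0] - arch[0]) ** 2 + (a[1] - arch[1]) ** 2)
    return observed[nearest]

for _ in range(6):
    # Query the unevaluated candidate the surrogate currently likes best,
    # then fold the true score back into the surrogate's training set.
    candidates = [a for a in space if a not in observed]
    best_candidate = max(candidates, key=predict)
    observed[best_candidate] = evaluate(best_candidate)

best_arch = max(observed, key=observed.get)
```

Only 6 of the 9 candidates are ever evaluated, which is the point of surrogate-based NAS: the expensive evaluation budget is spent where the model predicts the frontier lies.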
Beyond flash sintering in 3 mol % yttria stabilized zirconia
A flash sintering experiment can be carried out by applying an electric field and heating the specimen at a constant rate. The flash event occurs at a specific temperature that depends on the strength of the electric field. Alternatively, the furnace can be held at a constant temperature and the voltage applied as a step function; after an incubation time, there is a highly non-linear rise in conductivity. This incubation step is called Stage I. The non-linearity is constrained by switching the power supply to current control. This short transient, during which the sample sinters nearly instantaneously, is the second stage. Under current control, the (essentially dense) sample remains in a highly excited state indefinitely, which we call Stage III. In this state, the samples are often brightly electroluminescent, emitting a green glow; unusual phase transformations occur, and the rate of chemical reactions is greatly enhanced. We infer that these manifestations are evidence of a defect catastrophe that includes unusual generation of electrons, holes, and point defects, which can produce sintering, electronic conductivity, electroluminescence, and phase transformations, all at the same time. We hypothesize that both Joule heating and the electric field are necessary for this catastrophe.
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework
Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both industry and academia. However, most co-design frameworks either explore a limited search space or employ suboptimal exploration techniques for simultaneous design decision investigations of the ML model and the accelerator. Furthermore, training the ML model and simulating the accelerator performance is computationally expensive. To address these limitations, this work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench. It comprises two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators. CNNBench leverages an advanced search technique, BOSHNAS, to efficiently train a neural heteroscedastic surrogate model to converge to an optimal CNN architecture by employing second-order gradients. AccelBench performs cycle-accurate simulations for diverse accelerator architectures in a vast design space. With the proposed co-design method, called BOSHCODE, our best CNN-accelerator pair achieves 1.4% higher accuracy on the CIFAR-10 dataset compared to the state-of-the-art pair while enabling 59.1% lower latency and 60.8% lower energy consumption. On the ImageNet dataset, it achieves 3.7% higher Top-1 accuracy at 43.8% lower latency and 11.2% lower energy consumption. CODEBench outperforms the state-of-the-art framework, i.e., Auto-NBA, by achieving 1.5% higher accuracy and 34.7× higher throughput while enabling 11.0× lower energy-delay product (EDP) and 4.0× lower chip area on CIFAR-10.
Microstructure and microchemistry of flash sintered K0.5Na0.5NbO3
Flash sintering experiments were performed, for the first time, on sodium potassium niobate (KNN) ceramics. A theoretical density of 94% was achieved in 30 s under a 250 V/cm electric field at 990 °C. These conditions are at least 100 °C lower, and considerably faster, than conventional sintering conditions. Grains tended to grow when the flash sintering duration exceeded 30 s under a constant electric field. Detailed microstructural and chemical investigations of the sample showed an inhomogeneous Na and K distribution resembling a core-shell structure, with K enriched in the shell and Na enriched in the core region. The inhomogeneous distribution of Na and K was correlated with a doubling of the unit cell within the grain along the [002] direction. Compositional equilibrium was achieved after a heat treatment at 1000 °C for 4 h. The compositional variations appeared to be linked to grain boundary melting during flash and consequent recrystallization as the sample cooled.