13 research outputs found
TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
Automated co-design of machine learning models and evaluation hardware is
critical for efficiently deploying such models at scale. Despite the
state-of-the-art performance of transformer models, they are not yet ready for
execution on resource-constrained hardware platforms. High memory requirements
and low parallelizability of the transformer architecture exacerbate this
problem. Recently proposed accelerators attempt to optimize the throughput and
energy consumption of transformer models. However, such works are either
limited to a one-sided search of the model architecture or a restricted set of
off-the-shelf devices. Furthermore, previous works accelerate only model inference, not training, which demands substantially more memory and compute resources and makes the problem even more challenging. To address these
limitations, this work proposes a dynamic training framework, called DynaProp,
that speeds up the training process and reduces memory consumption. DynaProp is
a low-overhead pruning method that prunes activations and gradients at runtime.
To effectively execute this method on hardware for a diverse set of transformer
architectures, we propose ELECTOR, a framework that simulates transformer
inference and training on a design space of accelerators. We use this simulator
in conjunction with the proposed co-design technique, called TransCODE, to
obtain the best-performing models with high accuracy on the given task and
minimize latency, energy consumption, and chip area. The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair while incurring 5.2× lower latency and 3.0× lower energy consumption.
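The abstract does not spell out DynaProp's pruning criterion; as a rough illustration of the general idea of runtime magnitude pruning of activations and gradients, here is a minimal sketch (the function name, keep ratio, and tensor shapes are invented for illustration, not taken from the paper):

```python
import numpy as np

np.random.seed(0)

def magnitude_prune(tensor, keep_ratio):
    """Keep only the largest-magnitude entries; zero out the rest."""
    flat = np.abs(tensor).ravel()
    k = max(1, int(round(flat.size * keep_ratio)))
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return np.where(np.abs(tensor) >= threshold, tensor, 0.0)

# Forward pass: prune activations before caching them for backprop,
# shrinking the training-time memory footprint.
activations = np.random.randn(4, 8)
sparse_acts = magnitude_prune(activations, keep_ratio=0.25)

# Backward pass: prune gradients the same way to cut compute.
gradients = np.random.randn(4, 8)
sparse_grads = magnitude_prune(gradients, keep_ratio=0.25)
```

The sparse tensors can then be stored and multiplied in compressed form, which is where an accelerator such as those simulated by ELECTOR would recover latency and energy.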
EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
Automated design of efficient transformer models has recently attracted
significant attention from industry and academia. However, most works only
focus on certain metrics while searching for the best-performing transformer
architecture. Furthermore, running traditional, complex, and large transformer
models on low-compute edge platforms is a challenging problem. In this work, we
propose a framework, called ProTran, to profile the hardware performance
measures for a design space of transformer architectures and a diverse set of
edge devices. We use this profiler in conjunction with the proposed co-design
technique to obtain the best-performing models that have high accuracy on the
given task and minimize latency, energy consumption, and peak power draw to
enable edge deployment. We refer to our framework for co-optimizing accuracy
and hardware performance measures as EdgeTran. It searches for the best
transformer model and edge device pair. Finally, we propose GPTran, a
multi-stage block-level grow-and-prune post-processing step that further
improves accuracy in a hardware-aware manner. The obtained transformer model is
2.8× smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0× lower energy, and 10.8× lower peak power draw compared to an off-the-shelf GPU.
BREATHE: Second-Order Gradients and Heteroscedastic Emulation based Design Space Exploration
Researchers constantly strive to explore larger and more complex search
spaces in various scientific studies and physical experiments. However, such
investigations often involve sophisticated simulators or time-consuming
experiments that make exploring and observing new design samples challenging.
Previous works that target such applications are typically sample-inefficient
and restricted to vector search spaces. To address these limitations, this work
proposes a constrained multi-objective optimization (MOO) framework, called
BREATHE, that searches not only traditional vector-based design spaces but also
graph-based design spaces to obtain best-performing graphs. It leverages
second-order gradients and actively trains a heteroscedastic surrogate model
for sample-efficient optimization. In a single-objective vector optimization
application, it leads to 64.1% higher performance than the next-best baseline,
random forest regression. In graph-based search, BREATHE outperforms the
next-best baseline, i.e., a graphical version of Gaussian-process-based
Bayesian optimization, with up to 64.9% higher performance. In a MOO task, it
achieves up to 21.9× higher hypervolume than the state-of-the-art method, multi-objective Bayesian optimization (MOBOpt). BREATHE also outperforms the baseline methods on most standard MOO benchmark applications.
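Hypervolume, the MOO metric cited above, is the objective-space volume a Pareto front dominates relative to a reference point. A minimal two-dimensional sketch for minimization problems (this is the standard definition of the metric, not BREATHE's implementation):

```python
def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-D Pareto front under minimization,
    measured against a reference point `ref` that is worse in both objectives."""
    pts = sorted(front)  # ascending in f1; a Pareto front is then descending in f2
    hv = 0.0
    for i, (f1, f2) in enumerate(pts):
        # Each point contributes a vertical strip up to the next point's f1.
        next_f1 = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        hv += (next_f1 - f1) * (ref[1] - f2)
    return hv

# Two non-dominated points against reference (3, 3): union of the two
# dominated boxes has area 5.
hv = hypervolume_2d([(0.0, 2.0), (2.0, 0.0)], (3.0, 3.0))
```

A larger hypervolume means the front pushes closer to the ideal point and covers more trade-offs, which is why it is used to compare MOO methods.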
Anisotropic lattice expansion determined during flash sintering of BiFeO3 by in-situ energy-dispersive X-ray diffraction
BiFeO3 has a Curie temperature (TC) of 825 °C, making it difficult to sinter using conventional methods while maintaining the purity of the material, as secondary phases unavoidably appear at temperatures above TC. Flash sintering is a relatively new technique that saves time and energy compared to other sintering methods. BiFeO3 was flash sintered at 500 °C to achieve 90% densification. In-situ energy-dispersive X-ray diffraction (EDXRD) revealed that the material did not undergo any phase transformation, having been sintered well below TC. Interestingly, anisotropic lattice expansion was observed in the material when the sample was exposed to the electric field.
Funding: U.S. Office of Naval Research (ONR) N00014-10-1-042, N00014-17-1-2087, Sub 4104-78982; U.S. Department of Energy DE-AC02-06CH1135
Field-induced p-n transition in yttria-stabilized zirconia
Oxide-ion-conducting yttria-stabilized zirconia ceramics show the onset of electronic conduction under a small bias voltage. Compositions with a high yttria content undergo a transition from p-type to n-type behavior at voltages in the range of 2.4 to 10 V, which also depends on oxygen partial pressure. Surface reactions have a direct influence on bulk electronic conductivities, with possible implications for voltage-induced flash phenomena and resistive switching.
DINI: Data Imputation Using Neural Inversion for Edge Applications
The edge computing paradigm has recently drawn significant attention from industry and academia. Due to the advantages in quality-of-service metrics, namely, latency, bandwidth, energy efficiency, privacy, and security, deploying artificial intelligence (AI) models at the network edge has attracted widespread interest. Edge-AI has seen applications in diverse domains that involve large amounts of data. However, poor dataset quality plagues this compute regime owing to numerous data corruption sources, including missing data. As such systems are increasingly being deployed in mission-critical applications, mitigating the effects of corrupted data becomes important. In this work, we propose a strategy based on data imputation using neural inversion, DINI. It trains a surrogate model and runs data imputation in an interleaved fashion. Unlike previous works, DINI is a model-agnostic framework applicable to diverse deep learning architectures. DINI outperforms state-of-the-art methods by at least 10.7% in average imputation error. Applying DINI to mission-critical applications can increase prediction accuracy to up to 99% (F1 score of 0.99), resulting in significant gains compared to baseline methods.
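The interleaved train-then-impute loop can be caricatured with a one-parameter linear surrogate on toy data. DINI itself uses deep models and gradient-based neural inversion; the data, the linear surrogate, and the mean initialization below are invented stand-ins that only illustrate the alternation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature x, target y ≈ 2x; roughly 30% of y entries are missing.
x = rng.normal(size=(100,))
y = 2.0 * x + 0.01 * rng.normal(size=100)
missing = rng.random(100) < 0.3
y_imp = np.where(missing, y[~missing].mean(), y)  # start from the observed mean

w = 0.0  # surrogate: one-parameter linear model y ≈ w * x
for _ in range(50):
    # Step 1: fit the surrogate on the current, partially imputed data.
    w = (x @ y_imp) / (x @ x)
    # Step 2: replace only the missing entries with the surrogate's
    # predictions; observed entries stay untouched.
    y_imp = np.where(missing, w * x, y)
```

After a few iterations the imputed entries settle at the surrogate's fixed point, here close to the true slope of 2; the same alternation scales up to neural surrogates.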
FlexiBERT: Are Current Transformer Architectures too Homogeneous and Rigid?
The existence of a plethora of language models makes the problem of selecting the best one for a custom task challenging. Most state-of-the-art methods leverage transformer-based models (e.g., BERT) or their variants. Training such models and exploring their hyperparameter space, however, is computationally expensive. Prior work proposes several neural architecture search (NAS) methods that employ performance predictors (e.g., surrogate models) to address this issue; however, analysis has been limited to homogeneous models that use fixed dimensionality throughout the network. This leads to sub-optimal architectures. To address this limitation, we propose a suite of heterogeneous and flexible models, namely FlexiBERT, that have varied encoder layers with a diverse set of possible operations and different hidden dimensions. For better-posed surrogate modeling in this expanded design space, we propose a new graph-similarity-based embedding scheme. We also propose a novel NAS policy, called BOSHNAS, that leverages this new scheme, Bayesian modeling, and second-order optimization, to quickly train and use a neural surrogate model to converge to the optimal architecture. A comprehensive set of experiments shows that
the proposed policy, when applied to the FlexiBERT design space, pushes the performance frontier upwards compared to traditional models. FlexiBERT-Mini, one of our proposed models, has 3% fewer parameters than BERT-Mini and achieves an 8.9% higher GLUE score. A FlexiBERT model with performance equivalent to the best homogeneous model is 2.6× smaller. FlexiBERT-Large, another proposed model, achieves state-of-the-art results, outperforming the baseline models by at least 5.7% on the GLUE benchmark.
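The surrogate-guided search idea behind a policy like BOSHNAS can be caricatured in a few lines. The real policy trains a neural heteroscedastic surrogate with Bayesian modeling and second-order optimization; the toy design space, scoring function, and nearest-neighbor surrogate below are invented stand-ins that only show the query loop:

```python
import random

random.seed(0)

# Toy design space: (encoder layers, hidden dimension) pairs.
space = [(l, h) for l in (2, 4, 6) for h in (128, 256, 512)]

def evaluate(arch):
    """Stand-in for expensive fine-tuning; returns a noisy benchmark-like score."""
    layers, hidden = arch
    return 0.1 * layers + hidden / 1000 + random.gauss(0, 0.01)

observed = {}

def predict(arch):
    """Crude surrogate: score of the nearest already-evaluated architecture."""
    if not observed:
        return 0.0
    nearest = min(observed,
                  key=lambda a: (a[0] - arch[0]) ** 2 + (a[1] - arch[1]) ** 2)
    return observed[nearest]

for _ in range(6):
    # Query the unevaluated candidate the surrogate currently likes best,
    # then fold the true score back into the surrogate's training set.
    candidates = [a for a in space if a not in observed]
    best_candidate = max(candidates, key=predict)
    observed[best_candidate] = evaluate(best_candidate)

best_arch = max(observed, key=observed.get)
```

Only 6 of the 9 candidates are ever evaluated, which is the point of surrogate-based NAS: the expensive evaluation budget is spent where the model predicts the frontier lies.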
Beyond flash sintering in 3 mol % yttria stabilized zirconia
A flash sintering experiment can be carried out by applying an electric field and heating the specimen at a constant rate. The flash event occurs at a specific temperature that depends on the strength of the electric field. Alternatively, the furnace can be held at a constant temperature and the voltage applied as a step function; after an incubation time, there is a highly non-linear rise in conductivity. This incubation step is called Stage I. The non-linearity is constrained by switching the power supply to current control. This short transient, during which the sample sinters nearly instantaneously, is the second stage. Under current control, the (essentially dense) sample remains in a highly excited state indefinitely, which we call Stage III. In this state, the samples are often brightly electroluminescent, emitting a green glow; unusual phase transformations occur, and the rate of chemical reactions is greatly enhanced. We infer that these manifestations are evidence of a defect catastrophe that includes unusual generation of electrons, holes, and point defects, which can produce sintering, electronic conductivity, electroluminescence, and phase transformations, all at the same time. We hypothesize that both Joule heating and the electric field are necessary for this catastrophe.
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework
Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both industry and academia. However, most co-design frameworks either explore a limited search space or employ suboptimal exploration techniques for simultaneous design decision investigations of the ML model and the accelerator. Furthermore, training the ML model and simulating the accelerator performance is computationally expensive. To address these limitations, this work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench. It comprises two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators. CNNBench leverages an advanced search technique, BOSHNAS, to efficiently train a neural heteroscedastic surrogate model to converge to an optimal CNN architecture by employing second-order gradients. AccelBench performs cycle-accurate simulations for diverse accelerator architectures in a vast design space. With the proposed co-design method, called BOSHCODE, our best CNN-accelerator pair achieves 1.4% higher accuracy on the CIFAR-10 dataset compared to the state-of-the-art pair while enabling 59.1% lower latency and 60.8% lower energy consumption. On the ImageNet dataset, it achieves 3.7% higher Top-1 accuracy at 43.8% lower latency and 11.2% lower energy consumption. CODEBench outperforms the state-of-the-art framework, i.e., Auto-NBA, by achieving 1.5% higher accuracy and 34.7× higher throughput while enabling 11.0× lower energy-delay product (EDP) and 4.0× lower chip area on CIFAR-10.
Microstructure and microchemistry of flash sintered K0.5Na0.5NbO3
Flash sintering experiments were performed, for the first time, on sodium potassium niobate (KNN) ceramics. A theoretical density of 94% was achieved in 30 s under a 250 V/cm electric field at 990 °C. These conditions are at least 100 °C lower, and considerably faster, than conventional sintering conditions. Grains tended to grow when the flash sintering duration exceeded 30 s under a constant electric field. Detailed microstructural and chemical investigations of the sample showed an inhomogeneous Na and K distribution resembling a core-shell structure, with K enriched in the shell and Na enriched in the core region. The inhomogeneous distribution of Na and K was correlated with a doubling of the unit cell within the grain along the [002] direction. Compositional equilibrium was achieved after a heat treatment at 1000 °C for 4 h. The compositional variations appeared to be linked to grain boundary melting during flash and consequent recrystallization as the sample cooled.