13 research outputs found

    TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference

    Automated co-design of machine learning models and evaluation hardware is critical for efficiently deploying such models at scale. Despite the state-of-the-art performance of transformer models, they are not yet ready for execution on resource-constrained hardware platforms. High memory requirements and low parallelizability of the transformer architecture exacerbate this problem. Recently proposed accelerators attempt to optimize the throughput and energy consumption of transformer models. However, such works are either limited to a one-sided search of the model architecture or a restricted set of off-the-shelf devices. Furthermore, previous works only accelerate model inference and not training, which requires substantially more memory and compute resources, making the problem even more challenging. To address these limitations, this work proposes a dynamic training framework, called DynaProp, that speeds up the training process and reduces memory consumption. DynaProp is a low-overhead pruning method that prunes activations and gradients at runtime. To effectively execute this method on hardware for a diverse set of transformer architectures, we propose ELECTOR, a framework that simulates transformer inference and training on a design space of accelerators. We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain the best-performing models with high accuracy on the given task and minimize latency, energy consumption, and chip area. The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair while incurring 5.2× lower latency and 3.0× lower energy consumption.

    EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms

    Automated design of efficient transformer models has recently attracted significant attention from industry and academia. However, most works only focus on certain metrics while searching for the best-performing transformer architecture. Furthermore, running traditional, complex, and large transformer models on low-compute edge platforms is a challenging problem. In this work, we propose a framework, called ProTran, to profile the hardware performance measures for a design space of transformer architectures and a diverse set of edge devices. We use this profiler in conjunction with the proposed co-design technique to obtain the best-performing models that have high accuracy on the given task and minimize latency, energy consumption, and peak power draw to enable edge deployment. We refer to our framework for co-optimizing accuracy and hardware performance measures as EdgeTran. It searches for the best transformer model and edge device pair. Finally, we propose GPTran, a multi-stage block-level grow-and-prune post-processing step that further improves accuracy in a hardware-aware manner. The obtained transformer model is 2.8× smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0× lower energy, and 10.8× lower peak power draw compared to an off-the-shelf GPU.

    BREATHE: Second-Order Gradients and Heteroscedastic Emulation based Design Space Exploration

    Researchers constantly strive to explore larger and more complex search spaces in various scientific studies and physical experiments. However, such investigations often involve sophisticated simulators or time-consuming experiments that make exploring and observing new design samples challenging. Previous works that target such applications are typically sample-inefficient and restricted to vector search spaces. To address these limitations, this work proposes a constrained multi-objective optimization (MOO) framework, called BREATHE, that searches not only traditional vector-based design spaces but also graph-based design spaces to obtain best-performing graphs. It leverages second-order gradients and actively trains a heteroscedastic surrogate model for sample-efficient optimization. In a single-objective vector optimization application, it leads to 64.1% higher performance than the next-best baseline, random forest regression. In graph-based search, BREATHE outperforms the next-best baseline, i.e., a graphical version of Gaussian-process-based Bayesian optimization, with up to 64.9% higher performance. In a MOO task, it achieves up to 21.9× higher hypervolume than the state-of-the-art method, multi-objective Bayesian optimization (MOBOpt). BREATHE also outperforms the baseline methods on most standard MOO benchmark applications.
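
    The hypervolume figure quoted above is a standard way to score a set of multi-objective solutions: it measures the region of objective space dominated by the solutions and bounded by a reference point, so a larger value is better. The following is a minimal sketch for two minimized objectives only. The function name, the sample points, and the reference point are hypothetical illustrations and are not taken from BREATHE or MOBOpt.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference`.

    Both objectives are assumed to be minimized (an assumption of this
    sketch, not a statement about the paper's setup).
    """
    ref_x, ref_y = reference
    # Keep only points that are strictly better than the reference point.
    candidates = sorted((x, y) for x, y in points if x < ref_x and y < ref_y)

    area = 0.0
    best_y = ref_y  # lowest second objective seen so far in the sweep
    for x, y in candidates:
        if y < best_y:
            # This point adds a new horizontal strip to the dominated region.
            area += (ref_x - x) * (best_y - y)
            best_y = y
        # Otherwise the point is dominated and contributes nothing.
    return area


# Hypothetical front of three trade-off solutions plus one dominated point.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
print(hypervolume_2d(front, reference=(4.0, 4.0)))
```

    Sorting by the first objective lets a single sweep add each non-dominated point's strip exactly once, which is why dominated points drop out without a separate filtering pass.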

    Anisotropic lattice expansion determined during flash sintering of BiFeO3 by in-situ energy-dispersive X-ray diffraction

    BiFeO3 has a Curie temperature (TC) of 825 °C, making it difficult to sinter using conventional methods while maintaining the purity of the material, as secondary phases unavoidably appear at temperatures above TC. Flash sintering is a relatively new technique that saves time and energy compared to other sintering methods. BiFeO3 was flash sintered at 500 °C to achieve 90% densification. In-situ energy-dispersive X-ray diffraction (EDXRD) revealed that the material did not undergo any phase transformation, having been sintered well below TC. Interestingly, anisotropic lattice expansion in the material was observed when the sample was exposed to the electric field. Funding: U.S. Office of Naval Research (ONR) N00014-10-1- 042, N00014-17-1-2087, Sub 4104-78982; U.S. Department of Energy DE-AC02-06CH1135

    Field-induced p-n transition in yttria-stabilized zirconia

    Oxide-ion-conducting yttria-stabilized zirconia ceramics show the onset of electronic conduction under a small bias voltage. Compositions with a high yttria content undergo a transition from p-type to n-type behavior at voltages in the range 2.4 to 10 V; the transition voltage also depends on oxygen partial pressure. Surface reactions have a direct influence on bulk electronic conductivities, with possible implications for voltage-induced flash phenomena and resistive switching.

    Microstructure and microchemistry of flash sintered K0.5Na0.5NbO3

    Flash sintering experiments were performed, for the first time, on sodium potassium niobate (KNN) ceramics. A density of 94% of theoretical was achieved in 30 s under a 250 V/cm electric field at 990 °C. These conditions are ≥100 °C lower in temperature and shorter in time than those of conventional sintering. Grains tended to grow after 30 s of flash sintering under a constant electric field. Detailed microstructural and chemical investigations of the sample showed an inhomogeneous distribution of Na and K resembling a core-shell structure, with more K in the shell and more Na in the core region. The inhomogeneous distribution of Na and K was correlated with the doubling of the unit cell within the grain along the [002] direction. Compositional equilibrium was achieved after a heat treatment at 1000 °C for 4 h. The compositional variations appear to be linked to grain boundary melting during the flash and subsequent recrystallization as the sample cooled.