23 research outputs found

    Shared Microexponents: A Little Shifting Goes a Long Way

    Full text link
This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
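    To make the "multiple levels of scaling" idea concrete, below is a minimal numerical sketch of two-level shared-exponent scaling in the spirit of MX: a coarse exponent shared by a block, plus a small per-sub-block microexponent offset. Block/sub-block sizes and bit widths are illustrative assumptions, not the paper's exact format definition.

```python
import numpy as np

def quantize_mx_block(x, block=16, sub=2, elem_bits=4, micro_bits=1):
    """Quantize one block with a coarse shared exponent plus per-sub-block
    microexponent offsets; returns the dequantized (lossy) values."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size == block and block % sub == 0
    # Level 1: coarse exponent shared by the whole block.
    coarse = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-30)))
    qmax = 2 ** (elem_bits - 1) - 1          # symmetric integer element range
    out = np.empty_like(x)
    for i in range(0, block, sub):
        chunk = x[i:i + sub]
        # Level 2: small per-sub-block offset (the "microexponent").
        local = int(np.ceil(np.log2(np.max(np.abs(chunk)) + 1e-30)))
        offset = np.clip(coarse - local, 0, 2 ** micro_bits - 1)
        scale = 2.0 ** (coarse - offset) / qmax
        q = np.clip(np.round(chunk / scale), -qmax, qmax)
        out[i:i + sub] = q * scale           # dequantize for inspection
    return out

x = np.random.randn(16)
print(np.abs(x - quantize_mx_block(x)).max())  # quantization error of one block
```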

    Microscaling Data Formats for Deep Learning

    Full text link
    Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate the practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
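    For intuition, here is a minimal sketch of a single-level Microscaling-style block format: one shared power-of-two scale per 32-element block with narrow integer elements. The block size, element width, and scale encoding are illustrative assumptions rather than the exact OCP MX specification.

```python
import numpy as np

BLOCK, ELEM_BITS = 32, 8
QMAX = 2 ** (ELEM_BITS - 1) - 1

def mx_quantize(x):
    """Return (integer elements, per-block power-of-two scales) for a 1-D array
    whose length is assumed to be a multiple of BLOCK."""
    x = np.asarray(x, dtype=np.float32).reshape(-1, BLOCK)
    exp = np.ceil(np.log2(np.max(np.abs(x), axis=1, keepdims=True) + 1e-30))
    scale = (2.0 ** exp / QMAX).astype(np.float32)   # one scale per block
    q = np.clip(np.round(x / scale), -QMAX, QMAX).astype(np.int8)
    return q, scale

def mx_dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

x = np.random.randn(4 * BLOCK).astype(np.float32)
q, s = mx_quantize(x)
print(np.abs(x - mx_dequantize(q, s)).max())         # reconstruction error
```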

    DropBack: continuous pruning during deep neural network training

    No full text
    In recent years, neural networks have regained popularity in a variety of fields such as image recognition and speech transcription. As deep neural networks grow more popular for solving everyday tasks, deployment on small embedded devices such as phones is becoming increasingly common. Moreover, many applications, such as face recognition or health applications, require personalization, which means that networks must be retrained after they have been deployed. Because today's state-of-the-art networks are too large to fit on mobile devices and exceed mobile device power envelopes, techniques such as pruning and quantization have been developed to shrink pre-trained networks by about an order of magnitude. However, they all assume that the network is first fully trained off-line on datacenter-class GPUs, then pruned in a post-processing step, and only then deployed to the mobile device. In this thesis, we introduce DropBack, a technique that significantly reduces the storage and computation required during both inference and training. In contrast to existing pruning schemes, which retain the weights with the largest values and set the rest to zero, DropBack identifies the weights that have changed the most and recomputes the original initialization values for all other weights. This means that only the most important weights must be stored in off-chip memory during both inference and training, reducing off-chip memory accesses (responsible for a majority of the power usage) by up to 72×. Crucially, networks pruned using DropBack maintain high accuracy even for challenging network architectures: on modern, compact architectures such as DenseNet and WRN-28-10, DropBack outperforms current state-of-the-art pruning techniques in both accuracy and the off-chip memory required for weights. On the CIFAR-10 dataset, we observe a 5× reduction in weights on an already 9×-reduced VGG-16 network, which we call VGG-S, and 4.5× on DenseNet and WRN-28-10, all with zero or negligible accuracy loss, or 19×, 27×, and 36×, respectively, with a minor impact on accuracy. When the recomputed initial weights are decayed to zero, the weight memory footprint of WRN-28-10 can be reduced by up to 72×.
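    The core mechanism is simple to sketch: keep only the k weights that have drifted furthest from their initial values, and regenerate the rest from the initialization PRNG seed so they need no storage. The following is a minimal sketch with hypothetical sizes and a plain SGD step, not the thesis's actual implementation.

```python
import numpy as np

SEED, N, K = 0, 10_000, 1_000   # hypothetical layer size and tracked-weight budget

def initial_weights():
    # Re-derivable from the seed, so untracked weights need no off-chip storage.
    return np.random.default_rng(SEED).standard_normal(N).astype(np.float32)

def dropback_step(w, grad, lr=0.01, k=K):
    w = w - lr * grad                      # ordinary SGD update
    w_init = initial_weights()
    drift = np.abs(w - w_init)             # how far each weight has moved from init
    keep = np.argpartition(drift, -k)[-k:] # the k most-changed weights survive
    pruned = w_init.copy()                 # all other weights revert to their init values
    pruned[keep] = w[keep]
    return pruned

w = initial_weights()
g = np.random.default_rng(1).standard_normal(N).astype(np.float32)
w = dropback_step(w, g)
print((w != initial_weights()).sum())      # at most K weights differ from initialization
```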

    Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models

    Full text link
    Advances in language modeling have paved the way for novel human-AI co-writing experiences. This paper explores how varying levels of scaffolding from large language models (LLMs) shape the co-writing process. Employing a within-subjects field experiment with a Latin square design, we asked participants (N=131) to respond to argumentative writing prompts under three randomly sequenced conditions: no AI assistance (control), next-sentence suggestions (low scaffolding), and next-paragraph suggestions (high scaffolding). Our findings reveal a U-shaped impact of scaffolding on writing quality and productivity (words/time). While low scaffolding did not significantly improve writing quality or productivity, high scaffolding led to significant improvements, especially benefiting non-regular writers and less tech-savvy users. No significant cognitive burden was observed while using the scaffolded writing tools, but a moderate decrease in text ownership and satisfaction was noted. Our results have broad implications for the design of AI-powered writing tools, including the need for personalized scaffolding mechanisms.
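    For readers unfamiliar with the design, a 3×3 Latin square simply rotates the order of the three conditions so each appears once in every session position. The sketch below shows one such counterbalancing scheme; the condition labels and participant handling are hypothetical, not the study's actual assignment code.

```python
# Minimal sketch of Latin-square counterbalancing for three writing conditions.
CONDITIONS = ["control", "low_scaffolding", "high_scaffolding"]

# A 3x3 Latin square: each condition appears once per row (order variant)
# and once per column (session position), balancing simple order effects.
LATIN_SQUARE = [
    [CONDITIONS[(g + p) % 3] for p in range(3)]
    for g in range(3)
]

def assign_order(participant_id: int) -> list[str]:
    """Cycle participants through the three row orders."""
    return LATIN_SQUARE[participant_id % 3]

for pid in range(6):
    print(pid, assign_order(pid))
```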

    Terahertz Photogalvanics in Twisted Bilayer Graphene Close to the Second Magic Angle

    No full text
    We report on the observation of photogalvanic effects in twisted bilayer graphene (tBLG) with a twist angle of 0.6 degrees. We show that excitation of the tBLG bulk causes a photocurrent whose sign and magnitude are controlled by the orientation of the radiation electric field and the photon helicity. The observed photocurrent provides evidence for the reduction of the point-group symmetry in low twist-angle tBLG to the lowest possible one. The developed theory shows that the current is formed by asymmetric scattering in gyrotropic tBLG. We also detected the photogalvanic current formed in the vicinity of the edges. For both bulk and edge photocurrents, we demonstrate the emergence of pronounced oscillations upon variation of the gate voltage. The gate voltages associated with the oscillations correlate with peaks in resistance measurements. These are well explained by interband transitions between a multitude of isolated bands in tBLG.
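    For orientation, the dependence on field orientation and helicity follows the standard phenomenological expansion of a photogalvanic current; the form below is the generic symmetry argument, not this paper's specific microscopic result.

```latex
% Generic phenomenological photogalvanic current (standard symmetry expansion):
j_\lambda \;=\; \chi_{\lambda\mu\nu}\,
      \tfrac{1}{2}\bigl(E_\mu E_\nu^{*} + E_\nu E_\mu^{*}\bigr)
  \;+\; \gamma_{\lambda\mu}\, i\bigl(\mathbf{E}\times\mathbf{E}^{*}\bigr)_\mu .
% The first (linear) term tracks the orientation of the radiation electric field,
% the second (circular) term reverses with photon helicity; which components of
% \chi and \gamma survive is dictated by the point-group symmetry, which is why
% such photocurrents probe symmetry reduction in low-angle tBLG.
```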

    Terahertz spin ratchet effect in magnetic metamaterials

    Get PDF
    We report on spin ratchet currents driven by terahertz radiation electric fields in a Co/Pt magnetic metamaterial patterned with triangle-shaped holes that form an antidot lattice, subjected to an external magnetic field applied perpendicular to the metal film plane. We show that, for a radiation wavelength substantially larger than the period of the antidot array, the radiation causes a polarization-independent spin-polarized ratchet current. The current is generated by the periodic asymmetric radiation intensity distribution caused by near-field diffraction at the edges of the antidots, which induces spatially inhomogeneous periodic electron gas heating, together with a phase-shifted periodic asymmetric electrostatic force. The developed microscopic theory shows that the magnetization of the Co/Pt film results in a spin ratchet current caused by both the anomalous Hall and the anomalous Nernst effects. Additionally, we observed a polarization-dependent trigonal spin photocurrent, which is caused by the scattering of electrons at the antidot boundaries, resulting in a spin-polarized current due to the magnetization. The microscopic theory of these effects reveals that the trigonal photocurrent is generated at the boundaries of the triangular antidots, whereas the spin ratchet current is generated by the spatially periodic temperature gradient over the whole film. This difference causes substantially different hysteresis widths for the two currents.
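    As a generic illustration of the ratchet mechanism invoked here (the textbook lateral-asymmetry argument, not this paper's full microscopic theory), the dc ratchet current scales with the spatial average of the near-field intensity modulation times the gradient of the periodic electrostatic potential, and it vanishes unless the two profiles are phase-shifted.

```latex
% Generic electronic-ratchet asymmetry parameter (illustrative): with
% |E(x)|^2 = |E_0|^2 \bigl[1 + h\cos(qx + \varphi_E)\bigr] and
% V(x) = V_1 \cos(qx + \varphi_V),
j_{\mathrm{ratchet}} \;\propto\; \Xi
   \;=\; \Bigl\langle |E(x)|^{2}\,\frac{dV(x)}{dx} \Bigr\rangle_{x}
   \;=\; \tfrac{1}{2}\,|E_0|^{2}\, h\, V_1\, q\,\sin(\varphi_E - \varphi_V),
% which is nonzero only when the intensity modulation and the electrostatic
% force are phase-shifted, as the abstract describes.
```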