
    Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers

    Transformers have achieved great success in machine learning applications. Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root Mean Square Normalization (RMSNorm), play a critical role in accelerating and stabilizing the training of Transformers. While LayerNorm recenters and rescales input vectors, RMSNorm only rescales the vectors by their RMS value. Despite being more computationally efficient, RMSNorm may compromise the representation ability of Transformers. There is currently no consensus regarding the preferred normalization technique, as some models employ LayerNorm while others utilize RMSNorm, especially in recent large language models. It is challenging to convert Transformers with one normalization to the other type. While there is an ongoing disagreement between the two normalization types, we propose a solution to unify the two mainstream Transformer architectures, Pre-LN and Pre-RMSNorm Transformers. By removing the inherent redundant mean information in the main branch of Pre-LN Transformers, we can reduce LayerNorm to RMSNorm, achieving higher efficiency. We further propose the Compressed RMSNorm (CRMSNorm) and the Pre-CRMSNorm Transformer based on a lossless compression of the zero-mean vectors. We formally establish the equivalence of the Pre-LN, Pre-RMSNorm, and Pre-CRMSNorm Transformer variants in both training and inference. This implies that Pre-LN Transformers can be substituted with Pre-(C)RMSNorm counterparts at almost no cost, offering the same arithmetic functionality along with a free efficiency improvement. Experiments demonstrate that we can reduce the training and inference time of Pre-LN Transformers by up to 10%. Comment: 15 pages, 5 tables, code available at https://github.com/ZixuanJiang/pre-rmsnorm-transforme
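The reduction from LayerNorm to RMSNorm rests on one identity: on zero-mean vectors the two normalizations coincide, because the variance then equals the mean square. A minimal NumPy sketch of that identity (simplified, without the learnable affine parameters the full layers carry):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # LayerNorm: recenter to zero mean, then rescale by the standard deviation.
    mu = x.mean(axis=-1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-6):
    # RMSNorm: rescale by the root mean square only, with no recentering.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x_zero_mean = x - x.mean(axis=-1, keepdims=True)  # strip the mean component

# On zero-mean inputs the two normalizations agree, which is the fact
# that lets a Pre-LN main branch be served by the cheaper RMSNorm.
assert np.allclose(layer_norm(x_zero_mean), rms_norm(x_zero_mean), atol=1e-5)
```

The paper's contribution is showing that the main-branch activations of a Pre-LN Transformer can be kept zero-mean losslessly, so this identity applies throughout the network.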

    A compact butterfly-style silicon photonic-electronic neural chip for hardware-efficient deep learning

    The optical neural network (ONN) is a promising hardware platform for next-generation neurocomputing due to its high parallelism, low latency, and low energy consumption. Previous ONN architectures are mainly designed for general matrix multiplication (GEMM), leading to unnecessarily large area cost and high control complexity. Here, we move beyond classical GEMM-based ONNs and propose an optical subspace neural network (OSNN) architecture, which trades the universality of weight representation for lower optical component usage, area cost, and energy consumption. We devise a butterfly-style photonic-electronic neural chip to implement our OSNN with up to 7x fewer trainable optical components compared to GEMM-based ONNs. Additionally, a hardware-aware training framework is provided to minimize the required device programming precision, lessen the chip area, and boost the noise robustness. We experimentally demonstrate the utility of our neural chip in practical image recognition tasks, showing that a measured accuracy of 94.16% can be achieved in hand-written digit recognition tasks with 3-bit weight programming precision. Comment: 17 pages, 5 figures
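The component savings of butterfly-style structures over GEMM follow from a parameter count: a dense n-by-n weight needs n^2 trainable entries, while a product of log2(n) butterfly factors needs only 2n per factor. A generic NumPy sketch of this counting argument (illustrative only; the chip's actual transfer matrices are built from photonic couplers and phase shifters, which are not modeled here):

```python
import numpy as np

def butterfly_factor(n, stride, rng):
    # One butterfly factor: each row mixes only the entry itself and its
    # partner `stride` away, giving 2*n nonzeros instead of n*n.
    B = np.zeros((n, n))
    for i in range(n):
        j = i ^ stride  # partner index at this stride (stride is a power of 2)
        B[i, i] = rng.normal()
        B[i, j] = rng.normal()
    return B

n = 64
rng = np.random.default_rng(0)
# log2(n) factors at strides 1, 2, 4, ... compose into a full-size transform.
factors = [butterfly_factor(n, 1 << k, rng) for k in range(int(np.log2(n)))]
W = np.linalg.multi_dot(factors)

dense_params = n * n                      # 4096 trainable entries for a GEMM weight
butterfly_params = 2 * n * len(factors)   # 768: grows as 2*n*log2(n), >5x fewer here
assert W.shape == (n, n) and butterfly_params < dense_params
```

The restricted factorization spans only a subspace of all n-by-n matrices, which is exactly the universality-for-efficiency trade the abstract describes.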

    ADEPT: Automatic Differentiable DEsign of Photonic Tensor Cores

    Photonic tensor cores (PTCs) are essential building blocks for optical artificial intelligence (AI) accelerators based on programmable photonic integrated circuits. PTCs can achieve ultra-fast and efficient tensor operations for neural network (NN) acceleration. Current PTC designs are either manually constructed or based on matrix decomposition theory, which lacks the adaptability to meet various hardware constraints and device specifications. To the best of our knowledge, automatic PTC design methodology is still unexplored. It is promising to move beyond the manual design paradigm and "nurture" photonic neurocomputing with AI and design automation. Therefore, in this work, for the first time, we propose a fully differentiable framework, dubbed ADEPT, that can efficiently search PTC designs adaptive to various circuit footprint constraints and foundry PDKs. Extensive experiments show the superior flexibility and effectiveness of the proposed ADEPT framework in exploring a large PTC design space. On various NN models and benchmarks, our searched PTC topology outperforms prior manually-designed structures with competitive matrix representability, 2-30x higher footprint compactness, and better noise robustness, demonstrating a new paradigm in photonic neural chip design. The code of ADEPT is available at https://github.com/JeremieMelo/ADEPT using the https://github.com/JeremieMelo/pytorch-onn (TorchONN) library. Comment: Accepted to ACM/IEEE Design Automation Conference (DAC), 202

    NH2+ implantations induced superior hemocompatibility of carbon nanotubes

    NH2+ implantation was performed on multiwalled carbon nanotubes (MWCNTs) prepared by chemical vapor deposition. The hemocompatibility of MWCNTs and NH2+-implanted MWCNTs was evaluated based on in vitro hemolysis, platelet adhesion, and kinetic-clotting tests. Compared with pristine MWCNTs, NH2+-implanted MWCNTs displayed better-preserved platelet and red blood cell morphology, a lower platelet adhesion rate, a lower hemolytic rate, and a longer kinetic blood-clotting time. NH2+-implanted MWCNTs at the higher fluence of 1 × 10^16 ions/cm^2 showed the best thromboresistance, and hence the desired hemocompatibility. Fourier transform infrared and X-ray photoelectron spectroscopy analyses showed that NH2+ implantation caused the cleavage of some pendant groups and the formation of new N-containing functional groups, which were responsible for the enhanced hemocompatibility of the NH2+-implanted MWCNTs.

    DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator

    The wide adoption and significant computing resource consumption of attention-based Transformers, e.g., Vision Transformer and large language models, have driven the demands for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due to its high energy efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have demonstrated promising results for convolutional neural network (CNN) workloads that only require weight-static linear operations. However, they fail to efficiently support Transformer architectures with attention operations due to the lack of ability to process dynamic full-range tensor multiplication. In this work, we propose a customized high-performance and energy-efficient photonic Transformer accelerator, DOTA. To overcome the fundamental limitation of existing ONNs, we introduce a novel photonic tensor core, consisting of a crossbar array of interference-based optical vector dot-product engines, that supports highly-parallel, dynamic, and full-range matrix-matrix multiplication. Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a >10x latency reduction compared to prior photonic accelerators, and delivers over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared to the electronic Transformer accelerator. Our work highlights the immense potential of photonic computing for efficient hardware accelerators, particularly for advanced machine learning workloads. Comment: The short version is accepted by Next-Gen AI System Workshop at MLSys 202
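The core primitive above, a crossbar array of vector dot-product engines, computes a full matrix-matrix product by assigning one engine per output entry, so both operands can change on every cycle (unlike the weight-static operands of CNN-oriented ONNs). A plain NumPy emulation of that dataflow (a functional sketch only; optical noise and precision effects are not modeled):

```python
import numpy as np

def crossbar_matmul(A, B):
    # Emulate a crossbar of dot-product engines: output entry (i, j) is
    # computed by one engine fed with row A[i] and column B[:, j].
    # In hardware all m*n engines operate in parallel.
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    out = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            out[i, j] = np.dot(A[i], B[:, j])  # one dot-product engine
    return out

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 6))   # e.g., a dynamic attention operand (queries)
K = rng.normal(size=(6, 5))   # both operands vary per input, unlike CNN weights
assert np.allclose(crossbar_matmul(Q, K), Q @ K)
```

Because neither operand is baked into the hardware configuration, the same array serves the activation-by-activation products that attention requires.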

    Excitability and Threshold Mechanism for Enhanced Neuronal Response Induced by Inhibition Preceding Excitation

    Postinhibitory facilitation (PIF) of neural firing is a paradoxical phenomenon in which an inhibitory input enhances rather than reduces firing activity; it plays an important role in sound localization in the auditory nervous system and has long awaited a theoretical explanation. In the present paper, an excitability- and threshold-based mechanism for PIF is presented in the Morris-Lecar model with type I, II, and III excitability. First, compared with purely excitatory stimulation applied at the steady state, pairing each excitatory stimulus with a preceding inhibitory one increases the firing rate for type II and III excitability, but not for type I, provided the interval between the inhibitory and excitatory stimuli within each pair is suitable. Second, the threshold mechanism for PIF is identified: for type II and III excitability, the inhibitory stimulation induces subthreshold oscillations around the steady state, and during the middle and late phases of the ascending part and the early phase of the descending part of each oscillation period, the threshold for an excitatory stimulus to evoke an action potential is lowered, which causes PIF. Last, a theoretical estimate of the range of inter-stimulus intervals producing PIF is obtained: it approximates half the intrinsic period of the subthreshold oscillations for relatively strong stimulation and narrows for relatively weak stimulation. The PIF interval is much shorter for type III excitability, closer to experimental observations, owing to the shorter period of its subthreshold oscillations. These results explain PIF comprehensively in terms of excitability and threshold dynamics.
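The paired-stimulation protocol described above can be sketched with a standard Morris-Lecar integration: an inhibitory current pulse, then an excitatory pulse after a chosen interval. The parameter set, pulse amplitudes, and timings below are illustrative textbook placeholders, not the paper's values; scanning the interval to map out the PIF window is left out.

```python
import numpy as np

# Morris-Lecar membrane model with a common type-II parameter set
# (illustrative values, not necessarily those used in the paper).
C, g_L, g_Ca, g_K = 20.0, 2.0, 4.0, 8.0
V_L, V_Ca, V_K = -60.0, 120.0, -84.0
V1, V2, V3, V4, phi = -1.2, 18.0, 2.0, 30.0, 0.04

def simulate(I_of_t, dt=0.05, T=300.0, V0=-60.0, w0=0.015):
    # Forward-Euler integration; returns the membrane-potential trace.
    steps = int(T / dt)
    V, w = V0, w0
    trace = np.empty(steps)
    for k in range(steps):
        m_inf = 0.5 * (1 + np.tanh((V - V1) / V2))
        w_inf = 0.5 * (1 + np.tanh((V - V3) / V4))
        dV = (I_of_t(k * dt) - g_L * (V - V_L)
              - g_Ca * m_inf * (V - V_Ca) - g_K * w * (V - V_K)) / C
        dw = phi * (w_inf - w) * np.cosh((V - V3) / (2 * V4))
        V += dt * dV
        w += dt * dw
        trace[k] = V
    return trace

def paired_pulse(t, t_inh=100.0, gap=40.0, width=5.0, A_inh=-30.0, A_exc=30.0):
    # Inhibitory pulse first, then an excitatory pulse `gap` ms later,
    # mimicking the paired-stimulation protocol (amplitudes are placeholders).
    if t_inh <= t < t_inh + width:
        return A_inh
    if t_inh + gap <= t < t_inh + gap + width:
        return A_exc
    return 0.0

trace = simulate(paired_pulse)
baseline = trace[int(95.0 / 0.05)]                  # just before the first pulse
dip = trace[int(100.0 / 0.05):int(140.0 / 0.05)].min()
assert np.isfinite(trace).all() and dip < baseline  # inhibition hyperpolarizes first
```

Sweeping `gap` while monitoring whether the excitatory pulse crosses threshold is the numerical experiment behind the interval-range estimate in the abstract.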

    Efficient Channel-Temporal Attention for Boosting RF Fingerprinting

    In recent years, Deep Convolutional Neural Networks (DCNNs) have been widely used to solve the Radio Frequency (RF) fingerprinting task. DCNNs are capable of learning the proper convolution kernels driven by data and directly extracting RF fingerprints from raw In-phase/Quadrature (IQ) data; these fingerprints arise from variations or minor flaws in transmitters' circuits, enabling the identification of a specific transmitter. One of the main challenges in employing this technology is how to optimize the model design so that it can automatically learn discriminative RF fingerprints and remain robust to changes in environmental factors. To this end, this paper proposes ECTAttention, an Efficient Channel-Temporal Attention block that can be used to enhance the feature learning capability of DCNNs. ECTAttention has two parallel branches. On the one hand, it automatically mines the correlation between channels through channel attention to discover and enhance important convolution kernels. On the other hand, it recalibrates the feature map through temporal attention. ECTAttention is flexible and efficient: it can be combined with existing DCNNs to strengthen their feature learning ability at the cost of only a small increase in computation, achieving high RF fingerprinting accuracy. Our experimental results show that a ResNet enhanced by ECTAttention can identify 10 USRP X310 SDRs with an accuracy of 97.5%, and achieves a recognition accuracy of 91.9% on 56 real ADS-B signal sources under unconstrained acquisition environments.
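The two parallel branches described above can be sketched as a squeeze-and-excitation-style channel gate alongside a per-time-step gate on the same feature map. The fusion rule, bottleneck ratio, and gate parameterizations below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, W1, W2):
    # Channel branch: global average pool over time, a small bottleneck
    # MLP, then per-channel sigmoid gating. x: (batch, channels, time).
    s = x.mean(axis=2)                # squeeze: (batch, channels)
    gate = sigmoid(s @ W1 @ W2)       # excitation: (batch, channels)
    return x * gate[:, :, None]

def temporal_attention(x, w):
    # Temporal branch: a learned projection scores each time step,
    # and the sigmoid of the score recalibrates the feature map.
    scores = np.einsum('bct,c->bt', x, w)   # (batch, time)
    return x * sigmoid(scores)[:, None, :]

def ect_attention(x, W1, W2, w):
    # Additive fusion of the two branches (an assumption; the paper's
    # exact combination rule may differ).
    return channel_attention(x, W1, W2) + temporal_attention(x, w)

rng = np.random.default_rng(0)
B, Cc, T = 2, 16, 32
x = rng.normal(size=(B, Cc, T))            # e.g., features from IQ data
W1 = rng.normal(size=(Cc, Cc // 4)) * 0.1  # bottleneck reduces channels 4x
W2 = rng.normal(size=(Cc // 4, Cc)) * 0.1
w = rng.normal(size=(Cc,)) * 0.1
y = ect_attention(x, W1, W2, w)
assert y.shape == x.shape                  # attention preserves the feature-map shape
```

Because the output shape matches the input, such a block can be dropped between existing DCNN stages, which is how attention modules of this kind are typically retrofitted.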

    Surface crack treatment of concrete via nano-modified microbial carbonate precipitation

    As a new concrete crack patching technology, microbial self-healing slurries offer favourable characteristics including freedom from pollution, ecological sustainability, and good compatibility with concrete. In this paper, a nano-SiO2-modified microbial suspension, combined with sodium alginate and polyvinyl alcohol, was used to prepare a nano-modified microbial self-healing slurry. The slurry was applied to concrete under negative pressure to verify its repair effect, and the micromorphology of the resulting microbial mineralization products was observed. The results revealed that patching the concrete with the nano-modified microbial slurry significantly improved its impermeability and increased its carbonization resistance threefold in comparison with the control group. Scanning electron microscopy (SEM) and X-ray diffraction (XRD) observations determined that the microbial mineralization products were mainly calcite crystals, which, integrated with the nano-SiO2, sodium alginate, and polyvinyl alcohol at the microscopic level, filled the internal pores of the concrete, thus improving its durability.