41 research outputs found

    Root Mean Square Layer Normalization

    Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability to handle the re-centering and re-scaling of both inputs and weight matrices. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, RNNs in particular. In this paper, we hypothesize that the re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to the root mean square (RMS), giving the model re-scaling invariance and an implicit learning-rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm, where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves performance comparable to LayerNorm while reducing running time by 7%–64% across different models. Source code is available at https://github.com/bzhangGo/rmsnorm. Comment: NeurIPS 2019
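    A minimal NumPy sketch of the two normalizations described above; the epsilon constant and the choice of the leading p% slice for the pRMSNorm estimate are illustrative assumptions, not details taken from the paper.

    ```python
    import numpy as np

    def rms_norm(x, gain, eps=1e-8):
        # Scale the summed inputs by their root mean square and apply the
        # learned per-dimension gain; no mean subtraction (no re-centering).
        rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
        return x / rms * gain

    def p_rms_norm(x, gain, p=0.0625, eps=1e-8):
        # pRMSNorm: estimate the RMS from only p% of the dimensions
        # (here, illustratively, the leading slice).
        k = max(1, int(x.shape[-1] * p))
        rms = np.sqrt(np.mean(x[..., :k] ** 2, axis=-1, keepdims=True) + eps)
        return x / rms * gain
    ```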

    The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

    Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs. Comment: Work in progress
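    A hedged sketch of ternary weight quantization in the spirit of the abstract; the absmean scaling rule below is an assumption drawn from the broader BitNet line of work, not something this abstract specifies.

    ```python
    import numpy as np

    def ternarize(w, eps=1e-8):
        # Scale by the mean absolute weight, then round and clip so that
        # every entry lands in the ternary set {-1, 0, 1}.
        scale = np.mean(np.abs(w)) + eps
        w_q = np.clip(np.round(w / scale), -1, 1)
        return w_q, scale  # approximate reconstruction: w_q * scale

    w = np.random.default_rng(0).normal(size=(4, 4))
    w_q, scale = ternarize(w)
    print(w_q)  # entries are only -1, 0, or 1
    ```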

    NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services

    Large language models (LLMs) have achieved tremendous success in empowering daily life with generative information, and the personalization of LLMs could further broaden their applications through better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology is promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT, which deploys appropriate LLMs at the edge and in the cloud in accordance with their computing capacity. In addition, edge LLMs can efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with cloud LLMs. After deploying representative open-source LLMs (e.g., the GPT-2-base and LLaMA models) at the edge and in the cloud, we demonstrate the feasibility of NetGPT on the basis of low-rank adaptation-based lightweight fine-tuning. Subsequently, we highlight the essential changes required for a native artificial intelligence (AI) network architecture towards NetGPT, with special emphasis on deeper integration of communication and computing resources and careful calibration of the logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, given the edge LLM's astonishing capability to predict trends and infer intents, which possibly leads to a unified solution for intelligent network management & orchestration. In a nutshell, we argue that NetGPT is a promising native-AI network architecture that goes beyond provisioning personalized generative services.
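    A hypothetical sketch of the cloud-edge split described above. The three helper functions are illustrative stand-ins, not APIs from the NetGPT paper; in practice the first two would run a small edge LLM on-device and the third would call a large cloud LLM.

    ```python
    def get_location() -> str:
        # Stand-in for the device's location-based context.
        return "user is near the central railway station"

    def edge_complete(prompt: str, context: str) -> str:
        # Stand-in for a small edge LLM (e.g., GPT-2-base) that completes
        # the prompt with local context before it leaves the device.
        return f"{prompt} (local context: {context})"

    def cloud_generate(prompt: str) -> str:
        # Stand-in for the large cloud LLM (e.g., a LLaMA-class model).
        return f"[cloud LLM response to: {prompt}]"

    def answer(prompt: str) -> str:
        """Cloud-edge collaboration: edge completes the prompt, cloud generates."""
        return cloud_generate(edge_complete(prompt, get_location()))

    print(answer("Recommend a place for lunch"))
    ```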

    A Meta-Learning Perspective on Transformers for Causal Language Modeling

    The Transformer architecture has become prominent in the development of large causal language models. However, the mechanisms that explain its capabilities are not well understood. Focusing on the training process, we establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task, by explicating an inner optimization process that may happen within the Transformer. Further, from within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models. Our analysis is supported by experiments conducted on pre-trained large language models and real-world data.
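    A minimal sketch of the kind of measurement the abstract alludes to: tracking the L2 norms of token representations layer by layer in a pre-trained causal language model. The choice of gpt2 and the example sentence are illustrative, not taken from the paper.

    ```python
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    inputs = tok("The quick brown fox jumps over the lazy dog",
                 return_tensors="pt")
    with torch.no_grad():
        # hidden_states: one tensor of shape (1, seq_len, dim) per layer
        hidden = model(**inputs).hidden_states

    for layer, h in enumerate(hidden):
        print(f"layer {layer:2d}: mean token norm = {h.norm(dim=-1).mean():.2f}")
    ```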

    Functional Invariants to Watermark Large Transformers

    The rapid growth of transformer-based models raises concerns about their integrity and ownership assurance. Watermarking addresses this issue by embedding a unique identifier into the model while preserving its performance. However, most existing approaches require optimizing the weights to imprint the watermark signal, which is not suitable at scale due to the computational cost. This paper explores watermarks with virtually no computational cost, applicable in a non-blind white-box setting (assuming access to both the original and watermarked networks). They generate functionally equivalent copies by leveraging the models' invariance, via operations like dimension permutations or scaling/unscaling. This makes it possible to watermark models without any change in their outputs, and the watermark remains stealthy. Experiments demonstrate the effectiveness of the approach and its robustness against various model transformations (fine-tuning, quantization, pruning), making it a practical solution to protect the integrity of large models. Comment: Published at ICASSP 2024. Webpage at https://pierrefdz.github.io/publications/invariancewm
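    A minimal sketch of the dimension-permutation invariance the abstract leverages, on a toy two-layer ReLU MLP: permuting the hidden dimension (rows of W1, matching columns of W2) yields a functionally equivalent copy whose weight layout can encode a secret identifier. The sizes and the random permutation are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, h = 8, 16
    W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(d, h))

    def mlp(x, W1, W2):
        return W2 @ np.maximum(W1 @ x, 0.0)  # two-layer ReLU MLP

    perm = rng.permutation(h)              # the secret identifier
    W1_wm, W2_wm = W1[perm], W2[:, perm]   # permuted, equivalent copy

    x = rng.normal(size=d)
    # the watermarked copy computes exactly the same function
    assert np.allclose(mlp(x, W1, W2), mlp(x, W1_wm, W2_wm))
    ```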

    LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

    While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task, which necessitates precise alignment and deep interaction between speech and text features. To address the SQA challenge on LLMs, we initially curated the free-form and open-ended LibriSQA dataset from Librispeech, comprising Part I with natural conversational formats and Part II encompassing multiple-choice questions followed by answers and analytical segments. Both parts collectively include 107k SQA pairs that cover various topics. Given the evident paucity of existing speech-text LLMs, we propose a lightweight, end-to-end framework to execute the SQA task on LibriSQA, achieving significant results. By reforming ASR into the SQA format, we further substantiate our framework's capability in handling ASR tasks. Our empirical findings bolster LLMs' aptitude for aligning and comprehending multimodal information, paving the way for the development of universal multimodal LLMs. The dataset and demo can be found at https://github.com/ZihanZhaoSJTU/LibriSQA
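    A hedged sketch of what recasting an ASR example in SQA form might look like; the field names and question wording are illustrative assumptions, not the dataset's actual schema.

    ```python
    # Hypothetical ASR sample (path and transcript are made up).
    asr_example = {
        "audio": "librispeech/1089-134686-0000.flac",
        "transcript": "he hoped there would be stew for dinner",
    }

    # The same sample, reformed as a free-form SQA pair.
    sqa_example = {
        "audio": asr_example["audio"],
        "question": "What is being said in this recording?",
        "answer": asr_example["transcript"],
    }
    ```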

    Indonesian Syllable Recognition Using LPC and a Backpropagation Neural Network (Pengenalan Suku Kata Bahasa Indonesia Menggunakan Metode LPC dan Backpropagation Neural Network)

    Speech has become a key component in the development of today's digital technology, making human life easier. Various Automatic Speech Recognition (ASR) systems have been developed in many countries and languages. Speech recognition can be applied in many areas of life, one of them being voice-based security systems in the form of spoken passwords. In Indonesia there has been research on Indonesian-language speech recognition using various methods, but it remains limited in quantity and typically serves only as commands for particular applications. Therefore, in this study, the authors perform speech recognition based on Indonesian syllables, since Indonesian has a comparatively large number of syllables compared with other languages. The system consists of four stages: voice recording, pre-processing, feature extraction using Linear Predictive Coding (LPC), and classification using a Backpropagation Neural Network. There are 115 syllables, of which 74 are distinct, drawn from 50 spoken Indonesian words; in total, 690 Indonesian syllables from 6 respondents were used. The syllable recognition system recognized the 74 training samples from each of the 6 respondents with 100% accuracy, while on the 115 untrained test samples the best accuracy obtained was 69% across the 6 respondents. The tests show that the more training data the network processes, the higher the recognition accuracy obtained (i.e., the speech signal can be recognized).
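    A minimal sketch of the pipeline described above: LPC coefficients extracted per syllable recording, classified by a backpropagation-trained multilayer perceptron. The LPC order, file names, labels, and network size are illustrative assumptions.

    ```python
    import librosa
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def lpc_features(path, order=12):
        # Load the syllable recording and compute LPC coefficients;
        # drop the leading 1.0 that librosa.lpc always returns first.
        y, sr = librosa.load(path, sr=16000)
        return librosa.lpc(y, order=order)[1:]

    # Hypothetical file names and labels standing in for the recorded
    # syllable dataset (6 respondents, 690 syllables in the paper).
    X = np.stack([lpc_features(p) for p in ["ba.wav", "ca.wav", "da.wav"]])
    y = np.array(["ba", "ca", "da"])

    # MLPClassifier trains with backpropagation, matching the method used.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
    print(clf.predict(X))
    ```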

    RecycleGPT: An Autoregressive Language Model with Recyclable Module

    Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed that recycles pre-generated model states instead of running the whole model at every step. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations, so the next token in a sequence can often be reasonably guessed or inferred from the preceding ones. Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to a 1.4x speedup while preserving high performance. Comment: Technical Report
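    A hypothetical sketch of the decoding pattern the abstract suggests: alternate full forward passes with a cheap "recycle" step that guesses the next token from the previous step's hidden state. The two stand-in functions and the fixed alternation schedule are illustrative, not the paper's actual design.

    ```python
    def full_model(tokens):
        # Stand-in for a full forward pass: returns the next token and
        # the last hidden state (here faked with simple arithmetic).
        return tokens[-1] + 1, float(tokens[-1])

    def recycle_module(hidden_state):
        # Stand-in for a small recyclable module that cheaply guesses the
        # next token from a recycled state, skipping the full model.
        return int(hidden_state) + 2

    def generate(prompt, n_tokens):
        tokens = list(prompt)
        hidden = None
        for step in range(n_tokens):
            if step % 2 == 0:   # full forward pass
                token, hidden = full_model(tokens)
            else:               # recycled step, no full pass
                token = recycle_module(hidden)
            tokens.append(token)
        return tokens

    print(generate([1, 2, 3], 6))
    ```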

    Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

    Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate performance comparable to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.
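    A minimal NumPy sketch of the idea implied by "high-rank training through low-rank updates": train a rank-r factorized update, periodically merge it into the base weight, and restart with fresh factors so the accumulated update can exceed rank r. Shapes, rank, and the restart schedule are illustrative assumptions; real ReLoRA trains the factors by backpropagation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, r = 64, 4
    W = rng.normal(size=(d, d))
    W0 = W.copy()

    for restart in range(3):
        # fresh low-rank factors each cycle; B starts at zero so the
        # effective weight W + A @ B is unchanged until training moves it
        A = rng.normal(size=(d, r)) * 0.01
        B = np.zeros((r, d))
        # ... A and B would be trained by backprop here ...
        B = rng.normal(size=(r, d))  # stand-in for trained values
        W += A @ B  # merge the low-rank update into the base weight

    # three merged rank-4 updates accumulate into a rank-12 total update
    print(np.linalg.matrix_rank(W - W0))
    ```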

    Secure Transformer Inference

    We present a three-party protocol that protects both the Transformer parameters and the user's data during the inference phase. For each feedforward inference pass, our protocol only introduces permutation computations on the input and output data on the user side. Our protocol, the Secure Transformer Inference Protocol (STIP), can be applied to real-world services like ChatGPT.
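    A hedged toy sketch of the permutation idea the abstract points to: the user applies a secret permutation to the data while the server holds correspondingly permuted parameters, and the result matches the plain computation. This single linear layer is an illustration only, not the actual three-party STIP protocol.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out = 8, 4
    W = rng.normal(size=(d_in, d_out))  # model parameters to protect

    pi = rng.permutation(d_in)  # user-side secret permutation
    W_perm = W[pi]              # server stores row-permuted parameters

    x = rng.normal(size=d_in)   # user data to protect
    y = x[pi] @ W_perm          # user sends only the permuted input
    assert np.allclose(y, x @ W)  # same result as the plain model
    ```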