Root Mean Square Layer Normalization
Layer normalization (LayerNorm) has been successfully applied to various deep
neural networks to help stabilize training and boost model convergence because
of its capability to handle re-centering and re-scaling of both inputs and
weight matrices. However, the computational overhead introduced by LayerNorm
makes these improvements expensive and significantly slows the underlying
network, RNNs in particular. In this paper, we hypothesize that
re-centering invariance in LayerNorm is dispensable and propose root mean
square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs
to a neuron in one layer according to the root mean square (RMS), giving the
model re-scaling invariance and an implicit learning-rate adaptation ability.
RMSNorm is computationally simpler and thus more efficient than LayerNorm. We
also present partial RMSNorm, or pRMSNorm, where the RMS is estimated from p% of
the summed inputs without breaking the above properties. Extensive experiments
on several tasks using diverse network architectures show that RMSNorm achieves
comparable performance against LayerNorm but reduces the running time by 7%~64%
on different models. Source code is available at
https://github.com/bzhangGo/rmsnorm.
Comment: NeurIPS 201
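The RMS statistic is the only pass RMSNorm makes over the summed inputs, which is where the savings over LayerNorm's separate mean and variance computations come from. A minimal plain-Python sketch (the gain vector and the small epsilon are assumptions about the usual parameterization, not taken from the paper):

```python
import math

def rms_norm(x, gain=None, eps=1e-8):
    """Scale the summed inputs by their root mean square.

    No mean is subtracted (re-centering is dropped), so only the
    re-scaling invariance of LayerNorm is kept. For pRMSNorm, the
    RMS would be estimated from only p% of x instead of all of it.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]

# Re-scaling invariance: scaling all inputs leaves the output unchanged.
a = rms_norm([1.0, 2.0, 3.0, 4.0])
b = rms_norm([10.0, 20.0, 30.0, 40.0])
```

The output vector always has unit RMS (up to the gain), which is what gives the implicit learning-rate adaptation described in the abstract.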
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Recent research, such as BitNet, is paving the way for a new era of 1-bit
Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant,
namely BitNet b1.58, in which every single parameter (or weight) of the LLM is
ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16)
Transformer LLM with the same model size and training tokens in terms of both
perplexity and end-task performance, while being significantly more
cost-effective in terms of latency, memory, throughput, and energy consumption.
More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for
training new generations of LLMs that are both high-performance and
cost-effective. Furthermore, it enables a new computation paradigm and opens
the door for designing specific hardware optimized for 1-bit LLMs.
Comment: Work in progress
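For the quantization itself, a ternary weight carries log2(3) ≈ 1.58 bits, hence "b1.58". A toy sketch of an absmean-style scheme (scale a weight group by its mean absolute value, then round-and-clip to {-1, 0, 1}); per-tensor granularity and the activation side of the real recipe are simplified away here:

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternary quantization: divide by the mean |w|, then
    round each scaled weight to the nearest value in {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Large weights saturate to +/-1, small ones collapse to 0.
q, s = ternarize([0.9, -0.05, 0.4, -1.2])
```

At inference time a matrix of such weights needs only additions and subtractions, which is the source of the latency and energy savings the abstract claims.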
NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services
Large language models (LLMs) have achieved tremendous success in empowering
daily life with generative information, and the personalization of LLMs could
further contribute to their applications through better alignment with human
intents. Towards personalized generative services, a collaborative cloud-edge
methodology is promising, as it facilitates the effective orchestration of
heterogeneous distributed communication and computing resources. In this
article, after discussing the pros and cons of several candidate cloud-edge
collaboration techniques, we put forward NetGPT to deploy appropriate
LLMs at the edge and the cloud in accordance with their computing capacity. In
addition, edge LLMs could efficiently leverage location-based information for
personalized prompt completion, thus benefiting the interaction with cloud
LLMs. After deploying representative open-source LLMs (e.g., GPT-2-base and
LLaMA) at the edge and the cloud, we demonstrate the feasibility of NetGPT on
the basis of low-rank adaptation-based light-weight fine-tuning. Subsequently,
we highlight substantial essential changes required for a native artificial
intelligence (AI) network architecture towards NetGPT, with special emphasis on
deeper integration of communications and computing resources and careful
calibration of logical AI workflow. Furthermore, we demonstrate several
by-product benefits of NetGPT, given edge LLM's astonishing capability to
predict trends and infer intents, which possibly leads to a unified solution
for intelligent network management and orchestration. In a nutshell, we argue
that NetGPT is a promising native-AI network architecture beyond provisioning
personalized generative services.
A Meta-Learning Perspective on Transformers for Causal Language Modeling
The Transformer architecture has become prominent in developing large causal
language models. However, mechanisms to explain its capabilities are not well
understood. Focused on the training process, here we establish a meta-learning
view of the Transformer architecture when trained for the causal language
modeling task, by explicating an inner optimization process that may happen
within the Transformer. Further, from within the inner optimization, we
discover and theoretically analyze a special characteristic of the norms of
learned token representations within Transformer-based causal language models.
Our analysis is supported by experiments conducted on pre-trained large
language models and real-world data.
Functional Invariants to Watermark Large Transformers
The rapid growth of transformer-based models increases the concerns about
their integrity and ownership insurance. Watermarking addresses this issue by
embedding a unique identifier into the model, while preserving its performance.
However, most existing approaches require optimizing the weights to imprint
the watermark signal, which is not suitable at scale due to the computational
cost. This paper explores watermarks with virtually no computational cost,
applicable to a non-blind white-box setting (assuming access to both the
original and watermarked networks). They generate functionally equivalent
copies by leveraging the models' invariance, via operations like dimension
permutations or scaling/unscaling. This makes it possible to watermark models
without any change in their outputs while remaining stealthy. Experiments demonstrate the
effectiveness of the approach and its robustness against various model
transformations (fine-tuning, quantization, pruning), making it a practical
solution to protect the integrity of large models.
Comment: Published at ICASSP 2024. Webpage at
https://pierrefdz.github.io/publications/invariancewm
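A concrete instance of such an invariance: permuting the hidden units of a two-layer MLP yields a functionally equivalent copy, with the permutation acting as a secret key. A toy plain-Python sketch (the paper applies this idea, plus scaling/unscaling, to large Transformers; the helper names here are illustrative):

```python
import random

def mlp(x, w1, w2):
    """Two-layer MLP y = W2 . relu(W1 . x), weights as lists of rows."""
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * v for w, v in zip(row, h)) for row in w2]

def permute_hidden(w1, w2, perm):
    """Functionally equivalent copy: permute W1's rows and W2's columns
    by the same permutation, which cancels inside the network."""
    w1p = [w1[p] for p in perm]
    w2p = [[row[p] for p in perm] for row in w2]
    return w1p, w2p

random.seed(0)
x = [random.gauss(0, 1) for _ in range(3)]
w1 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
w2 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]
perm = [2, 0, 3, 1]  # the hidden-unit permutation is the watermark key
y0 = mlp(x, w1, w2)
y1 = mlp(x, *permute_hidden(w1, w2, perm))
```

The permuted copy produces identical outputs, yet its weight tensors differ from the original, which is exactly what makes the mark detectable in the white-box setting without any retraining.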
LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework
While Large Language Models (LLMs) have demonstrated commendable performance
across a myriad of domains and tasks, existing LLMs still exhibit a palpable
deficit in handling multimodal functionalities, especially for the Spoken
Question Answering (SQA) task which necessitates precise alignment and deep
interaction between speech and text features. To address the SQA challenge on
LLMs, we initially curated the free-form and open-ended LibriSQA dataset from
Librispeech, comprising Part I with natural conversational formats and Part II
encompassing multiple-choice questions followed by answers and analytical
segments. Both parts collectively include 107k SQA pairs that cover various
topics. Given the evident paucity of existing speech-text LLMs, we propose a
lightweight, end-to-end framework to perform the SQA task on LibriSQA,
achieving significant results. By reformulating ASR into the SQA format, we
further substantiate our framework's capability in handling ASR tasks. Our
empirical findings bolster the LLMs' aptitude for aligning and comprehending
multimodal information, paving the way for the development of universal
multimodal LLMs. The dataset and demo can be found at
https://github.com/ZihanZhaoSJTU/LibriSQA
Pengenalan Suku Kata Bahasa Indonesia Menggunakan Metode LPC Dan Backpropagation Neural Network (Indonesian Syllable Recognition Using LPC and a Backpropagation Neural Network)
Speech has become a key component of today's digital technology, easing many
aspects of human life. Various automatic speech recognition (ASR) systems have
been developed in many countries and languages. Speech recognition can be
applied in many areas, one of them being voice-based security systems such as
spoken passwords. In Indonesia there has been considerable research on
Indonesian-language speech recognition with various methods, but it remains
limited in scope and typically serves only the commands of a particular
application. In this study, we therefore perform recognition at the level of
Indonesian syllables, since Indonesian has a comparatively large syllable
inventory relative to other languages. The system consists of four stages:
voice recording, pre-processing, feature extraction using Linear Predictive
Coding (LPC), and classification using a backpropagation neural network. From
50 spoken Indonesian words we obtained 115 syllables, 74 of them distinct; in
total, 690 Indonesian syllables from 6 respondents were used. The system
recognized the 74 training syllables of each of the 6 respondents with 100%
accuracy, and achieved a best accuracy of 69% on the 115 untrained test
syllables from the 6 respondents. The tests show that the more training data
the network processes, the higher the recognition accuracy (the speech signal
can be recognized).
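The LPC stage fits, per analysis frame, coefficients that predict each sample from its predecessors; the standard way to solve for them is the Levinson-Durbin recursion over the frame's autocorrelation. A minimal sketch of that stage only (framing, windowing, pre-emphasis, and the neural-network classifier are omitted):

```python
def autocorr(frame, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion: returns coefficients a[1..order] with
    s[t] ~ sum_k a[k] * s[t-k], plus the final prediction error."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

# A noiseless AR(1) signal s[t] = 0.9 * s[t-1] is recovered almost exactly.
coeffs, err = lpc([0.9 ** t for t in range(50)], order=1)
```

The resulting coefficient vectors are what would be fed to the backpropagation network as the per-syllable feature representation.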
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Existing large language models have to run K times to generate a sequence of
K tokens. In this paper, we present RecycleGPT, a generative language model
with fast decoding speed by recycling pre-generated model states without
running the whole model in multiple steps. Our approach relies on the
observation that adjacent tokens in a sequence usually have strong correlations
and the next token in a sequence can be reasonably guessed or inferred based on
the preceding ones. Experiments and analysis demonstrate the effectiveness of
our approach in lowering inference latency, achieving up to 1.4x speedup while
preserving high performance.
Comment: Technical Report
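The decoding pattern this implies can be caricatured as alternating a full forward pass with a cheap step that reuses the state cached at the last full pass, so the whole model runs only about half as often. The stand-in functions below are hypothetical toys, not the paper's actual modules:

```python
def decode(prompt, full_model, recycle_module, n_tokens):
    """Alternate an expensive full pass with a cheap 'recycled' step
    that predicts the next token from the cached state, roughly
    halving the number of full-model invocations."""
    tokens = list(prompt)
    state = None
    for step in range(n_tokens):
        if step % 2 == 0:
            token, state = full_model(tokens)          # full pass, cache state
        else:
            token = recycle_module(state, tokens[-1])  # cheap recycled step
        tokens.append(token)
    return tokens

# Dummy stand-ins for the real Transformer and recyclable module:
full = lambda toks: (len(toks), sum(toks))
recycle = lambda state, last: last + 1
out = decode([1, 2, 3], full, recycle, 4)
```

The wall-clock win depends on how often the cheap step's guesses are acceptable, which is what the strong adjacent-token correlation in the abstract is meant to guarantee.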
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
Despite the dominance and effectiveness of scaling, resulting in large
networks with hundreds of billions of parameters, the necessity to train
overparametrized models remains poorly understood, and alternative approaches
do not necessarily make it cheaper to train high-performance models. In this
paper, we explore low-rank training techniques as an alternative approach to
training large neural networks. We introduce a novel method called ReLoRA,
which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to
pre-training transformer language models with up to 350M parameters and
demonstrate comparable performance to regular neural network training.
Furthermore, we observe that the efficiency of ReLoRA increases with model
size, making it a promising approach for training multi-billion-parameter
networks efficiently. Our findings shed light on the potential of low-rank
training techniques and their implications for scaling laws.
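The key mechanism behind ReLoRA is that each low-rank update is periodically merged into the base weights and the low-rank factors are re-initialized, so the accumulated update can exceed the rank of any single phase. A toy numeric sketch of the merge-and-restart step (2x2 weights, rank-1 phases; the real method also resets optimizer state and rewarms the learning rate):

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge(w, b, a):
    """Fold the low-rank update B @ A into the base weight; the factors
    are then re-initialized so the next phase adds a new direction."""
    d = matmul(b, a)
    return [[w[i][j] + d[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Two rank-1 phases on a 2x2 weight: each update alone has rank 1, but
# the accumulated weight is full rank, which a single rank-1 LoRA cannot
# reach no matter how long it trains.
w = [[0.0, 0.0], [0.0, 0.0]]
w = merge(w, [[1.0], [0.0]], [[1.0, 0.0]])
w = merge(w, [[0.0], [1.0]], [[0.0, 1.0]])
det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
```

A nonzero determinant of the accumulated 2x2 weight confirms the sum of two rank-1 updates reached rank 2.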
Secure Transformer Inference
We present a three-party protocol that can protect both Transformer parameters and user data during the inference phase. For each feedforward inference process, our protocol only introduces permutation computation of input and output data on the user side. Our protocol, Secure Transformer Inference Protocol (STIP), can be applied to real-world services like ChatGPT.
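For a single linear layer, the permutation idea can be sketched as follows: the user permutes the input with a secret permutation, the serving party computes with correspondingly permuted weights, and the user undoes the output permutation to recover the true result. The helper names below are hypothetical, and the actual protocol covers full Transformers across three parties:

```python
import random

def apply_perm(vec, perm):
    """Reorder a vector's coordinates by a secret permutation."""
    return [vec[p] for p in perm]

def invert_perm(perm):
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv

def permute_weight(w, p_in, p_out):
    """The serving party only ever sees these permuted weights."""
    return [[w[p_in[i]][p_out[j]] for j in range(len(w[0]))]
            for i in range(len(w))]

def linear(x, w):
    return [sum(xi * w[i][j] for i, xi in enumerate(x))
            for j in range(len(w[0]))]

random.seed(1)
x = [1.0, 2.0, 3.0]
w = [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)]
p_in, p_out = [2, 0, 1], [1, 0]

x_perm = apply_perm(x, p_in)                             # user permutes input
y_perm = linear(x_perm, permute_weight(w, p_in, p_out))  # server computes blind
y = apply_perm(y_perm, invert_perm(p_out))               # user undoes output perm
```

Because the permutations cancel, the recovered output equals the plaintext computation exactly, while neither the raw input nor the original weights ever leave their owners.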