GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Attention-based models have demonstrated remarkable success in various
natural language understanding tasks. However, efficient execution remains a
challenge for these models, which are memory-bound due to their massive number
of parameters. We present GOBO, a model quantization technique that compresses
the vast majority (typically 99.9%) of the 32-bit floating-point parameters of
state-of-the-art BERT models and their variants to 3 bits while maintaining
their accuracy. Unlike other quantization methods, GOBO does not require
fine-tuning nor retraining to compensate for the quantization error. We present
two practical hardware applications of GOBO. In the first, GOBO reduces memory
storage and traffic and, as a result, inference latency and energy consumption.
This GOBO memory compression mechanism is plug-in compatible with many
architectures; we demonstrate it with the TPU, Eyeriss, and an architecture
using Tensor Core-like units. Second, we present a co-designed hardware
architecture that also reduces computation. Uniquely, the GOBO architecture
maintains most of the weights in 3b even during computation, a property that:
(1) makes the processing elements area-efficient, allowing us to pack more
compute power per unit area, (2) replaces most multiply-accumulations with
additions, and (3) reduces the off-chip traffic by amplifying on-chip memory
capacity.
Comment: Accepted at the 53rd IEEE/ACM International Symposium on Microarchitecture (MICRO 2020).
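To make the quantization idea concrete, the following is a minimal NumPy sketch of outlier-aware 3-bit weight quantization in the spirit of GOBO: weights near a layer's mean are mapped to a small dictionary of shared centroids, while the rare far-from-mean weights stay in FP32. The 3-sigma outlier rule, the Lloyd-style centroid refinement, and the layer shape are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch of outlier-aware 3-bit weight quantization (GOBO-style idea).
# The grouping rule and centroid fitting are simplified assumptions.
import numpy as np

def quantize_layer(w, bits=3, outlier_sigmas=3.0):
    """Split weights into a small FP32 outlier group and a 3-bit quantized group."""
    flat = w.ravel()
    mu, sigma = flat.mean(), flat.std()
    # Weights far from the mean are kept in full precision (a small fraction).
    outlier_mask = np.abs(flat - mu) > outlier_sigmas * sigma
    inliers = flat[~outlier_mask]

    # Represent the remaining weights with 2**bits shared centroids,
    # refined with a few Lloyd (k-means-style) iterations.
    centroids = np.linspace(inliers.min(), inliers.max(), 2 ** bits)
    for _ in range(10):
        idx = np.argmin(np.abs(inliers[:, None] - centroids[None, :]), axis=1)
        for k in range(len(centroids)):
            members = inliers[idx == k]
            if members.size:
                centroids[k] = members.mean()

    idx = np.argmin(np.abs(inliers[:, None] - centroids[None, :]), axis=1)
    return centroids, idx.astype(np.uint8), flat[outlier_mask], outlier_mask

def dequantize_layer(shape, centroids, idx, outliers, outlier_mask):
    """Rebuild an FP32 weight tensor from centroids, indices, and outliers."""
    flat = np.empty(outlier_mask.size, dtype=np.float32)
    flat[~outlier_mask] = centroids[idx]
    flat[outlier_mask] = outliers
    return flat.reshape(shape)

w = np.random.randn(768, 768).astype(np.float32)  # illustrative layer shape
centroids, idx, outliers, mask = quantize_layer(w)
w_hat = dequantize_layer(w.shape, centroids, idx, outliers, mask)
print("outlier fraction:", mask.mean())
print("reconstruction MSE:", np.mean((w - w_hat) ** 2))
```

Storing one 3-bit index per inlier plus a handful of FP32 outliers and centroids is what shrinks the weight footprint from 32 bits to roughly 3 bits per parameter.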
The Go Transformer: Natural Language Modeling for Game Play
This work applies natural language modeling to generate plausible strategic
moves in the ancient game of Go. We train the Generative Pretrained Transformer
(GPT-2) to mimic the style of Go champions as archived in Smart Game Format
(SGF), which offers a text description of move sequences. The trained model
further generates valid but previously unseen strategies for Go. Because GPT-2
preserves punctuation and spacing, the raw output of the text generator
provides inputs to game visualization and creative pattern exploration, for
example auto-replays in the Sabaki project's game engine. Results demonstrate that language
modeling can capture both the sequencing format of championship Go games and
their strategic formations. Compared to random game boards, the GPT-2
fine-tuning shows efficient opening move sequences favoring corner play over
less advantageous center and side play. Game generation as a language modeling
task offers novel approaches to more than 40 other board games where historical
text annotation provides training data (e.g., Amazons & Connect 4/6).
Comment: 8 Pages, 5 Figures, 1 Table, IEEE Format, AI4I 2020.
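As an illustration of treating game records as ordinary language-modeling text, here is a minimal sketch that feeds an SGF move fragment to GPT-2 for one causal-LM training step and then samples a continuation, assuming the Hugging Face transformers and PyTorch libraries. The SGF snippet, hyperparameters, and single-step loop are placeholders rather than the paper's actual training setup.

```python
# Minimal sketch: fine-tune GPT-2 on SGF move text and sample a continuation.
# Assumes Hugging Face transformers + PyTorch; not the authors' exact pipeline.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# SGF encodes moves as text, e.g. ";B[pd];W[dp];B[pq]", so a game record can be
# fed to the language model like any other document.
sgf_fragment = "(;GM[1]SZ[19];B[pd];W[dp];B[pq];W[dd]"
inputs = tokenizer(sgf_fragment, return_tensors="pt")

# One gradient step of causal language modeling on the move text.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
optimizer.step()

# After fine-tuning, sampling continues the move sequence in SGF syntax.
model.eval()
with torch.no_grad():
    generated = model.generate(
        inputs["input_ids"], max_length=64, do_sample=True, top_k=50,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(generated[0]))
```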
The Chess Transformer: Mastering Play using Generative Language Models
This work demonstrates that natural language transformers can support more
generic strategic modeling, particularly for text-archived games. In addition
to learning natural language skills, the abstract transformer architecture can
generate meaningful moves on a chessboard. With further fine-tuning, the
transformer learns complex gameplay by training on 2.8 million chess games in
Portable Game Notation. After 30,000 training steps, OpenAI's Generative
Pre-trained Transformer (GPT-2) optimizes weights for 774 million parameters.
This fine-tuned Chess Transformer generates plausible strategies and displays
game formations identifiable as classic openings, such as English or the Slav
Exchange. Finally, in live play, the model demonstrates a
human-to-transformer interface that correctly filters illegal moves and
provides a novel method to challenge the transformer's chess strategies. We
anticipate future work will build on this transformer's promise, particularly
in other strategy games where features can capture the underlying complex rule
syntax from simple but expressive player annotations.
Comment: 7 Pages, 6 Figures, AAAI Format, AAAI 2
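The illegal-move filtering described for live play can be sketched with the python-chess library: candidate moves proposed by the language model are parsed against the current board, and only legal ones are applied. The propose_moves stub below stands in for sampling SAN text from the fine-tuned GPT-2 and is an assumption, not the authors' interface.

```python
# Hedged sketch of legality filtering for model-proposed chess moves,
# using the python-chess library.
import chess

def propose_moves(pgn_so_far):
    """Placeholder for the language model: return candidate SAN moves in order
    of preference. A real system would sample these from the fine-tuned GPT-2."""
    return ["e4", "Nf3", "d4"]

def play_first_legal(board, candidates):
    """Apply the first candidate that parses as a legal move on this board."""
    for san in candidates:
        try:
            move = board.parse_san(san)  # raises ValueError if invalid/illegal
        except ValueError:
            continue
        board.push(move)
        return san
    return None

board = chess.Board()
played = play_first_legal(board, propose_moves("1."))
print("played:", played)
print(board.fen())
```

In a real interface, a rejected candidate would trigger re-sampling from the model rather than falling through a fixed list.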