Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT

Chiu, Jeffery; Hazineh, Dean S.; Zhang, Zechen

Authors: Jeffery Chiu, Dean S. Hazineh, Zechen Zhang
Publication date: 12 October 2023

Abstract

Foundation models exhibit significant capabilities in decision-making and logical deductions. Nonetheless, a continuing discourse persists regarding their genuine understanding of the world as opposed to mere stochastic mimicry. This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process. This paper further elucidates the interplay between the linear world representation and causal decision-making, and their dependence on layer depth and model complexity. We have made the code public.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.07582

Last time updated on 16/01/2024