Structured Like a Language Model: Analysing AI as an Automated Subject
Drawing from the resources of psychoanalysis and critical media studies, in
this paper we develop an analysis of Large Language Models (LLMs) as automated
subjects. We argue that the intentional, fictional projection of subjectivity onto
LLMs can yield an alternate frame through which AI behaviour, including its
productions of bias and harm, can be analysed. First, we introduce language
models, discuss their significance and risks, and outline our case for
interpreting model design and outputs with support from psychoanalytic
concepts. We trace a brief history of language models, culminating with the
releases, in 2022, of systems that realise state-of-the-art natural language
processing performance. We engage with one such system, OpenAI's InstructGPT,
as a case study, detailing the layers of its construction and conducting
exploratory and semi-structured interviews with chatbots. These interviews
probe the model's moral imperatives to be helpful, truthful and harmless by
design. The model acts, we argue, as the condensation of often competing social
desires, articulated through the internet and harvested into training data,
which must then be regulated and repressed. This foundational structure can
however be redirected via prompting, so that the model comes to identify with,
and transfer, its commitments to the immediate human subject before it. In
turn, these automated productions of language can lead the human subject to
project agency onto the model, occasionally effecting further forms of
countertransference. We conclude that critical media methods and psychoanalytic
theory together offer a productive frame for grasping the powerful new
capacities of AI-driven language systems.
Combining semantic and syntactic structure for language modeling
Structured language models for speech recognition have been shown to remedy
the weaknesses of n-gram models. All current structured language models are,
however, limited in that they do not take into account dependencies between
non-headwords. We show that non-headword dependencies contribute to a
significantly improved word error rate, and that a data-oriented parsing (DOP)
model trained on semantically and syntactically annotated data can exploit
these dependencies. This paper also contains the first DOP model trained by
means of a maximum likelihood reestimation procedure, which solves some of the
theoretical shortcomings of previous DOP models.
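The maximum likelihood reestimation mentioned here is, in spirit, an EM-style loop: fragment probabilities are updated from their expected counts under the current model. Below is a minimal illustrative sketch in Python with a toy corpus and hypothetical fragment names; real DOP reestimation runs over packed parse forests and normalises per root category, which this deliberately simplifies.

```python
# EM-style reestimation sketch for a DOP-like model (illustrative only).
# Each toy "sentence" has several candidate derivations; a derivation is
# a bag of elementary fragments with hypothetical names.

from collections import defaultdict

# Hypothetical corpus: sentence id -> list of derivations (fragment lists).
corpus = {
    "s1": [["f_np_vp", "f_np"], ["f_flat"]],
    "s2": [["f_np_vp", "f_vp"], ["f_flat"]],
}

fragments = {f for ders in corpus.values() for d in ders for f in d}
prob = {f: 1.0 / len(fragments) for f in fragments}  # uniform init

for _ in range(20):  # EM iterations
    expected = defaultdict(float)
    for ders in corpus.values():
        # E-step: weight each derivation by its probability under the
        # current fragment distribution, normalised per sentence.
        weights = []
        for d in ders:
            w = 1.0
            for f in d:
                w *= prob[f]
            weights.append(w)
        z = sum(weights)
        for d, w in zip(ders, weights):
            for f in d:
                expected[f] += w / z
    # M-step: renormalise expected counts into new fragment probabilities.
    # (A real DOP model normalises per root nonterminal, not globally.)
    total = sum(expected.values())
    prob = {f: c / total for f, c in expected.items()}

print(prob)  # fragments used in likely derivations gain probability mass
```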
Structured Sequence Modeling with Graph Convolutional Recurrent Networks
This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep
learning model able to predict structured sequences of data. More precisely, GCRN is
a generalization of classical recurrent neural networks (RNN) to data
structured by an arbitrary graph. Such structured sequences can represent
series of frames in videos, spatio-temporal measurements on a network of
sensors, or random walks on a vocabulary graph for natural language modeling.
The proposed model combines convolutional neural networks (CNN) on graphs to
identify spatial structures and RNN to find dynamic patterns. We study two
possible architectures of GCRN, and apply the models to two practical problems:
predicting moving MNIST data, and modeling natural language with the Penn
Treebank dataset. Experiments show that simultaneously exploiting the spatial
(graph) structure and the dynamic information in the data can improve both
precision and learning speed.
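The core idea of such a cell can be sketched compactly: the dense matrix products of a vanilla RNN are replaced by a graph convolution, so each node's hidden state mixes information from its neighbours at every time step. The sketch below uses a single one-hop convolution and a plain tanh update with random data; the shapes and the update rule are illustrative assumptions, not the paper's exact architecture (which studies Chebyshev filters and LSTM variants).

```python
# Minimal sketch of a graph-convolutional recurrent step in the spirit
# of GCRN: graph convolution on both the input and the hidden state.

import numpy as np

rng = np.random.default_rng(0)
n_nodes, in_dim, hid_dim, T = 5, 3, 8, 4

# Symmetrically normalised adjacency with self-loops.
A = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)
A = np.maximum(A, A.T) + np.eye(n_nodes)
d = A.sum(1)
A_hat = A / np.sqrt(np.outer(d, d))

W_x = rng.standard_normal((in_dim, hid_dim)) * 0.1   # input weights
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1  # recurrent weights

def gcrn_step(x_t, h):
    """One recurrent step: graph-convolve input and state, then update."""
    return np.tanh(A_hat @ x_t @ W_x + A_hat @ h @ W_h)

h = np.zeros((n_nodes, hid_dim))
for t in range(T):
    x_t = rng.standard_normal((n_nodes, in_dim))  # signal on the graph
    h = gcrn_step(x_t, h)

print(h.shape)  # (5, 8): one hidden vector per graph node
```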
Towards a Unified Framework for Declarative Structured Communications
We present a unified framework for the declarative analysis of structured
communications. By relying on a (timed) concurrent constraint programming
language, we show that in addition to the usual operational techniques from
process calculi, the analysis of structured communications can elegantly
exploit logic-based reasoning techniques. We introduce a declarative
interpretation of the language for structured communications proposed by Honda,
Vasconcelos, and Kubo. Distinguishing features of our approach are: the
possibility of including partial information (constraints) in the session
model; the use of explicit time for reasoning about session duration and
expiration; and a tight correspondence with logic, which formally relates
session execution and linear-time temporal logic (LTL) formulas.
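The correspondence with linear-time temporal logic can be made concrete on finite traces: a session execution is a sequence of timed events, and a property such as "every request is eventually answered before a deadline" is directly checkable. Here is a minimal sketch with hypothetical event names and a finite-trace reading of the operators; the paper's actual encoding goes through a timed concurrent constraint calculus, not this Python shorthand.

```python
# Finite-trace sketch of checking temporal properties over a session
# execution. Event names, the deadline, and the finite-trace semantics
# of "always"/"eventually" are illustrative assumptions.

from typing import Callable, List, Tuple

Event = Tuple[str, int]          # (label, timestamp)
Trace = List[Event]

def always(p: Callable[[Trace], bool]) -> Callable[[Trace], bool]:
    """G p on a finite trace: p holds at every suffix."""
    return lambda tr: all(p(tr[i:]) for i in range(len(tr)))

def eventually(p: Callable[[Trace], bool]) -> Callable[[Trace], bool]:
    """F p on a finite trace: p holds at some suffix."""
    return lambda tr: any(p(tr[i:]) for i in range(len(tr)))

def responded_within(deadline: int) -> Callable[[Trace], bool]:
    """At a 'request', some later 'response' arrives before the deadline."""
    def prop(tr: Trace) -> bool:
        if not tr or tr[0][0] != "request":
            return True  # vacuously true when the suffix starts elsewhere
        t0 = tr[0][1]
        return any(e == "response" and t <= t0 + deadline for e, t in tr[1:])
    return prop

session: Trace = [("open", 0), ("request", 1), ("response", 3),
                  ("request", 5), ("response", 6), ("close", 7)]

# "Always, a request is answered within 4 time units."
print(always(responded_within(4))(session))                          # True
# "Eventually the session closes."
print(eventually(lambda tr: bool(tr) and tr[0][0] == "close")(session))  # True
```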