Banded Householder representation of linear subspaces
We show how to compactly represent any $k$-dimensional subspace of $\mathbb{R}^n$ as a
banded product of Householder reflections using $k(n-k)$ floating point
numbers. This is optimal since these subspaces form a Grassmannian of
dimension $k(n-k)$. The representation is stable and easy to
compute: any matrix can be factored into the product of a banded Householder
matrix and a square matrix using two to three QR decompositions.
Comment: 5 pages, 1 figure, submitted to Linear Algebra and its Applications
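To make the construction concrete, here is a minimal sketch of plain (non-banded) Householder QR in NumPy: it encodes a subspace basis as a sequence of reflector vectors and reconstructs an orthonormal basis from them. This is the classical algorithm only, not the paper's banded, storage-optimal variant, and all function names are hypothetical.

```python
import numpy as np

def householder_vectors(A):
    """Classic Householder QR: return unit reflector vectors v_1..v_k that
    triangularize A column by column (not the paper's banded variant)."""
    A = A.astype(float).copy()
    vs = []
    for j in range(A.shape[1]):
        x = A[j:, j]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        v /= np.linalg.norm(v)
        A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])
        vs.append(v)
    return vs

def reflectors_to_basis(vs, m):
    """Apply H_1 ... H_k to the first k columns of I, recovering an
    orthonormal basis Q for the column span of the original matrix."""
    Q = np.eye(m)[:, : len(vs)]
    for j in reversed(range(len(vs))):
        v = vs[j]
        Q[j:, :] -= 2.0 * np.outer(v, v @ Q[j:, :])
    return Q
```

The reflectors here use roughly $k(2n-k)/2$ numbers; the point of the paper's banded product is to bring this down to the Grassmannian dimension.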
PaMpeR: Proof Method Recommendation System for Isabelle/HOL
Deciding which sub-tool to use for a given proof state requires expertise
specific to each ITP. To mitigate this problem, we present PaMpeR, a Proof
Method Recommendation system for Isabelle/HOL. Given a proof state, PaMpeR
recommends proof methods to discharge the proof goal and provides qualitative
explanations as to why it suggests these methods. PaMpeR generates these
recommendations based on existing hand-written proof corpora, thus transferring
experienced users' expertise to new users. Our evaluation shows that PaMpeR
correctly predicts experienced users' proof method invocations, especially for
special-purpose proof methods.
Comment: An anonymized version of this paper has been submitted to a Computer
Science conference in April 201
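The corpus-based recommendation idea can be sketched in a few lines: score each proof method by how strongly the current proof state's features overlap with the states where that method was used in the corpus. PaMpeR itself learns decision trees over boolean features of Isabelle proof states; this simplified frequency-overlap model and its feature names are invented for illustration only.

```python
from collections import Counter

def recommend(corpus, state_features, top_k=3):
    """Hypothetical sketch: rank proof methods by feature overlap between the
    current proof state and the states where each method appeared."""
    scores = Counter()
    for features, method in corpus:
        overlap = len(set(features) & set(state_features))
        if overlap:
            scores[method] += overlap
    return [m for m, _ in scores.most_common(top_k)]

# Toy corpus of (proof-state features, method actually used) pairs.
corpus = [
    ({"has_induction_goal", "uses_nat"}, "induct"),
    ({"has_equality_goal"}, "simp"),
    ({"has_equality_goal", "uses_nat"}, "auto"),
]
ranked = recommend(corpus, {"has_equality_goal", "uses_nat"})
```

A real system would also attach the matched features as the qualitative explanation for each recommendation.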
Fine-Tuning Language Models via Epistemic Neural Networks
Large language models are now part of a powerful new paradigm in machine
learning. These models learn a wide range of capabilities from training on
large unsupervised text corpora. In many applications, these capabilities are
then fine-tuned through additional training on specialized data to improve
performance in that setting. In this paper, we augment these models with an
epinet: a small additional network architecture that helps to estimate model
uncertainty and form an epistemic neural network (ENN). ENNs are neural
networks that can know what they don't know. We show that, using an epinet to
prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same
performance with half as much data. We also investigate performance in
synthetic neural network generative models designed to build understanding. In
each setting, using an epinet outperforms heuristic active learning schemes.
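The uncertainty-prioritized selection loop can be sketched schematically: an epinet adds a small index-conditioned term to the base model's output, and the variance of the output across sampled epistemic indices scores how uncertain the model is about each example. This toy linear version and all its names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def epinet_logits(features, z, W_base, W_epi):
    """Schematic ENN: base-network logits plus a small correction whose sign
    and scale depend on the sampled epistemic index z (names hypothetical)."""
    return features @ W_base + (features @ W_epi) * z

def epistemic_uncertainty(features, W_base, W_epi, n_index=32):
    """Score each example by the variance of its logits across sampled
    epistemic indices: high variance = the model 'knows it doesn't know'."""
    zs = rng.normal(size=n_index)
    logits = np.stack([epinet_logits(features, z, W_base, W_epi) for z in zs])
    return logits.var(axis=0).mean(axis=-1)

# Toy active-learning step: select the most uncertain examples to train on.
X = rng.normal(size=(100, 8))
W_base = rng.normal(size=(8, 3))
W_epi = rng.normal(size=(8, 3))
scores = epistemic_uncertainty(X, W_base, W_epi)
batch = np.argsort(scores)[-10:]   # indices of the 10 most uncertain examples
```

Training on `batch` first, rather than a random subset, is the mechanism that lets the fine-tuning reach the same performance with less data.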
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
\emph{Circuit analysis} is a promising technique for understanding the
internal mechanisms of language models. However, existing analyses are done in
small models far from the state of the art. To address this, we present a case
study of circuit analysis in the 70B Chinchilla model, aiming to test the
scalability of circuit analysis. In particular, we study multiple-choice
question answering, and investigate Chinchilla's capability to identify the
correct answer \emph{label} given knowledge of the correct answer \emph{text}.
We find that the existing techniques of logit attribution, attention pattern
visualization, and activation patching naturally scale to Chinchilla, allowing
us to identify and categorize a small set of `output nodes' (attention heads
and MLPs).
We further study the `correct letter' category of attention heads aiming to
understand the semantics of their features, with mixed results. For typical
multiple-choice questions, we significantly compress the query, key and
value subspaces of the head without loss of performance when operating on the
answer labels, and we show that the query and key
subspaces represent an `Nth item in an enumeration' feature to at least some
extent. However, when we attempt to use this explanation to understand the
heads' behaviour on a more general distribution including randomized answer
labels, we find that it is only a partial explanation, suggesting there is more
to learn about the operation of `correct letter' heads on multiple choice
question answering.
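Of the techniques the abstract names, activation patching is the easiest to show in miniature: run the model on a clean and a corrupted input, overwrite an internal activation in the corrupted run with the cached clean value, and measure how much the output moves. The toy two-layer model below is an illustrative assumption, not Chinchilla.

```python
import numpy as np

def toy_forward(x, W1, W2, patch=None):
    """Tiny two-layer toy model; if `patch` is given, the hidden activation is
    overwritten with a value cached from another run (activation patching)."""
    h = np.tanh(x @ W1)
    if patch is not None:
        h = patch
    return h @ W2

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 6)), rng.normal(size=(6, 2))
x_clean, x_corrupt = rng.normal(size=4), rng.normal(size=4)

h_clean = np.tanh(x_clean @ W1)                 # cache the clean-run activation
logits_corrupt = toy_forward(x_corrupt, W1, W2)
logits_patched = toy_forward(x_corrupt, W1, W2, patch=h_clean)

# Patching effect: how much restoring the clean activation moves the logits.
effect = logits_patched - logits_corrupt
```

In a real circuit analysis only one component (say, a single attention head's output) is patched at a time, so a large `effect` localizes that component's causal contribution; here the whole hidden layer is patched, so the patched run reproduces the clean output exactly.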