Banded Householder representation of linear subspaces
We show how to compactly represent any $k$-dimensional subspace of $\mathbb{R}^n$ as a
banded product of Householder reflections using $k(n-k)$ floating point
numbers. This is optimal since these subspaces form a Grassmannian of
dimension $k(n-k)$. The representation is stable and easy to
compute: any matrix can be factored into the product of a banded Householder
matrix and a square matrix using two to three QR decompositions.
Comment: 5 pages, 1 figure, submitted to Linear Algebra and its Applications
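To make the construction concrete, here is a minimal sketch of plain (non-banded) Householder QR in NumPy: it encodes a subspace basis as a sequence of reflector vectors and reconstructs an orthonormal basis from them. This is the classical algorithm only, not the paper's banded, storage-optimal variant, and all function names are hypothetical.

```python
import numpy as np

def householder_vectors(A):
    """Classic Householder QR: return unit reflector vectors v_1..v_k that
    triangularize A column by column (not the paper's banded variant)."""
    A = A.astype(float).copy()
    vs = []
    for j in range(A.shape[1]):
        x = A[j:, j]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        v /= np.linalg.norm(v)
        A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])
        vs.append(v)
    return vs

def reflectors_to_basis(vs, m):
    """Apply H_1 ... H_k to the first k columns of I, recovering an
    orthonormal basis Q for the column span of the original matrix."""
    Q = np.eye(m)[:, : len(vs)]
    for j in reversed(range(len(vs))):
        v = vs[j]
        Q[j:, :] -= 2.0 * np.outer(v, v @ Q[j:, :])
    return Q
```

The reflectors here use roughly $k(2n-k)/2$ numbers; the point of the paper's banded product is to bring this down to the Grassmannian dimension.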
PaMpeR: Proof Method Recommendation System for Isabelle/HOL
Deciding which sub-tool to use for a given proof state requires expertise
specific to each ITP. To mitigate this problem, we present PaMpeR, a Proof
Method Recommendation system for Isabelle/HOL. Given a proof state, PaMpeR
recommends proof methods to discharge the proof goal and provides qualitative
explanations as to why it suggests these methods. PaMpeR generates these
recommendations based on existing hand-written proof corpora, thus transferring
experienced users' expertise to new users. Our evaluation shows that PaMpeR
correctly predicts experienced users' proof method invocations, especially for
special-purpose proof methods.
Comment: An anonymized version of this paper has been submitted to a Computer
Science conference in April 201
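The corpus-based recommendation idea can be sketched in a few lines: score each proof method by how strongly the current proof state's features overlap with the states where that method was used in the corpus. PaMpeR itself learns decision trees over boolean features of Isabelle proof states; this simplified frequency-overlap model and its feature names are invented for illustration only.

```python
from collections import Counter

def recommend(corpus, state_features, top_k=3):
    """Hypothetical sketch: rank proof methods by feature overlap between the
    current proof state and the states where each method appeared."""
    scores = Counter()
    for features, method in corpus:
        overlap = len(set(features) & set(state_features))
        if overlap:
            scores[method] += overlap
    return [m for m, _ in scores.most_common(top_k)]

# Toy corpus of (proof-state features, method actually used) pairs.
corpus = [
    ({"has_induction_goal", "uses_nat"}, "induct"),
    ({"has_equality_goal"}, "simp"),
    ({"has_equality_goal", "uses_nat"}, "auto"),
]
ranked = recommend(corpus, {"has_equality_goal", "uses_nat"})
```

A real system would also attach the matched features as the qualitative explanation for each recommendation.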
Fine-Tuning Language Models via Epistemic Neural Networks
Large language models are now part of a powerful new paradigm in machine
learning. These models learn a wide range of capabilities from training on
large unsupervised text corpora. In many applications, these capabilities are
then fine-tuned through additional training on specialized data to improve
performance in that setting. In this paper, we augment these models with an
epinet: a small additional network architecture that helps to estimate model
uncertainty and form an epistemic neural network (ENN). ENNs are neural
networks that can know what they don't know. We show that, using an epinet to
prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same
performance with half as much data. We also investigate performance in
synthetic neural network generative models designed to build understanding. In
each setting, using an epinet outperforms heuristic active learning schemes.
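The uncertainty-prioritized selection loop can be sketched schematically: an epinet adds a small index-conditioned term to the base model's output, and the variance of the output across sampled epistemic indices scores how uncertain the model is about each example. This toy linear version and all its names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def epinet_logits(features, z, W_base, W_epi):
    """Schematic ENN: base-network logits plus a small correction whose sign
    and scale depend on the sampled epistemic index z (names hypothetical)."""
    return features @ W_base + (features @ W_epi) * z

def epistemic_uncertainty(features, W_base, W_epi, n_index=32):
    """Score each example by the variance of its logits across sampled
    epistemic indices: high variance = the model 'knows it doesn't know'."""
    zs = rng.normal(size=n_index)
    logits = np.stack([epinet_logits(features, z, W_base, W_epi) for z in zs])
    return logits.var(axis=0).mean(axis=-1)

# Toy active-learning step: select the most uncertain examples to train on.
X = rng.normal(size=(100, 8))
W_base = rng.normal(size=(8, 3))
W_epi = rng.normal(size=(8, 3))
scores = epistemic_uncertainty(X, W_base, W_epi)
batch = np.argsort(scores)[-10:]   # indices of the 10 most uncertain examples
```

Training on `batch` first, rather than a random subset, is the mechanism that lets the fine-tuning reach the same performance with less data.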
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
\emph{Circuit analysis} is a promising technique for understanding the
internal mechanisms of language models. However, existing analyses are done in
small models far from the state of the art. To address this, we present a case
study of circuit analysis in the 70B Chinchilla model, aiming to test the
scalability of circuit analysis. In particular, we study multiple-choice
question answering, and investigate Chinchilla's capability to identify the
correct answer \emph{label} given knowledge of the correct answer \emph{text}.
We find that the existing techniques of logit attribution, attention pattern
visualization, and activation patching naturally scale to Chinchilla, allowing
us to identify and categorize a small set of `output nodes' (attention heads
and MLPs).
We further study the `correct letter' category of attention heads aiming to
understand the semantics of their features, with mixed results. For typical
multiple-choice questions, we significantly compress the query, key and
value subspaces of the head without loss of performance when operating on the
answer labels, and we show that the query and key
subspaces represent an `Nth item in an enumeration' feature to at least some
extent. However, when we attempt to use this explanation to understand the
heads' behaviour on a more general distribution including randomized answer
labels, we find that it is only a partial explanation, suggesting there is more
to learn about the operation of `correct letter' heads on multiple choice
question answering.
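Of the techniques the abstract names, activation patching is the easiest to show in miniature: run the model on a clean and a corrupted input, overwrite an internal activation in the corrupted run with the cached clean value, and measure how much the output moves. The toy two-layer model below is an illustrative assumption, not Chinchilla.

```python
import numpy as np

def toy_forward(x, W1, W2, patch=None):
    """Tiny two-layer toy model; if `patch` is given, the hidden activation is
    overwritten with a value cached from another run (activation patching)."""
    h = np.tanh(x @ W1)
    if patch is not None:
        h = patch
    return h @ W2

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 6)), rng.normal(size=(6, 2))
x_clean, x_corrupt = rng.normal(size=4), rng.normal(size=4)

h_clean = np.tanh(x_clean @ W1)                 # cache the clean-run activation
logits_corrupt = toy_forward(x_corrupt, W1, W2)
logits_patched = toy_forward(x_corrupt, W1, W2, patch=h_clean)

# Patching effect: how much restoring the clean activation moves the logits.
effect = logits_patched - logits_corrupt
```

In a real circuit analysis only one component (say, a single attention head's output) is patched at a time, so a large `effect` localizes that component's causal contribution; here the whole hidden layer is patched, so the patched run reproduces the clean output exactly.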