
    Banded Householder representation of linear subspaces

    We show how to compactly represent any n-dimensional subspace of R^m as a banded product of Householder reflections using n(m−n) floating point numbers. This is optimal since these subspaces form a Grassmannian Gr_n(m) of dimension n(m−n). The representation is stable and easy to compute: any matrix can be factored into the product of a banded Householder matrix and a square matrix using two to three QR decompositions. Comment: 5 pages, 1 figure, submitted to Linear Algebra and its Applications
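    The banded construction itself is the paper's contribution, but its building block is ordinary Householder QR: the n reflection vectors produced by a QR factorization already determine the column span of a matrix. Below is a minimal numpy sketch of that building block, not the paper's banded n(m−n)-float variant; the function names are illustrative.

    ```python
    import numpy as np

    def householder_vectors(A):
        """Factor A (m x n) as H_1 ... H_n R via Householder QR.

        Returns the normalized reflection vectors v_k and R. The column
        span of A is determined by the reflections alone, which is what
        the paper's banded construction compresses to n(m-n) floats.
        """
        A = np.array(A, dtype=float)
        m, n = A.shape
        vs = []
        for k in range(n):
            x = A[k:, k]
            v = x.copy()
            v[0] += np.copysign(np.linalg.norm(x), x[0])  # avoid cancellation
            v /= np.linalg.norm(v)
            vs.append(v)
            A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])  # apply H_k
        return vs, np.triu(A[:n, :])

    def apply_q(vs, X):
        """Apply Q = H_1 H_2 ... H_n to X (rebuilds the basis from the v_k)."""
        X = np.array(X, dtype=float)
        for k in reversed(range(len(vs))):
            v = vs[k]
            X[k:, :] -= 2.0 * np.outer(v, v @ X[k:, :])
        return X

    # Recover an orthonormal basis for span(A) from the reflections alone.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 2))
    vs, R = householder_vectors(A)
    Q1 = apply_q(vs, np.eye(6, 2))          # first n columns of Q
    assert np.allclose(Q1 @ (Q1.T @ A), A)  # columns of A lie in span(Q1)
    ```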

    PaMpeR: Proof Method Recommendation System for Isabelle/HOL

    Deciding which sub-tool to use for a given proof state requires expertise specific to each interactive theorem prover (ITP). To mitigate this problem, we present PaMpeR, a Proof Method Recommendation system for Isabelle/HOL. Given a proof state, PaMpeR recommends proof methods to discharge the proof goal and provides qualitative explanations as to why it suggests these methods. PaMpeR generates these recommendations from existing hand-written proof corpora, thus transferring experienced users' expertise to new users. Our evaluation shows that PaMpeR correctly predicts experienced users' proof method invocations, especially for special-purpose proof methods. Comment: An anonymized version of this paper has been submitted to a Computer Science conference in April 2018
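    As a rough illustration of the recipe (boolean features extracted from a proof state, statistics learned from a corpus of expert proofs, methods ranked for a new state), here is a toy Python sketch. The features, corpus, and frequency-lookup scheme are invented for illustration; the actual PaMpeR extracts many hand-written feature assertions from Isabelle proof states and learns decision trees over them.

    ```python
    from collections import Counter, defaultdict

    def featurize(goal: str) -> tuple:
        """Crude stand-in for PaMpeR's hand-written assertions on a proof state."""
        return ("ALL" in goal or "EX" in goal,  # goal mentions a quantifier
                "=" in goal,                    # goal contains an equation
                "induct" in goal)               # goal mentions induction

    # Tiny invented corpus of (proof goal, method an expert actually used).
    corpus = [("ALL x. x = x", "auto"),
              ("a + b = b + a", "simp"),
              ("ALL n. P n --> P (Suc n)", "induct"),
              ("f x = g x", "simp")]

    # Count how often each method was used under each feature vector.
    stats = defaultdict(Counter)
    for goal, method in corpus:
        stats[featurize(goal)][method] += 1

    def recommend(goal: str, k: int = 3):
        """Rank methods seen with the same feature vector, most frequent first."""
        return [m for m, _ in stats[featurize(goal)].most_common(k)]

    print(recommend("ALL y. y = y"))  # -> ['auto']
    ```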

    Fine-Tuning Language Models via Epistemic Neural Networks

    Large language models are now part of a powerful new paradigm in machine learning. These models learn a wide range of capabilities from training on large unsupervised text corpora. In many applications, these capabilities are then fine-tuned through additional training on specialized data to improve performance in that setting. In this paper, we augment these models with an epinet: a small additional network architecture that helps to estimate model uncertainty and forms an epistemic neural network (ENN). ENNs are neural networks that can know what they don't know. We show that, by using an epinet to prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same performance with half as much data. We also investigate performance in synthetic neural-network generative models designed to build understanding. In each setting, using an epinet outperforms heuristic active-learning schemes.
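    The data-prioritization loop can be sketched independently of BERT: attach a small index-conditioned network on top of frozen base features, sample several epistemic indices per example, and use the variance of the resulting class probabilities as the uncertainty score for selecting fine-tuning data. The numpy sketch below uses toy dimensions and an untrained epinet, so it shows only the scoring mechanics, not the paper's architecture or training recipe.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D_FEAT, D_INDEX, N_CLASS = 16, 8, 3   # feature dim, index dim, classes

    # Toy epinet: one hidden layer acting on [stop_grad(features), z].
    W1 = rng.standard_normal((D_FEAT + D_INDEX, 32)) * 0.1
    W2 = rng.standard_normal((32, N_CLASS)) * 0.1

    def epinet_logits(features, z):
        """Additive logit correction conditioned on the epistemic index z."""
        h = np.tanh(np.concatenate([features, z]) @ W1)
        return h @ W2

    def uncertainty(base_logits, features, n_index=32):
        """Variance of class probabilities across sampled epistemic indices;
        high variance means the ENN 'knows it doesn't know' this example."""
        probs = []
        for _ in range(n_index):
            z = rng.standard_normal(D_INDEX)
            logits = base_logits + epinet_logits(features, z)
            e = np.exp(logits - logits.max())
            probs.append(e / e.sum())
        return np.var(probs, axis=0).sum()

    # Score a pool of examples; fine-tune on the most uncertain ones first.
    pool = [(rng.standard_normal(N_CLASS), rng.standard_normal(D_FEAT))
            for _ in range(100)]
    ranked = sorted(range(len(pool)), key=lambda i: -uncertainty(*pool[i]))
    print("fine-tune first on pool indices:", ranked[:10])
    ```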

    Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

    Circuit analysis is a promising technique for understanding the internal mechanisms of language models. However, existing analyses are done on small models far from the state of the art. To address this, we present a case study of circuit analysis in the 70B Chinchilla model, aiming to test the scalability of circuit analysis. In particular, we study multiple-choice question answering, and investigate Chinchilla's capability to identify the correct answer label given knowledge of the correct answer text. We find that the existing techniques of logit attribution, attention pattern visualization, and activation patching naturally scale to Chinchilla, allowing us to identify and categorize a small set of 'output nodes' (attention heads and MLPs). We further study the 'correct letter' category of attention heads, aiming to understand the semantics of their features, with mixed results. For typical multiple-choice questions, we can significantly compress the query, key, and value subspaces of these heads without loss of performance when operating on the answer labels, and we show that the query and key subspaces represent an 'Nth item in an enumeration' feature to at least some extent. However, when we attempt to use this explanation to understand the heads' behaviour on a more general distribution including randomized answer labels, we find that it is only a partial explanation, suggesting there is more to learn about the operation of 'correct letter' heads on multiple-choice question answering.
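    Of the three techniques named, activation patching is the easiest to show in miniature: run the model on a clean and a corrupted input, replay the clean activation of one component into the corrupted run, and measure how much of the clean output is restored. The PyTorch sketch below does this on a toy two-layer network; in the paper the same move is made per attention head and MLP inside Chinchilla. Because the toy model has a single hidden layer, patching it restores the clean output exactly, whereas patching one head in a transformer restores it only partially.

    ```python
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    clean_x, corrupt_x = torch.randn(4), torch.randn(4)

    def run_with_patch(x, layer, saved=None):
        """Run model on x. If `saved` is given, overwrite `layer`'s output
        with it (the patch); otherwise record that output for later use."""
        cache = {}
        def hook(mod, inp, out):
            if saved is not None:
                return saved          # patch: substitute the clean activation
            cache["act"] = out.detach()
        handle = layer.register_forward_hook(hook)
        out = model(x)
        handle.remove()
        return out, cache.get("act")

    clean_out, clean_act = run_with_patch(clean_x, model[0])
    corrupt_out, _ = run_with_patch(corrupt_x, model[0])
    patched_out, _ = run_with_patch(corrupt_x, model[0], saved=clean_act)

    # How much of the clean behaviour does patching this component restore?
    print("corrupted vs clean gap:", (corrupt_out - clean_out).norm().item())
    print("patched   vs clean gap:", (patched_out - clean_out).norm().item())
    ```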