
    End-to-End Attention-based Large Vocabulary Speech Recognition

    Many current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components for acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN) that performs sequence prediction directly at the character level. Alignment between the input features and the desired character sequence is learned automatically by an attention mechanism built into the RNN. For each predicted character, the attention mechanism scans the input sequence and chooses relevant frames. We propose two methods to speed up this operation: limiting the scan to a subset of the most promising frames, and pooling the information contained in neighboring frames over time, thereby reducing the source sequence length. Integrating an n-gram language model into the decoding process yields recognition accuracies similar to those of other HMM-free RNN-based approaches.
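    The two speed-ups can be illustrated on toy data with the minimal sketch below. It is not the paper's implementation: dot-product scoring stands in for the paper's learned attention scorer, max-pooling over pairs of frames is an assumed pooling operator, and centring the window on the median of the previous alignment is an assumed rule for picking the "most promising frames". The helper names (pool_pairs, windowed_attention, alignment_median) are hypothetical.

```python
import math


def pool_pairs(frames):
    """Halve the sequence length by max-pooling each pair of neighbouring frames (assumed operator)."""
    pooled = []
    for i in range(0, len(frames), 2):
        pair = frames[i:i + 2]  # the last "pair" holds a single frame if the length is odd
        pooled.append([max(values) for values in zip(*pair)])
    return pooled


def windowed_attention(query, keys, center, left=1, right=1):
    """Softmax over dot-product scores, restricted to keys[center-left : center+right+1].

    Frames outside the window are never scored and receive weight 0, which is the speed-up.
    Dot-product scoring is an assumption standing in for a learned scoring network.
    """
    start = max(0, center - left)
    end = min(len(keys), center + right + 1)
    scores = [sum(q * k for q, k in zip(query, keys[j])) for j in range(start, end)]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [0.0] * len(keys)
    for j, e in zip(range(start, end), exps):
        weights[j] = e / total
    return weights


def alignment_median(weights):
    """Index at which the cumulative attention weight first reaches 0.5 (hypothetical window centre)."""
    cumulative = 0.0
    for j, w in enumerate(weights):
        cumulative += w
        if cumulative >= 0.5:
            return j
    return len(weights) - 1


if __name__ == "__main__":
    frames = [[float(t), float(-t)] for t in range(8)]  # 8 toy 2-D feature frames
    encoded = pool_pairs(frames)                          # 4 frames after pooling
    weights = windowed_attention([1.0, 0.5], encoded, center=1)
    print(len(encoded), [round(w, 3) for w in weights])
    print("next window centre:", alignment_median(weights))
```

    Pooling shrinks the number of frames every attention step has to consider, while windowing bounds the number scored per output character; combined, the per-character cost no longer grows with the full utterance length.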

    Exposing Attention Glitches with Flip-Flop Language Modeling

    Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.
    Comment: v2: NeurIPS 2023 camera-ready + data release
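    To make the copying task concrete, the sketch below generates a flip-flop string and checks it. The vocabulary is an assumption for illustration: instructions 'w' (write), 'r' (read) and 'i' (ignore), each followed by a bit, where a read must reproduce the most recently written bit and ignore bits are random distractors. The probabilities p_write and p_read, and the function names, are hypothetical parameters, not the benchmark's released configuration.

```python
import random


def flip_flop_sequence(length, p_write=0.1, p_read=0.1, seed=None):
    """Generate `length` (instruction, bit) pairs as a flat token list.

    Assumed vocabulary: 'w', 'r', 'i' followed by '0' or '1'. A read's bit is the
    most recent written bit; an ignore's bit is random noise the model must skip over.
    """
    rng = random.Random(seed)
    memory = str(rng.randint(0, 1))
    tokens = ["w", memory]  # start with a write so every later read is well defined
    for _ in range(length - 1):
        r = rng.random()
        if r < p_write:
            memory = str(rng.randint(0, 1))
            tokens += ["w", memory]
        elif r < p_write + p_read:
            tokens += ["r", memory]
        else:
            tokens += ["i", str(rng.randint(0, 1))]
    return tokens


def check_reads(tokens):
    """Return True if every read bit equals the most recent written bit."""
    memory = None
    for instr, bit in zip(tokens[0::2], tokens[1::2]):
        if instr == "w":
            memory = bit
        elif instr == "r" and bit != memory:
            return False
    return True


if __name__ == "__main__":
    seq = flip_flop_sequence(20, seed=0)
    print("".join(seq), check_reads(seq))
```

    A model solves the task only if, at every read, it predicts exactly the bit that check_reads expects, regardless of how many ignore tokens separate that read from the last write; long, sparse gaps are where the sporadic errors described above would surface.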

    Interpreting multi-stable behaviour in input-driven recurrent neural networks

    Recurrent neural networks (RNNs) are computational models inspired by the brain. Although RNNs stand out as state-of-the-art machine learning models for challenging tasks such as speech recognition, handwriting recognition and language translation, they are plagued by the so-called vanishing/exploding gradient issue. This prevents us from training RNNs to learn long-term dependencies in sequential data. Moreover, these models suffer from a problem of interpretability, known as the "black-box issue" of RNNs. We attempt to open the black box by developing a mechanistic interpretation of errors occurring during the computation. We do this from a dynamical systems theory perspective, specifically building on the notion of Excitable Network Attractors. Our methodology is effective at least for those tasks where a number of attractors and a switching pattern between them must be learned. RNNs can be seen as massively large nonlinear dynamical systems driven by external inputs. When it comes to analytically investigating RNNs, the literature often neglects the input-driven property or replaces it with tight constraints on the input driving the dynamics, which do not match the reality of RNN applications. To bridge this gap, we framed the dynamics of RNNs driven by generic input sequences in the context of nonautonomous dynamical systems theory. This led us to examine in depth a fundamental principle established for RNNs, known as the echo state property (ESP). In particular, we argue that input-driven RNNs can be reliable computational models even without satisfying the classical ESP formulation.
We prove a form of input-driven fixed-point theorem and exploit it to (i) demonstrate the existence and uniqueness of a global attracting solution for strongly (in amplitude) input-driven RNNs, (ii) deduce the existence of multiple responses for certain input signals, which can be reliably exploited for computational purposes, and (iii) study the stability of attracting solutions with respect to input sequences. Finally, we highlight the active role of the input in determining qualitative changes in the RNN dynamics, such as the number of stable responses, in contrast to the well-known qualitative changes caused by variations in model parameters.
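    The contrast between claims (i) and (ii) can be seen in a deliberately tiny example. The sketch below is our own illustration, not the model analyzed in the thesis: a hypothetical scalar "RNN" x_{t+1} = tanh(w x_t + gain u_t) with self-coupling w = 2 and random inputs u_t in {-1, +1}. With w = 2 the unforced map has two stable states, so with weak input two trajectories started at -1 and +1 keep distinct responses; with strong input every step is a contraction on [-1, 1] and both trajectories collapse onto a single input-determined solution. The values of w, gain and the input distribution are illustrative assumptions.

```python
import math
import random


def drive(x0, inputs, w=2.0, gain=1.0):
    """Iterate the hypothetical scalar map x_{t+1} = tanh(w * x_t + gain * u_t)."""
    x = x0
    trajectory = [x]
    for u in inputs:
        x = math.tanh(w * x + gain * u)
        trajectory.append(x)
    return trajectory


def final_gap(gain, steps=200, seed=0):
    """Distance after `steps` between trajectories started at -1 and +1 under the same input."""
    rng = random.Random(seed)
    inputs = [rng.choice((-1.0, 1.0)) for _ in range(steps)]
    low = drive(-1.0, inputs, gain=gain)
    high = drive(1.0, inputs, gain=gain)
    return abs(high[-1] - low[-1])


if __name__ == "__main__":
    # Strong drive (gain=3): |2x + 3u| >= 1 on [-1, 1], so each step shrinks
    # distances by a factor of at most 2 * sech(1)**2 (about 0.84).
    print("strong input gap:", final_gap(gain=3.0))
    # Weak drive (gain=0.1): the two stable states of the self-coupling survive.
    print("weak input gap:  ", final_gap(gain=0.1))
```

    Here only the input amplitude changes between the two runs, while the parameter w stays fixed, which mirrors the abstract's point that the input itself can change the number of stable responses.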