
    Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

    Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%.
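    A minimal sketch of the idea, assuming the HuggingFace transformers GPT-2 implementation: a forward hook adds an embedded "memory" vector to an attention block's output during inference. The target layer index, injection scale, and choice of memory token below are illustrative assumptions, not the paper's procedure for selecting them.

```python
# Sketch: inject a prompt-specific "memory" into a GPT-2 attention layer.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6    # hypothetical "key attention layer" to target
SCALE = 4.0  # hypothetical injection magnitude

# Encode the memory (e.g., the missing intermediate hop) as an embedding vector.
memory_ids = tokenizer(" France", return_tensors="pt").input_ids[0]
memory_vec = model.transformer.wte(memory_ids).mean(dim=0)  # (hidden_size,)

def inject_memory(module, inputs, output):
    # GPT-2's attention block returns a tuple; output[0] is the attention
    # output of shape (batch, seq_len, hidden_size). Add the memory vector
    # at the final token position so it shapes the next-token prediction.
    attn_out = output[0]
    attn_out[:, -1, :] = attn_out[:, -1, :] + SCALE * memory_vec
    return (attn_out,) + output[1:]

handle = model.transformer.h[LAYER].attn.register_forward_hook(inject_memory)

prompt = "The capital of the country where the Eiffel Tower stands is"
ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]
print(tokenizer.decode(logits.argmax().item()))

handle.remove()  # restore the unmodified model
```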

    Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

    Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Recent work has attempted to decode the internal mechanisms by which LLMs arrive at their final predictions for text-completion tasks by reverse engineering the role of linear layers. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention-head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized roles in language models. The code for Attention Lens is available at github.com/msakarvadia/AttentionLens.
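    A minimal sketch of what such a lens might look like: a learned linear map from a single head's output to vocabulary logits. The training objective (matching the full model's next-token distribution via KL divergence) and the dimensions are assumptions made for illustration; see github.com/msakarvadia/AttentionLens for the authors' implementation.

```python
# Sketch: a per-attention-head "lens" that reads out vocabulary tokens.
import torch
import torch.nn as nn

HEAD_DIM = 64       # per-head output dimension (GPT-2 small)
VOCAB_SIZE = 50257  # GPT-2 vocabulary size

class AttentionHeadLens(nn.Module):
    """Translate one attention head's output into vocabulary-space logits."""
    def __init__(self, head_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(head_dim, vocab_size, bias=False)

    def forward(self, head_out: torch.Tensor) -> torch.Tensor:
        # head_out: (batch, head_dim), cached at the final token position.
        return self.proj(head_out)

lens = AttentionHeadLens(HEAD_DIM, VOCAB_SIZE)
optimizer = torch.optim.Adam(lens.parameters(), lr=1e-4)

def train_step(head_out: torch.Tensor, model_logits: torch.Tensor) -> float:
    # Fit the lens so its logits match the full model's next-token
    # distribution, so the lens expresses the head's contribution in
    # vocabulary terms.
    lens_logp = torch.log_softmax(lens(head_out), dim=-1)
    target = torch.softmax(model_logits, dim=-1)
    loss = torch.nn.functional.kl_div(lens_logp, target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Inspect a head: top tokens the trained lens says this head is promoting.
head_out = torch.randn(1, HEAD_DIM)   # stand-in for a cached activation
top_ids = lens(head_out).topk(5).indices  # decode these with the tokenizer
```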

    Modeling pulsed laser micromachining of micro geometries using machine-learning techniques

    A wide range of opportunities is emerging in the micro-system technology sector for laser micro-machining systems, because they are capable of processing various types of materials with micro-scale precision. However, few process datasets and machine-learning techniques are optimized for this industrial task. This study describes the process parameters of micro-laser milling and their influence on the final features of the micro-shapes that are produced. It also identifies the most accurate machine-learning technique for modeling this multivariable process. It examines the capabilities of laser micro-machining by performing experiments on hardened steel with a pulsed Nd:YAG laser. Arrays of micro-channels were manufactured using various scanning speeds, pulse intensities and pulse frequencies. The results are presented in terms of the main industrial requirements for any manufactured good: dimensional accuracy (in our case, depth and width of the channels), surface roughness and material removal rate (a measure of the productivity of the process). Different machine-learning techniques were then tested on the datasets to build highly accurate models for each output variable. The selected techniques were: k-Nearest Neighbours, neural networks, decision trees and linear regression models. Our analysis of the correlation coefficients and the mean absolute error of all the generated models shows that neural networks are better at modelling channel depth, that decision trees are better at modelling material removal rate, and that the two techniques are similar for width and surface roughness. In general, these two techniques show better accuracy than the other two models. The work concludes that decision trees should be used if information on input-parameter relations is sought, while neural networks are suitable when the dimensional accuracy of the workpiece is the main industrial requirement. Extensive datasets are necessary to provide reliable AI models for this industrial task, due to the high rates of noise, especially for some outputs such as roughness.
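    A minimal sketch of the model comparison described above, assuming a tabular dataset with the three process parameters as inputs and one machined-feature measurement (here, channel depth) as the output. The synthetic data and hyperparameters are placeholders, not the study's dataset or tuned settings; the comparison uses mean absolute error, as in the study.

```python
# Sketch: compare the four regression techniques on a process dataset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: scanning speed, pulse intensity, pulse frequency; y: channel depth.
rng = np.random.default_rng(0)
X = rng.uniform([200, 20, 30], [800, 100, 100], size=(100, 3))  # stand-in data
y = 0.05 * X[:, 1] - 0.01 * X[:, 0] + rng.normal(0, 0.5, 100)   # stand-in target

models = {
    "kNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "neural network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    ),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "linear regression": LinearRegression(),
}

# Cross-validated mean absolute error for each technique.
for name, model in models.items():
    mae = -cross_val_score(
        model, X, y, cv=5, scoring="neg_mean_absolute_error"
    ).mean()
    print(f"{name}: MAE = {mae:.3f}")
```

    In practice, one model of this kind would be fitted per output variable (depth, width, roughness, material removal rate), since the study found different techniques best for different outputs.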