Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
Answering multi-hop reasoning questions requires retrieving and synthesizing
information from diverse sources. Large Language Models (LLMs) struggle to
perform such reasoning consistently. Here we propose an approach to pinpoint
and rectify multi-hop reasoning failures through targeted memory injections on
LLM attention heads. First, we analyze the per-layer activations of GPT-2
models in response to single and multi-hop prompts. We then propose a mechanism
that allows users to inject pertinent prompt-specific information, which we
refer to as "memories," at critical LLM locations during inference. By thus
enabling the LLM to incorporate additional relevant information during
inference, we enhance the quality of multi-hop prompt completions. We show
empirically that a simple, efficient, and targeted memory injection into a key
attention layer can often increase the probability of the desired next token in
multi-hop tasks by up to 424%.
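The injection idea can be pictured with a toy numpy sketch (not the paper's implementation): a "memory" vector, taken here as the unembedding direction of a target token, is added to a hidden state at the injection site, raising the probability of the desired next token. The orthonormal unembedding and the `scale` value are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_model, vocab = 8, 5

# Toy unembedding with orthonormal columns, so injecting a token's
# direction boosts only that token's logit (illustrative assumption).
W_unembed = np.linalg.qr(rng.normal(size=(d_model, vocab)))[0]

# Hidden state at the injection site (e.g., the output of a key attention layer).
hidden = rng.normal(size=d_model)

# "Memory": the embedding direction of a pertinent token, e.g., the bridge
# entity a multi-hop prompt needs (here just token index 2).
target = 2
memory = W_unembed[:, target]

def next_token_probs(h):
    return softmax(h @ W_unembed)

scale = 2.0  # injection magnitude (hypothetical hyperparameter)
p_before = next_token_probs(hidden)[target]
p_after = next_token_probs(hidden + scale * memory)[target]
print(p_before, p_after)  # the target token's probability rises after injection
```

In the real setting the memory is injected into GPT-2 attention-head outputs during inference; this sketch only shows why adding a token-aligned direction shifts the next-token distribution toward that token.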
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Transformer-based Large Language Models (LLMs) are the state-of-the-art for
natural language tasks. Recent work has attempted to decode the internal
mechanisms by which LLMs arrive at their final predictions for text completion
tasks by reverse engineering the role of linear layers. Yet little is
known about the specific role of attention heads in producing the final token
prediction. We propose Attention Lens, a tool that enables researchers to
translate the outputs of attention heads into vocabulary tokens via learned
attention-head-specific transformations called lenses. Preliminary findings
from our trained lenses indicate that attention heads play highly specialized
roles in language models. The code for Attention Lens is available at
github.com/msakarvadia/AttentionLens.
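One way to picture a lens is as a learned linear map from a head's output to vocabulary logits, fit so its predictions match the model's final logits. The numpy sketch below fits such a map by least squares on entirely synthetic activations; the actual tool trains per-head lenses on real GPT-2 activations, and every array here is an invented stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
d_head, vocab, n_samples = 16, 50, 200

# Simulated outputs of one attention head across many prompts (synthetic).
head_out = rng.normal(size=(n_samples, d_head))

# Simulated final logits the model produced for the same prompts: a hidden
# linear relation plus noise, standing in for real model outputs.
true_map = rng.normal(size=(d_head, vocab))
final_logits = head_out @ true_map + 0.1 * rng.normal(size=(n_samples, vocab))

# Fit a head-specific "lens": a linear map from head output to vocab logits.
lens, *_ = np.linalg.lstsq(head_out, final_logits, rcond=None)

# Decode a new head activation into the five tokens it "writes" most strongly.
top_tokens = np.argsort(head_out[0] @ lens)[::-1][:5]
```

Inspecting which tokens a head promotes across many prompts is what lets the lenses reveal the specialized roles the abstract describes.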
Modeling pulsed laser micromachining of micro geometries using machine-learning techniques
A wide range of opportunities are emerging in the micro-system technology sector for laser micro-machining systems, because they are capable of processing various types of materials with micro-scale precision. However, few process datasets and machine-learning techniques are optimized for this industrial task. This study describes the process parameters of micro-laser milling and their influence on the final features of the microshapes that are produced. It also identifies the most accurate machine-learning technique for the modelling of this multivariable process. It examines the capabilities of laser micro-machining by performing experiments on hardened steel with a pulsed Nd:YAG laser. Arrays of micro-channels were manufactured using various scanning speeds, pulse intensities and pulse frequencies. The results are presented in terms of the main industrial requirements for any manufactured good: dimensional accuracy (in our case, depth and width of the channels), surface roughness and material removal rate (which is a measure of the productivity of the process). Different machine-learning techniques were then tested on the datasets to build highly accurate models for each output variable. The selected techniques were: k-Nearest Neighbours, neural networks, decision trees and linear regression models. Our analysis of the correlation coefficients and the mean absolute error of all the generated models shows that neural networks are better at modelling channel depth and that decision trees are better at modelling material removal rate; both techniques were similar for width and surface roughness. In general, these two techniques show better accuracy than the other two models. The work concludes that decision trees should be used if information on input-parameter relations is sought, while neural networks are suitable when the dimensional accuracy of the workpiece is the main industrial requirement.
Extensive datasets are necessary for this industrial task to provide reliable AI models, given the high rates of noise, especially for some outputs such as roughness.
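The model comparison can be sketched with scikit-learn on synthetic data standing in for the experimental dataset; the parameter ranges and the depth relation below are invented for illustration, not taken from the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
# Synthetic stand-ins for the process parameters: scanning speed (mm/s),
# pulse intensity (%), pulse frequency (kHz). Ranges are invented.
X = rng.uniform([100, 20, 10], [600, 100, 50], size=(n, 3))
# Synthetic channel depth (um) with noise; not the paper's data.
y = 0.5 * X[:, 1] - 0.05 * X[:, 0] + 0.2 * X[:, 2] + rng.normal(0, 2.0, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "kNN": KNeighborsRegressor(n_neighbors=5),
    "decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "neural net": make_pipeline(StandardScaler(),
                                MLPRegressor(hidden_layer_sizes=(32,),
                                             max_iter=2000, random_state=0)),
    "linear": LinearRegression(),
}
maes = {name: mean_absolute_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
        for name, m in models.items()}
print(maes)  # mean absolute error per technique on the held-out set
```

Comparing held-out MAE (alongside correlation coefficients) per output variable is the selection procedure the abstract describes for choosing between the four techniques.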