Analyzing Vision Transformers for Image Classification in Class Embedding Space
Despite the growing use of transformer models in computer vision, a
mechanistic understanding of these networks is still needed. This work
introduces a method to reverse-engineer Vision Transformers trained to solve
image classification tasks. Inspired by previous research in NLP, we
demonstrate how the inner representations at any level of the hierarchy can be
projected onto the learned class embedding space to uncover how these networks
build categorical representations for their predictions. We use our framework
to show how image tokens develop class-specific representations that depend on
attention mechanisms and contextual information, and give insights on how
self-attention and MLP layers differentially contribute to this categorical
composition. We additionally demonstrate that this method (1) can be used to
determine the parts of an image that would be important for detecting the class
of interest, and (2) exhibits significant advantages over traditional linear
probing approaches. Taken together, our results position our proposed framework
as a powerful tool for mechanistic interpretability and explainability
research.
Comment: NeurIPS 202
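The projection described above can be sketched in a few lines: reuse the model's final normalization and class-embedding head on an intermediate layer's token representations. The sketch below is a minimal NumPy illustration with random stand-ins for a trained ViT's weights and hidden states; the sizes and weights are assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_classes, n_tokens = 64, 10, 197   # hypothetical ViT-like sizes

# Stand-in for the trained classifier's class-embedding matrix.
W_head = rng.standard_normal((n_classes, d_model))

def layer_norm(x, eps=1e-5):
    """Per-token layer normalization (no learned scale/shift, for brevity)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def project_to_class_space(hidden):
    """Project intermediate token representations onto the learned class
    embedding space: final norm, then the classification head."""
    return layer_norm(hidden) @ W_head.T

# Hidden states from some intermediate layer (random stand-in here).
hidden = rng.standard_normal((n_tokens, d_model))
class_scores = project_to_class_space(hidden)
print(class_scores.shape)  # (197, 10)
```

Inspecting `class_scores.argmax(axis=-1)` per token, layer by layer, is how one would track where and when class-specific representations emerge in the token sequence.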
System identification of neural systems: If we got it right, would we know?
Artificial neural networks are being proposed as models of parts of the
brain. The networks are compared to recordings of biological neurons, and good
performance in reproducing neural responses is considered to support the
model's validity. A key question is how much this system identification
approach tells us about brain computation. Does it validate one model
architecture over another? By replacing brain recordings with known ground-truth
models, we evaluate whether the most commonly used comparison techniques, such
as linear encoding models and centered kernel alignment, can correctly identify
a model. System identification performance turns out to be quite variable; it
also depends significantly on factors independent of the ground-truth
architecture, such as the stimulus images. In addition, we show the limitations
of using functional similarity scores to identify higher-level architectural
motifs.
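One of the comparison techniques named above, linear centered kernel alignment (CKA), is compact enough to sketch directly. This is a generic implementation, not the paper's evaluation code, and the random data below is purely illustrative:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n, d1) and Y (n, d2)
    of the same n stimuli; returns a similarity score in [0, 1]."""
    X = X - X.mean(axis=0)                     # center each feature
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))             # responses of model A
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
print(round(linear_cka(X, X), 3))              # 1.0  (identical representations)
print(round(linear_cka(X, X @ Q), 3))          # 1.0  (invariant to rotation)
```

The rotation invariance shown in the last line is exactly the property that makes such scores convenient for comparing networks, and also part of why, as the abstract argues, a high score alone cannot pin down the underlying architecture.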
Quantifying Attention Flow in Transformers
In the Transformer model, "self-attention" combines information from attended
embeddings into the representation of the focal embedding in the next layer.
Thus, across layers of the Transformer, information originating from different
tokens gets increasingly mixed. This makes attention weights unreliable as
explanation probes. In this paper, we consider the problem of quantifying this
flow of information through self-attention. We propose two post hoc methods for
approximating the attention to input tokens given the attention weights,
attention rollout and attention flow, which treat attention weights as the
relative relevance of the input tokens. We show that these methods give
complementary views on the flow of information and, compared to raw attention,
both yield higher correlations with importance scores of input tokens obtained
using an ablation method and input gradients.
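Attention rollout, the simpler of the two methods, can be sketched as follows: mix each layer's attention matrix with the identity to account for the residual connection, renormalize the rows, and multiply across layers. The random row-stochastic matrices below stand in for a real model's head-averaged attention maps:

```python
import numpy as np

def attention_rollout(attentions):
    """attentions: list of (tokens, tokens) row-stochastic attention
    matrices (averaged over heads), ordered from the first layer up.
    Returns rollout[i, j], the estimated attention of top-layer token i
    to input token j."""
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for A in attentions:
        A_res = 0.5 * A + 0.5 * np.eye(n)          # residual connection
        A_res /= A_res.sum(axis=-1, keepdims=True)  # renormalize rows
        rollout = A_res @ rollout                   # accumulate across layers
    return rollout

rng = np.random.default_rng(0)
n_layers, n_tokens = 4, 6
raw = rng.random((n_layers, n_tokens, n_tokens))
attns = [A / A.sum(axis=-1, keepdims=True) for A in raw]  # row-normalize
R = attention_rollout(attns)
print(R.shape, np.allclose(R.sum(axis=-1), 1.0))  # (6, 6) True
```

Because each factor is row-stochastic, the rollout stays row-stochastic, so each row can be read as a relevance distribution over input tokens; attention flow instead solves a max-flow problem on the same layered graph and is more expensive to compute.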
Discovering Predictable Latent Factors for Time Series Forecasting
Modern time series forecasting methods, such as Transformer and its variants,
have shown strong ability in sequential data modeling. To achieve high
performance, they usually rely on redundant or unexplainable structures to
model complex relations between variables and tune the parameters with
large-scale data. Many real-world data mining tasks, however, lack sufficient
variables for relation reasoning, and therefore these methods may not properly
handle such forecasting problems. With insufficient data, time series appear to
be affected by many exogenous variables, and thus, the modeling becomes
unstable and unpredictable. To tackle this critical issue, in this paper, we
develop a novel algorithmic framework for inferring the intrinsic latent
factors implied by the observable time series. The inferred factors are used to
form multiple independent and predictable signal components that enable not
only sparse relation reasoning for long-term efficiency but also reconstructing
the future temporal data for accurate prediction. To achieve this, we introduce
three characteristics, i.e., predictability, sufficiency, and identifiability,
and model these characteristics via the powerful deep latent dynamics models to
infer the predictable signal components. Empirical results on multiple real
datasets show the efficiency of our method for different kinds of time series
forecasting. The statistical analysis validates the predictability of the
learned latent factors.
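The idea of inferring predictable latent factors can be illustrated with a deliberately simplified toy: here PCA stands in for the paper's deep latent dynamics models, and an AR(1) fit per factor serves as a crude predictability check. The data, sizes, and models below are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy multivariate series: two smooth latent signals mixed linearly, plus noise.
T, n_vars, n_factors = 200, 8, 2
t = np.arange(T)
Z = np.stack([np.sin(0.1 * t), np.cos(0.05 * t)], axis=1)  # true latent factors
mixing = rng.standard_normal((n_factors, n_vars))
X = Z @ mixing + 0.05 * rng.standard_normal((T, n_vars))

# Infer latent factors with PCA (standing in for a deep latent variable model).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_hat = Xc @ Vt[:n_factors].T                    # inferred factors, (T, n_factors)

# Gauge predictability: fit an AR(1) model per factor and compute R^2.
r2 = []
for k in range(n_factors):
    z = Z_hat[:, k]
    phi = (z[:-1] @ z[1:]) / (z[:-1] @ z[:-1])   # least-squares AR(1) coefficient
    resid = z[1:] - phi * z[:-1]
    r2.append(1.0 - resid.var() / z[1:].var())
print([f"{v:.3f}" for v in r2])                  # high R^2: the factors are predictable
```

Forecasting each low-dimensional factor separately and mapping back through the mixing is the general shape of the decomposition the abstract describes; the paper's contribution is learning such factors with predictability, sufficiency, and identifiability enforced by deep latent dynamics models rather than PCA.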