ContraSim -- A Similarity Measure Based on Contrastive Learning
Recent work has compared neural network representations via similarity-based
analyses to improve model interpretation. The quality of a similarity measure
is typically evaluated by its success in assigning a high score to
representations that are expected to be matched. However, existing similarity
measures perform only moderately well on standard benchmarks. In this work, we develop a
new similarity measure, dubbed ContraSim, based on contrastive learning. In
contrast to common closed-form similarity measures, ContraSim learns a
parameterized measure by using both similar and dissimilar examples. We perform
an extensive experimental evaluation of our method, with both language and
vision models, on the standard layer prediction benchmark and two new
benchmarks that we introduce: the multilingual benchmark and the image-caption
benchmark. In all cases, ContraSim achieves much higher accuracy than previous
similarity measures, even when presented with challenging examples. Finally,
ContraSim is more suitable for the analysis of neural networks, revealing new
insights not captured by previous measures.
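To make the contrastive idea concrete, here is a minimal sketch of a learned similarity measure in this spirit: a small encoder maps representations into a shared space and is trained with an InfoNCE loss so that matched pairs score high and in-batch negatives score low. The encoder architecture, temperature, and training setup are illustrative assumptions, not ContraSim's exact configuration.

```python
# A minimal sketch of a learned, contrastive similarity measure in the
# spirit of ContraSim. The encoder architecture, temperature, and training
# setup are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityEncoder(nn.Module):
    """Maps raw representations into a shared space where cosine
    similarity acts as the learned similarity measure."""
    def __init__(self, dim_in: int, dim_out: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 256), nn.ReLU(), nn.Linear(256, dim_out)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit vectors -> cosine similarity

def info_nce_loss(za: torch.Tensor, zb: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE: row i of za should match row i of zb (a similar pair);
    every other row in the batch serves as a dissimilar example."""
    logits = za @ zb.t() / tau                # (B, B) pairwise similarities
    targets = torch.arange(za.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy training step on random vectors standing in for two sets of
# representations that are expected to be matched.
enc = SimilarityEncoder(dim_in=768)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
a, b = torch.randn(32, 768), torch.randn(32, 768)
opt.zero_grad()
loss = info_nce_loss(enc(a), enc(b))
loss.backward()
opt.step()
```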
Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT
Recent advances in interpretability suggest we can project weights and hidden
states of transformer-based language models (LMs) to their vocabulary, a
transformation that makes them human interpretable and enables us to assign
semantics to what was seen only as numerical vectors. In this paper, we
interpret LM attention heads and memory values, the vectors the models
dynamically create and recall while processing a given input. By analyzing the
tokens they represent through this projection, we identify patterns in the
information flow inside the attention mechanism. Based on these discoveries, we
create a tool to visualize a forward pass of Generative Pre-trained
Transformers (GPTs) as an interactive flow graph, with nodes representing
neurons or hidden states and edges representing the interactions between them.
Our visualization simplifies huge amounts of data into easy-to-read plots that
reflect why models output their results. We demonstrate the utility of our
modeling by identifying the effect LM components have on the intermediate
processing in the model before outputting a prediction. For instance, we
discover that layer norms are used as semantic filters and find neurons that
act as regularization vectors.
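The vocabulary projection this analysis relies on can be illustrated with a short sketch: decode an intermediate GPT-2 hidden state through the model's final layer norm and unembedding matrix, then read off the top tokens. The layer index, the use of the final layer norm (as in the "logit lens"), and the top-k are illustrative choices; the paper's tool builds an interactive flow graph on top of projections of this kind.

```python
# A minimal sketch of projecting a hidden state to the vocabulary. The
# layer choice and the use of ln_f before unembedding are assumptions
# (the standard "logit lens" recipe), not the paper's exact method.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

hidden = out.hidden_states[6][0, -1]                     # layer-6 state, last position
logits = model.lm_head(model.transformer.ln_f(hidden))   # project to the vocabulary
top = logits.topk(5).indices.tolist()
print([tok.decode(i) for i in top])                      # tokens this state "represents"
```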
When Language Models Fall in Love: Animacy Processing in Transformer Language Models
Animacy - whether an entity is alive and sentient - is fundamental to
cognitive processing, impacting areas such as memory, vision, and language.
However, animacy is not always expressed directly in language: in English it
often manifests indirectly, in the form of selectional constraints on verbs and
adjectives. This poses a potential issue for transformer language models (LMs):
they often train only on text, and thus lack access to extralinguistic
information from which humans learn about animacy. We ask: how does this impact
LMs' animacy processing - do they still behave as humans do? We answer this
question using open-source LMs. Like previous studies, we find that LMs behave
much like humans when presented with entities whose animacy is typical.
However, we also show that even when presented with stories about atypically
animate entities, such as a peanut in love, LMs adapt: they treat these
entities as animate, though they do not adapt as well as humans. Even when the
context indicating atypical animacy is very short, LMs pick up on subtle clues
and change their behavior. We conclude that despite the limited signal through
which LMs can learn about animacy, they are indeed sensitive to the relevant
lexical semantic nuances available in English.
Comment: To appear at EMNLP 2023.
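One simple way to probe such selectional constraints is to compare the log-probability an LM assigns to the same animacy-sensitive continuation after an animate versus an inanimate subject. The sketch below does this with GPT-2; the model choice and the example sentences are illustrative and are not the paper's actual stimuli or evaluation protocol.

```python
# A minimal sketch of probing animacy-linked selectional constraints:
# score the same continuation after animate vs. inanimate subjects.
# GPT-2 and the sentences are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Total log-probability of `continuation` given `prefix`."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        logprobs = model(ids).logits.log_softmax(-1)
    total = 0.0
    for i in range(cont_ids.size(1)):
        # The token at position p is predicted by the logits at p - 1.
        pos = prefix_ids.size(1) + i - 1
        total += logprobs[0, pos, cont_ids[0, i]].item()
    return total

print(continuation_logprob("The woman", " fell in love"))   # animate subject
print(continuation_logprob("The peanut", " fell in love"))  # atypically animate
```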
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
We propose a process for investigating the extent to which sentence
representations arising from neural machine translation (NMT) systems encode
distinct semantic phenomena. We use these representations as features to train
a natural language inference (NLI) classifier based on datasets recast from
existing semantic annotations. In applying this process to a representative NMT
system, we find its encoder appears most suited to supporting inferences at the
syntax-semantics interface, as compared to anaphora resolution, which requires
world knowledge. We conclude with a discussion of the merits and potential
deficiencies of the existing process, and how it may be improved and extended
as a broader framework for evaluating semantic coverage.
Comment: To be presented at NAACL 2018; 11 pages.
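The probing recipe can be sketched as follows: freeze an NMT encoder, mean-pool its hidden states into fixed sentence vectors for premise and hypothesis, combine them with a standard feature map, and train a lightweight classifier on the recast NLI labels. The Marian English-German model and the feature combination [p; h; |p - h|; p * h] below are stand-ins, not the paper's exact NMT system or classifier.

```python
# A minimal sketch of the probing recipe: freeze an NMT encoder, mean-pool
# its states into fixed sentence vectors, and train a light NLI classifier
# on combined premise/hypothesis features. The Marian en-de model and the
# feature map are illustrative stand-ins, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
encoder = AutoModel.from_pretrained("Helsinki-NLP/opus-mt-en-de").encoder

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled encoder states as a fixed-size sentence representation."""
    batch = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = encoder(**batch).last_hidden_state  # (1, T, d)
    return states.mean(dim=1).squeeze(0)

def features(premise: str, hypothesis: str):
    p, h = embed(premise), embed(hypothesis)
    # A standard NLI feature combination: [p; h; |p - h|; p * h].
    return torch.cat([p, h, (p - h).abs(), p * h]).numpy()

# Tiny toy "recast" dataset: 1 = entailment, 0 = not entailment.
pairs = [("A man sleeps.", "Someone is asleep.", 1),
         ("A man sleeps.", "A man is running.", 0)]
X = [features(p, h) for p, h, _ in pairs]
y = [label for *_, label in pairs]
clf = LogisticRegression(max_iter=1000).fit(X, y)
```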