Localizing Paragraph Memorization in Language Models
Can we localize the weights and mechanisms used by a language model to
memorize and recite entire paragraphs of its training data? In this paper, we
show that while memorization is spread across multiple layers and model
components, gradients of memorized paragraphs have a distinguishable spatial
pattern, being larger in lower model layers than gradients of non-memorized
examples. Moreover, the memorized examples can be unlearned by fine-tuning only
the high-gradient weights. We localize a low-layer attention head that appears
to be especially involved in paragraph memorization. This head predominantly
attends to distinctive, rare tokens that are least frequent in a corpus-level
unigram distribution. Next, we study how localized memorization is across the
tokens in the prefix by perturbing tokens and measuring the resulting change in
the decoding. A few distinctive tokens early in a prefix can often corrupt the
entire continuation. Overall, memorized continuations are not only harder to
unlearn but also harder to corrupt than non-memorized ones.
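To make the perturbation experiment concrete, here is a minimal sketch with a Hugging Face causal LM: swap one prefix token, regenerate greedily, and measure how much of the continuation flips. The model choice, greedy decoding, and the flip-rate metric are illustrative assumptions, not the authors' exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation(prefix_ids, n_new=50):
    # Greedy decoding, so any change in the output is caused by the prefix edit.
    with torch.no_grad():
        out = model.generate(prefix_ids, max_new_tokens=n_new, do_sample=False,
                             pad_token_id=tok.eos_token_id)
    return out[0, prefix_ids.shape[1]:]

def flip_rate(text, position, replacement_id, n_new=50):
    """Fraction of continuation tokens changed by perturbing one prefix token."""
    ids = tok(text, return_tensors="pt").input_ids
    base = continuation(ids, n_new)
    perturbed = ids.clone()
    perturbed[0, position] = replacement_id  # edit a single prefix token
    edited = continuation(perturbed, n_new)
    n = min(len(base), len(edited))
    return (base[:n] != edited[:n]).float().mean().item()
```

Sweeping `position` over the prefix then yields a per-token picture of how localized the memorized continuation is.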
Unsupervised Contrast-Consistent Ranking with Language Models
Language models contain ranking-based knowledge and are powerful solvers of
in-context ranking tasks. For instance, they may have parametric knowledge
about the ordering of countries by size or may be able to rank reviews by
sentiment. Recent work focuses on pairwise, pointwise, and listwise prompting
techniques to elicit a language model's ranking knowledge. However, we find
that even with careful calibration and constrained decoding, prompting-based
techniques may not always be self-consistent in the rankings they produce. This
motivates us to explore an alternative approach that is inspired by an
unsupervised probing method called Contrast-Consistent Search (CCS). The idea
is to train a probing model guided by a logical constraint: a model's
representation of a statement and its negation must be mapped to contrastive
true-false poles consistently across multiple statements. We hypothesize that
similar constraints apply to ranking tasks where all items are related via
consistent pairwise or listwise comparisons. To this end, we extend the binary
CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking
methods such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression
objective. Our results confirm that, for the same language model, CCR probing
outperforms prompting and even performs on a par with prompting much larger
language models.
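As a rough illustration of the probing objective, the sketch below scores items with a linear probe over hidden states and trains it with the max-margin (hinge) loss that CCR adapts. For brevity, the pairwise order is given directly, standing in for the unsupervised contrast-consistency constraints the paper derives it from; dimensions and the margin are assumptions.

```python
import torch
import torch.nn as nn

class ScoreProbe(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.w = nn.Linear(d_model, 1)

    def forward(self, h):                 # h: (n_items, d_model)
        return self.w(h).squeeze(-1)      # one scalar score per item

def max_margin(scores, pairs, margin=1.0):
    """Hinge loss over (i, j) pairs where item i should outrank item j."""
    return torch.stack([torch.relu(margin - (scores[i] - scores[j]))
                        for i, j in pairs]).mean()

probe = ScoreProbe(d_model=768)
h = torch.randn(4, 768)                   # hidden states for 4 items to rank
loss = max_margin(probe(h), pairs=[(0, 1), (1, 2), (2, 3)])
loss.backward()
```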
Generalizing Backpropagation for Gradient-Based Interpretability
Many popular feature-attribution methods for interpreting deep neural
networks rely on computing the gradients of a model's output with respect to
its inputs. While these methods can indicate which input features may be
important for the model's prediction, they reveal little about the inner
workings of the model itself. In this paper, we observe that the gradient
computation of a model is a special case of a more general formulation using
semirings. This observation allows us to generalize the backpropagation
algorithm to efficiently compute other interpretable statistics about the
gradient graph of a neural network, such as the highest-weighted path and
entropy. We implement this generalized algorithm, evaluate it on synthetic
datasets to better understand the statistics it computes, and apply it to study
BERT's behavior on the subject-verb number agreement task (SVA). With this
method, we (a) validate that the amount of gradient flow through a component of
a model reflects its importance to a prediction and (b) for SVA, identify which
pathways of the self-attention mechanism are most important.
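The semiring view can be illustrated on a toy computation graph: ordinary backprop aggregates, over all source-to-sink paths, the product of local derivatives along each path with a sum (the sum-product semiring); replacing the sum with a max yields the highest-weighted path statistic. The graph and its positive local derivatives below are assumptions for illustration.

```python
edges = {                                  # edges[u] = [(v, d v / d u), ...]
    "x": [("h1", 0.5), ("h2", 2.0)],
    "h1": [("y", 3.0)],
    "h2": [("y", 0.1)],
}
topo = ["x", "h1", "h2", "y"]              # topological order of the toy DAG

def path_aggregate(plus, times, zero, one, source="x", sink="y"):
    """Accumulate path values from source to sink under a given semiring."""
    value = {node: zero for node in topo}
    value[source] = one
    for u in topo:
        for v, d in edges.get(u, []):
            value[v] = plus(value[v], times(value[u], d))
    return value[sink]

grad = path_aggregate(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
best = path_aggregate(max, lambda a, b: a * b, 0.0, 1.0)
print(grad, best)  # 1.7 (total gradient), 1.5 (highest-weighted path x->h1->y)
```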
An Ordinal Latent Variable Model of Conflict Intensity
For the quantitative monitoring of international relations, political events
are extracted from the news and parsed into "who-did-what-to-whom" patterns.
This has resulted in large data collections which require aggregate statistics
for analysis. The Goldstein Scale is an expert-based measure that ranks
individual events on a one-dimensional scale from conflictual to cooperative.
However, the scale disregards fatality counts as well as perpetrator and victim
types involved in an event. This information is typically considered in
qualitative conflict assessment. To address this limitation, we propose a
probabilistic generative model over the full
subject-predicate-quantifier-object tuples associated with an event. We treat
conflict intensity as an interpretable, ordinal latent variable that correlates
conflictual event types with high fatality counts. Taking a Bayesian approach,
we learn a conflict intensity scale from data and find the optimal number of
intensity classes. We evaluate the model by imputing missing data. Our scale
proves to be more informative than the original Goldstein Scale in
autoregressive forecasting and when compared with global online attention
towards armed conflicts.
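As a toy illustration of the generative story (not the paper's exact parameterization), an ordinal latent intensity class can couple conflictual event types with higher fatality counts; all parameter values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # number of intensity classes
event_types = ["statement", "protest", "armed_clash"]
# P(event_type | z): rows ordered from cooperative to conflictual intensity
type_probs = np.array([[0.70, 0.25, 0.05],
                       [0.30, 0.50, 0.20],
                       [0.05, 0.25, 0.70]])
fatality_rate = np.array([0.01, 0.5, 5.0])   # Poisson rate grows with z

def sample_event():
    z = rng.integers(K)                      # latent class (uniform prior here)
    etype = rng.choice(event_types, p=type_probs[z])
    fatalities = rng.poisson(fatality_rate[z])
    return z, etype, fatalities

print([sample_event() for _ in range(3)])
```

The ordinal structure comes from the monotone parameter sharing: higher classes shift mass toward conflictual event types and larger fatality counts, which is what makes the inferred scale interpretable.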
Estimating conflict losses and reporting biases
Determining the number of casualties and fatalities suffered in militarized conflicts is important for conflict measurement, forecasting, and accountability. However, given the nature of conflict, reliable statistics on casualties are rare. Countries or political actors involved in conflicts have incentives to hide or manipulate these numbers, while third parties might not have access to reliable information. For example, in the ongoing militarized conflict between Russia and Ukraine, estimates of the magnitude of losses vary wildly, sometimes across orders of magnitude. In this paper, we offer an approach for measuring casualties and fatalities given multiple reporting sources while, at the same time, accounting for the biases of those sources. We construct a dataset of 4,609 reports of military and civilian losses by both sides. We then develop a statistical model to better estimate losses for both sides given these reports. Our model accounts for different kinds of reporting biases and structural correlations between loss types, and integrates loss reports at different temporal scales. Our daily and cumulative estimates provide evidence that Russia has lost more personnel than has Ukraine and also likely suffers from a higher fatality-to-casualty ratio. We find that both sides likely overestimate the personnel losses suffered by their opponent and that Russian sources underestimate their own losses of personnel.
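A minimal sketch of the estimation idea, under strong simplifying assumptions rather than the paper's full Bayesian model: treat each report as the latent true count scaled by a source-specific multiplicative bias, which becomes additive in log space and can be fit by least squares. Source names, bias values, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
true_log = np.log(1000.0)               # latent log of true cumulative losses
bias = {"side_A": 0.4, "side_B": -0.5, "third_party": 0.0}  # log-scale biases

# Simulate noisy reports: log(report) = true_log + bias[source] + noise
sources = list(bias) * 20
reports = np.array([true_log + bias[s] + rng.normal(0, 0.1) for s in sources])

# Identifiability: anchor the third party as unbiased (its bias is absorbed
# into the intercept), and estimate the two remaining source biases.
X = np.zeros((len(sources), 3))          # columns: intercept, bias_A, bias_B
X[:, 0] = 1.0
X[:, 1] = [s == "side_A" for s in sources]
X[:, 2] = [s == "side_B" for s in sources]
est, *_ = np.linalg.lstsq(X, reports, rcond=None)
print(f"estimated losses ~ {np.exp(est[0]):.0f}, "
      f"biases A={est[1]:+.2f}, B={est[2]:+.2f}")
```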
The Code of Protest. Images of Peace in the West German Peace Movements, 1945-1990
The article examines posters produced by the peace movements in the Federal Republic of
Germany during the Cold War, with an analytical focus on the transformation of the iconography
of peace in modernity. Was it possible to develop an independent, positive depiction of peace
in the context of protests for peace and disarmament? Despite its name, the pictorial self-representation
of the campaign ‘Fight against Nuclear Death’ in the late 1950s did not draw
on the theme of pending nuclear mass death. The large-scale protest movement in the 1980s
against NATO’s 1979 ‘double-track’ decision contrasted female peacefulness with masculine
aggression in an emotionally charged pictorial symbolism. At the same time this symbolism
marked a break with the pacifist iconographic tradition that had focused on the victims of war.
Instead, the movement presented itself with images of demonstrating crowds, as an anticipation
of its peaceful ends. Drawing on the concept of asymmetrical communicative ‘codes’ that has
been developed in sociological systems theory, the article argues that the iconography of peace in
peace movement posters could not develop a genuinely positive vision of peace, since the code of
protest can articulate the designation value ‘peace’ only in conjunction with the rejection value
‘war’.
Extended Multilingual Protest News Detection -- Shared Task 1, CASE 2021 and 2022
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest
Event Detection. This task continues CASE 2021 and consists of four subtasks:
i) document classification, ii) sentence classification, iii) event sentence
coreference identification, and iv) event extraction. The CASE 2022 extension
expands the test data in the previously available languages, namely English,
Hindi, Portuguese, and Spanish, and adds new test data in Mandarin, Turkish,
and Urdu for Subtask 1, document classification. The training data from CASE
2021 in English, Portuguese, and
Spanish were utilized. Therefore, predicting document labels in Hindi,
Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022
workshop accepts reports on systems developed for predicting test data of CASE
2021 as well. We observe that the best systems submitted by CASE 2022
participants achieve between 79.71 and 84.06 F1-macro for new languages in a
zero-shot setting. The winning approaches are mainly ensembling models and
merging data in multiple languages. The best two submissions on CASE 2021 data
outperform submissions from last year for Subtask 1 and Subtask 2 in all
languages. The only CASE 2021 settings not surpassed by new submissions are
Subtask 3 in Portuguese and Subtask 4 in English.
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. For this task, we enrich UniMorph with new data for 32 languages from 13 language families, most of them under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.