
    Localizing Paragraph Memorization in Language Models

    Can we localize the weights and mechanisms a language model uses to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern: they are larger in lower model layers than gradients of non-memorized examples. Moreover, memorized examples can be unlearned by fine-tuning only the high-gradient weights. We localize a low-layer attention head that appears to be especially involved in paragraph memorization. This head predominantly attends to distinctive, rare tokens that are least frequent in a corpus-level unigram distribution. Next, we study how localized memorization is across the tokens in the prefix by perturbing tokens and measuring the resulting change in the decoding. A few distinctive tokens early in a prefix can often corrupt the entire continuation. Overall, memorized continuations are not only harder to unlearn but also harder to corrupt than non-memorized ones.
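    As a rough illustration of the gradient-based localization described above, the sketch below aggregates gradient norms per layer for one paragraph under a HuggingFace causal language model. It is a minimal sketch, not the paper's code: "gpt2" and its layer-name pattern are stand-ins for whatever model is being probed. Comparing these per-layer norms between memorized and non-memorized paragraphs would reproduce the "larger gradients in lower layers" diagnostic.

        import re
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")      # stand-in model
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        def layer_grad_norms(text: str) -> dict[int, float]:
            """Gradient norm of the LM loss, aggregated per transformer layer."""
            model.zero_grad()
            ids = tok(text, return_tensors="pt").input_ids
            loss = model(ids, labels=ids).loss           # next-token loss on the paragraph
            loss.backward()
            sq: dict[int, float] = {}
            for name, p in model.named_parameters():
                m = re.search(r"\bh\.(\d+)\.", name)     # GPT-2's layer-index naming
                if m is not None and p.grad is not None:
                    i = int(m.group(1))
                    sq[i] = sq.get(i, 0.0) + p.grad.norm().item() ** 2
            return {i: v ** 0.5 for i, v in sq.items()}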

    Unsupervised Contrast-Consistent Ranking with Language Models

    Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank reviews by sentiment. Recent work focuses on pairwise, pointwise, and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probing model guided by a logical constraint: a model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks, where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking objectives such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression. Our results confirm that, for the same language model, CCR probing outperforms prompting and even performs on a par with prompting much larger language models.
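    A minimal sketch of one such objective under stated assumptions: a linear probe over hidden states is trained with a max-margin loss so that the representation of the higher-ranked statement scores above its contrastive partner. The random tensors and the explicit pairing below are placeholders for the paper's unsupervised contrast construction.

        import torch

        dim, n = 768, 32
        h_hi = torch.randn(n, dim)   # higher-ranked statement representations (placeholder)
        h_lo = torch.randn(n, dim)   # their contrastive partners (placeholder)

        probe = torch.nn.Linear(dim, 1)          # scalar ranking score per item
        opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
        margin = 1.0

        for _ in range(100):
            opt.zero_grad()
            # Max-margin constraint: score(h_hi) should exceed score(h_lo) by `margin`.
            gap = probe(h_hi).squeeze(-1) - probe(h_lo).squeeze(-1)
            loss = torch.relu(margin - gap).mean()
            loss.backward()
            opt.step()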

    Generalizing Backpropagation for Gradient-Based Interpretability

    Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement (SVA) task. With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important. Comment: Long paper accepted at ACL 2023.
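    The semiring idea is easy to see on a toy gradient graph: summing products of local gradients over all paths from input to output gives the ordinary chain-rule gradient, while replacing the sum with a max recovers the highest-weighted path. A minimal sketch with a made-up three-node graph (the paper's algorithm operates on real network computation graphs):

        from typing import Callable

        # edges[node] = list of (parent, local_gradient) pairs; "x" is the input
        edges: dict[str, list[tuple[str, float]]] = {
            "y": [("h1", 0.5), ("h2", 2.0)],
            "h1": [("x", 3.0)],
            "h2": [("x", 0.1)],
        }

        def aggregate(node: str, plus: Callable[[float, float], float]) -> float:
            """Combine path products into `node` with a chosen semiring addition."""
            if node == "x":
                return 1.0                       # multiplicative identity at the input
            total = None
            for parent, g in edges[node]:
                term = g * aggregate(parent, plus)
                total = term if total is None else plus(total, term)
            return total

        print(aggregate("y", lambda a, b: a + b))  # ordinary dy/dx: 0.5*3.0 + 2.0*0.1 = 1.7
        print(aggregate("y", max))                 # highest-weighted path: 1.5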

    An Ordinal Latent Variable Model of Conflict Intensity

    For the quantitative monitoring of international relations, political events are extracted from the news and parsed into "who-did-what-to-whom" patterns. This has resulted in large data collections that require aggregate statistics for analysis. The Goldstein Scale is an expert-based measure that ranks individual events on a one-dimensional scale from conflictual to cooperative. However, the scale disregards fatality counts as well as the perpetrator and victim types involved in an event, information that is typically considered in qualitative conflict assessment. To address this limitation, we propose a probabilistic generative model over the full subject-predicate-quantifier-object tuples associated with an event. We treat conflict intensity as an interpretable, ordinal latent variable that correlates conflictual event types with high fatality counts. Taking a Bayesian approach, we learn a conflict intensity scale from data and find the optimal number of intensity classes. We evaluate the model by imputing missing data. Our scale proves to be more informative than the original Goldstein Scale in autoregressive forecasting and when compared with global online attention towards armed conflicts.
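    A hedged sketch of the kind of generative story involved: an ordinal latent intensity class jointly drives event types and fatality counts, so that higher classes co-occur with more conflictual types and higher counts. All parameter values below are invented for illustration; they are not the paper's learned scale.

        import numpy as np

        rng = np.random.default_rng(0)
        K = 4                                             # ordinal intensity classes
        class_probs = np.array([0.4, 0.3, 0.2, 0.1])
        poisson_rates = np.array([0.1, 1.0, 5.0, 20.0])   # fatalities rise with intensity
        event_types = ["statement", "protest", "assault", "battle"]
        type_probs = np.array([                           # rows: intensity class
            [0.70, 0.20, 0.08, 0.02],
            [0.30, 0.50, 0.15, 0.05],
            [0.10, 0.20, 0.50, 0.20],
            [0.02, 0.08, 0.30, 0.60],
        ])

        def sample_event():
            z = rng.choice(K, p=class_probs)              # latent ordinal intensity
            fatalities = rng.poisson(poisson_rates[z])
            etype = event_types[rng.choice(4, p=type_probs[z])]
            return z, etype, int(fatalities)

        print([sample_event() for _ in range(3)])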

    Estimating conflict losses and reporting biases

    Determining the number of casualties and fatalities suffered in militarized conflicts is important for conflict measurement, forecasting, and accountability. However, given the nature of conflict, reliable statistics on casualties are rare. Countries or political actors involved in conflicts have incentives to hide or manipulate these numbers, while third parties might not have access to reliable information. For example, in the ongoing militarized conflict between Russia and Ukraine, estimates of the magnitude of losses vary wildly, sometimes across orders of magnitude. In this paper, we offer an approach for measuring casualties and fatalities given multiple reporting sources while, at the same time, accounting for the biases of those sources. We construct a dataset of 4,609 reports of military and civilian losses by both sides. We then develop a statistical model to better estimate losses for both sides given these reports. Our model accounts for different kinds of reporting biases and structural correlations between loss types, and integrates loss reports at different temporal scales. Our daily and cumulative estimates provide evidence that Russia has lost more personnel than Ukraine has and likely suffers from a higher fatality-to-casualty ratio. We find that both sides likely overestimate the personnel losses suffered by their opponent and that Russian sources underestimate their own personnel losses.
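    The identifiability idea behind such a model can be sketched with a much-simplified multiplicative-bias setup: each source reports the true loss scaled by a source-specific factor, log R[t, s] = log L[t] + log b[s] + noise, and pinning one source's bias makes the decomposition recoverable by alternating averages in log space. This omits the paper's loss-type correlations and temporal integration; all numbers are synthetic.

        import numpy as np

        rng = np.random.default_rng(1)
        T, S = 50, 3
        true_log_loss = np.cumsum(rng.normal(0.1, 0.05, T)) + 3.0   # synthetic cumulative losses
        log_bias = np.array([0.0, 0.6, -0.4])                       # source 0 is the reference
        reports = true_log_loss[:, None] + log_bias[None, :] + rng.normal(0, 0.1, (T, S))

        est_bias = np.zeros(S)
        for _ in range(20):
            est_loss = (reports - est_bias).mean(axis=1)            # given biases, average sources
            est_bias = (reports - est_loss[:, None]).mean(axis=0)   # given losses, re-fit biases
            est_bias -= est_bias[0]                                 # pin the reference source

        print(np.round(est_bias, 2))   # approaches [0.0, 0.6, -0.4]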

    The Code of Protest. Images of Peace in the West German Peace Movements, 1945-1990

    The article examines posters produced by the peace movements in the Federal Republic of Germany during the Cold War, with an analytical focus on the transformation of the iconography of peace in modernity. Was it possible to develop an independent, positive depiction of peace in the context of protests for peace and disarmament? Despite its name, the pictorial self-representation of the campaign ‘Fight against Nuclear Death’ in the late 1950s did not draw on the theme of impending nuclear mass death. The large-scale protest movement of the 1980s against NATO’s 1979 ‘double-track’ decision contrasted female peacefulness with masculine aggression in an emotionally charged pictorial symbolism. At the same time, this symbolism marked a break with the pacifist iconographic tradition that had focused on the victims of war. Instead, the movement presented itself with images of demonstrating crowds, as an anticipation of its peaceful ends. Drawing on the concept of asymmetrical communicative ‘codes’ developed in sociological systems theory, the article argues that the iconography of peace in peace movement posters could not develop a genuinely positive vision of peace, since the code of protest can articulate the designation value ‘peace’ only in conjunction with the rejection value ‘war’.

    Extended Multilingual Protest News Detection -- Shared Task 1, CASE 2021 and 2022

    We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension expands the test data with more data in previously available languages, namely English, Hindi, Portuguese, and Spanish, and adds new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese, and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop also accepts reports on systems developed for predicting the CASE 2021 test data. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in the zero-shot setting. The winning approaches mainly ensemble models and merge data in multiple languages. The best two submissions on CASE 2021 data outperform last year's submissions for Subtask 1 and Subtask 2 in all languages; only Subtask 3 Portuguese and Subtask 4 English were not outperformed by new submissions. Comment: To appear in CASE 2022 @ EMNLP 2022.
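    The zero-shot setting described here amounts to fine-tuning a multilingual encoder on the available training languages and applying it unchanged to unseen ones. A minimal sketch, assuming xlm-roberta-base as a stand-in (the actual winning systems were mostly ensembles of several such models):

        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        name = "xlm-roberta-base"                 # stand-in multilingual encoder
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

        # ... fine-tune on English/Portuguese/Spanish documents (protest vs. not) ...

        doc = "..."                               # e.g., a Turkish or Urdu news article
        with torch.no_grad():
            logits = model(**tok(doc, return_tensors="pt", truncation=True)).logits
        print(logits.softmax(-1))                 # zero-shot: no labeled data in that language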

    SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

    This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, most of them under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, and Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems (RNN-based models in particular) also produce misspelled forms or end up in repetitive loops. Finally, we report a large drop in systems' performance on previously unseen lemmas. Peer reviewed.
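    Systems in this setting are typically character-level sequence-to-sequence models whose source side prepends the UniMorph feature tags to the lemma's characters. A minimal sketch of that input formatting; the Turkish triple below is illustrative, not taken from the shared task data.

        def encode_example(lemma: str, tags: str) -> list[str]:
            # "V;PST;3;SG" -> atomic feature tokens, followed by lemma characters
            return tags.split(";") + list(lemma)

        src = encode_example("yürümek", "V;PST;3;SG")   # 'to walk' (illustrative tags)
        tgt = list("yürüdü")                            # target form, character by character
        print(src)  # ['V', 'PST', '3', 'SG', 'y', 'ü', 'r', 'ü', 'm', 'e', 'k']
        print(tgt)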