
    What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

    Despite the remarkable evolution of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. Previous work largely focused on what these models learn at the representation level. We break this analysis down further and study individual dimensions (neurons) in the vector representations learned by end-to-end neural models in NLP tasks. We propose two methods: Linguistic Correlation Analysis, a supervised method to extract the most relevant neurons with respect to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method to extract salient neurons with respect to the model itself. We evaluate the effectiveness of our techniques by ablating the identified neurons and reevaluating the network's performance on two tasks: neural machine translation (NMT) and neural language modeling (NLM). We further present a comprehensive analysis of neurons with the aim of addressing the following questions: i) how localized or distributed are different linguistic properties in the models? ii) are certain neurons exclusive to some properties and not others? iii) is the information more or less distributed in NMT vs. NLM? and iv) how important are the neurons identified through the linguistic correlation method to the overall task? Our code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019). Comment: AAAI 2019, 10 pages, AAAI Conference on Artificial Intelligence (AAAI 2019).
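    The supervised rank-then-ablate loop described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the NeuroX implementation: the L1-regularized probe, the toy data, and the function names are all assumptions made for the example.

```python
# Sketch of supervised neuron ranking and ablation, in the spirit of
# Linguistic Correlation Analysis. Illustrative only; NOT the NeuroX code.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rank_neurons(reps, labels, C=0.1):
    """Rank neurons by the magnitude of L1-regularized probe weights."""
    probe = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    probe.fit(reps, labels)
    salience = np.abs(probe.coef_).sum(axis=0)  # aggregate over classes
    return np.argsort(salience)[::-1], probe    # most relevant first

def ablate(reps, neuron_ids):
    """Zero out the given neurons, simulating their removal."""
    ablated = reps.copy()
    ablated[:, neuron_ids] = 0.0
    return ablated

# Toy usage: 1000 tokens, 512-dim representations, 5 POS-like labels.
rng = np.random.default_rng(0)
reps = rng.normal(size=(1000, 512))
labels = rng.integers(0, 5, size=1000)

ranking, probe = rank_neurons(reps, labels)
top_k = ranking[:32]
print("probe accuracy, full reps:  ", probe.score(reps, labels))
print("probe accuracy, top-32 gone:", probe.score(ablate(reps, top_k), labels))
```

    The drop in probe accuracy after ablating the top-ranked neurons gives a rough measure of how concentrated the probed property is in those dimensions, which is the intuition behind the localization questions the paper asks.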

    Measuring Memorization Effect in Word-Level Neural Networks Probing

    Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a threat of the classifier simply memorizing the linguistic labels for individual words, instead of extracting the linguistic abstractions from the representations, thus reporting false positive results. While considerable efforts have been made to minimize the memorization problem, the task of actually measuring the amount of memorization happening in the classifier has been understudied so far. In our work, we propose a simple general method for measuring the memorization effect, based on a symmetric selection of comparable sets of test words seen versus unseen in training. Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate. We exemplify this by showcasing our method on a case study of probing for part of speech in a trained neural machine translation encoder. Comment: Accepted to TSD 2020. Will be published in Springer LNCS.
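    The seen-versus-unseen comparison lends itself to a short sketch. The code below is an illustration under assumed inputs, not the paper's exact procedure; in particular it skips the symmetric selection step that makes the two word sets comparable, and all names are invented for the example.

```python
# Hedged sketch: estimate probe memorization as the accuracy gap between
# test tokens whose word type appeared in probe training and those that
# did not. The paper additionally balances ("symmetrically selects") the
# two sets so they are comparable; that step is omitted here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def memorization_effect(train_X, train_y, train_words,
                        test_X, test_y, test_words):
    """Return (accuracy on seen word types, accuracy on unseen word
    types, gap). A large gap suggests the probe memorized word labels
    rather than reading them off the representations."""
    probe = LogisticRegression(max_iter=1000).fit(train_X, train_y)
    seen_types = set(train_words)
    seen = np.array([w in seen_types for w in test_words])
    acc_seen = probe.score(test_X[seen], test_y[seen])
    acc_unseen = probe.score(test_X[~seen], test_y[~seen])
    return acc_seen, acc_unseen, acc_seen - acc_unseen
```

    A near-zero gap would indicate that the probing results generalize across word types and can be read as evidence about the representations themselves.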

    The Golden Rule as a Heuristic to Measure the Fairness of Texts Using Machine Learning

    In this paper we present a natural language programming framework to consider how the fairness of acts can be measured. For the purposes of the paper, a fair act is defined as one that a person would accept if it were done to them. The approach is based on an implementation of the golden rule (GR) in the digital domain. Despite the GR's prevalence as an axiom throughout history, no transfer of this moral philosophy into computational systems exists. In this paper we consider how to algorithmically operationalise this rule so that it may be used to measure sentences such as "the boy harmed the girl" and categorise them as fair or unfair. A review of, and reply to, criticisms of the GR is also given. We further suggest how the technology may be implemented to avoid unfair biases in word embeddings, given that individuals would typically not wish to be on the receiving end of an unfair act, such as racism, irrespective of whether the corpus being used deems such discrimination praiseworthy.
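    As a toy illustration of what operationalising the GR over simple agent-verb-patient sentences might look like, consider the sketch below. The lexicon, sentence pattern, and function are invented for this example and are not from the paper, whose framework is considerably richer.

```python
# Toy golden-rule classifier for "<agent> <verb> the <patient>" sentences.
# Illustrative assumptions throughout; not the paper's implementation.
import re

HARMFUL_ACTS = {"harmed", "hit", "robbed", "insulted"}  # assumed lexicon

def golden_rule_label(sentence):
    """Label a simple transitive sentence fair/unfair by asking whether
    the agent would accept the same act done to themselves."""
    m = re.match(r"the (\w+) (\w+) the (\w+)", sentence.lower())
    if not m:
        return "unparsed"
    _, verb, _ = m.groups()
    # GR test: an act one would not accept for oneself is unfair.
    return "unfair" if verb in HARMFUL_ACTS else "fair"

print(golden_rule_label("The boy harmed the girl"))   # -> unfair
print(golden_rule_label("The boy helped the girl"))   # -> fair
```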