What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Despite the remarkable evolution of deep neural networks in natural language
processing (NLP), their interpretability remains a challenge. Previous work
largely focused on what these models learn at the representation level. We
break this analysis down further and study individual dimensions (neurons) in
the vector representation learned by end-to-end neural models in NLP tasks. We
propose two methods: Linguistic Correlation Analysis, based on a supervised
method to extract the most relevant neurons with respect to an extrinsic task,
and Cross-model Correlation Analysis, an unsupervised method to extract salient
neurons w.r.t. the model itself. We evaluate the effectiveness of our
techniques by ablating the identified neurons and reevaluating the network's
performance for two tasks: neural machine translation (NMT) and neural language
modeling (NLM). We further present a comprehensive analysis of neurons with the
aim to address the following questions: i) how localized or distributed are
different linguistic properties in the models? ii) are certain neurons
exclusive to some properties and not others? iii) is the information more or
less distributed in NMT vs. NLM? and iv) how important are the neurons
identified through the linguistic correlation method to the overall task? Our
code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019).
Comment: 10 pages, AAAI Conference on Artificial Intelligence (AAAI 2019).
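The supervised half of this approach can be illustrated with a hedged sketch (this is not the NeuroX implementation; the random data, the L1-regularized probe, and the k=50 ablation cutoff are all placeholder assumptions): train a linear probe on the activations for an extrinsic property, rank neurons by the probe's weight magnitudes, then zero out the top-ranked neurons and re-score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-ins: activations from a trained NMT/NLM model
# (tokens x neurons) and per-token linguistic labels for some property.
rng = np.random.default_rng(0)
acts = rng.normal(size=(5000, 512))      # hidden activations
labels = rng.integers(0, 2, size=5000)   # binary property, for simplicity

# Supervised step: a sparse linear probe predicting the property from
# the activations (a simple proxy for the paper's regularized probe).
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(acts, labels)

# Rank neurons by absolute probe weight: a high weight marks a neuron
# as relevant to the extrinsic task.
ranking = np.argsort(-np.abs(probe.coef_).sum(axis=0))

def ablate(x, neurons):
    """Zero out the given neuron columns, leaving the rest intact."""
    x = x.copy()
    x[:, neurons] = 0.0
    return x

# Ablation check: how much does performance drop without the top neurons?
base_acc = probe.score(acts, labels)
abl_acc = probe.score(ablate(acts, ranking[:50]), labels)
print(f"accuracy before/after ablating top 50 neurons: "
      f"{base_acc:.3f} / {abl_acc:.3f}")
```

A large drop after ablating few neurons would indicate a localized property; a small drop, a distributed one.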
Measuring Memorization Effect in Word-Level Neural Networks Probing
Multiple studies have probed representations emerging in neural networks
trained for end-to-end NLP tasks and examined what word-level linguistic
information may be encoded in the representations. In classical probing, a
classifier is trained on the representations to extract the target linguistic
information. However, there is a threat of the classifier simply memorizing the
linguistic labels for individual words, instead of extracting the linguistic
abstractions from the representations, thus reporting false positive results.
While considerable efforts have been made to minimize the memorization problem,
the task of actually measuring the amount of memorization happening in the
classifier has been understudied so far. In our work, we propose a simple
general method for measuring the memorization effect, based on a symmetric
selection of comparable sets of test words seen versus unseen in training. Our
method can be used to explicitly quantify the amount of memorization happening
in a probing setup, so that an adequate setup can be chosen and the results of
the probing can be interpreted with a reliability estimate. We exemplify this
by showcasing our method on a case study of probing for part of speech in a
trained neural machine translation encoder.
Comment: Accepted to TSD 2020. Will be published in Springer LNCS.
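A minimal sketch of the seen-versus-unseen comparison the abstract describes (the symmetric selection of comparable word sets is simplified away here; all names are illustrative): group test tokens by whether their word type occurred in the probe's training data, and read the accuracy gap as an estimate of memorization.

```python
from collections import defaultdict

def memorization_gap(test_items, train_words, probe_predict):
    """Estimate the memorization effect of a probing classifier.

    test_items: iterable of (word, representation, gold_label) tuples;
    train_words: set of word types seen in the probe's training data;
    probe_predict: callable mapping a representation to a predicted label.
    Returns (seen_acc - unseen_acc, per-group accuracies).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for word, rep, gold in test_items:
        group = "seen" if word in train_words else "unseen"
        total[group] += 1
        correct[group] += int(probe_predict(rep) == gold)
    acc = {g: correct[g] / total[g] for g in total}
    # A large seen-minus-unseen gap suggests the probe memorized word
    # identities rather than reading abstractions off the representations.
    return acc.get("seen", 0.0) - acc.get("unseen", 0.0), acc
```

A gap near zero supports interpreting the probe's accuracy as genuine extraction; a large gap flags the setup as unreliable.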
The Golden Rule as a Heuristic to Measure the Fairness of Texts Using Machine Learning
In this paper we present a natural language programming framework to consider
how the fairness of acts can be measured. For the purposes of the paper, a fair
act is defined as one that a person would accept if it were done to them. The
approach is based on an implementation of the golden rule (GR) in the digital
domain. Despite the GR's prevalence as an axiom throughout history,
no transfer of this moral philosophy into computational systems exists. In this
paper we consider how to algorithmically operationalise this rule so that it
may be used to measure sentences such as "the boy harmed the girl" and
categorise them as fair or unfair. A review of, and reply to, criticisms of the
GR is made. A suggestion is then made of how the technology may be implemented
to avoid unfair biases in word embeddings, given that individuals would
typically not wish to be on the receiving end of an unfair act, such as racism,
irrespective of whether the corpus being used deems such discrimination
praiseworthy.
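The abstract does not spell out the paper's actual operationalisation; the following is only a toy illustration of the role-reversal intuition, with a hypothetical hand-built lexicon standing in for whatever resources the framework really uses.

```python
# Hypothetical valence lexicon; the paper's actual resources differ.
UNDESIRABLE_ACTS = {"harmed", "robbed", "insulted"}
DESIRABLE_ACTS = {"helped", "praised", "thanked"}

def golden_rule_label(sentence: str) -> str:
    """Toy golden-rule reading: an act is unfair if one would not
    accept it being done to oneself, approximated by action valence."""
    tokens = sentence.lower().strip(".").split()
    for tok in tokens:
        if tok in UNDESIRABLE_ACTS:
            return "unfair"
        if tok in DESIRABLE_ACTS:
            return "fair"
    return "unknown"

print(golden_rule_label("the boy harmed the girl"))  # -> unfair
print(golden_rule_label("the boy helped the girl"))  # -> fair
```

The lexicon lookup stands in for the judgement "would the agent accept this act if the roles were reversed?"; any real system would need parsing and a far richer model of acceptability.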