Search CORE

43 research outputs found

Impossibility Theorems for Feature Attribution

Author: Bilodeau Blair
Jaques Natasha
Kim Been
Koh Pang Wei
Publication venue
Publication date: 11/04/2023
Field of study

Despite a sea of interpretability methods that can produce plausible explanations, the field has also empirically seen many failure cases of such methods. In light of these results, it remains unclear for practitioners how to use these methods and choose between them in a principled way. In this paper, we show that for moderately rich model classes (easily satisfied by neural networks), any feature attribution method that is complete and linear -- for example, Integrated Gradients and SHAP -- can provably fail to improve on random guessing for inferring model behaviour. Our results apply to common end-tasks such as characterizing local model behaviour, identifying spurious features, and algorithmic recourse. One takeaway from our work is the importance of concretely defining end-tasks: once such an end-task is defined, a simple and direct approach of repeated model evaluations can outperform many other complex feature attribution methods.Comment: 36 pages, 4 figures. Significantly expanded experiment

arXiv.org e-Print Archive

Hierarchical Reinforcement Learning for Open-Domain Dialog

Author: Ghandeharioun Asma
Jaques Natasha
Picard Rosalind
Saleh Abdelrhman
Shen Judy Hanwen
Publication venue
Publication date: 31/12/2019
Field of study

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning, VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements -- in terms of both human evaluation and automatic metrics -- over state-of-the-art dialog models, including Transformers

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Moral Foundations of Large Language Models

Author: Abdulhai Marwa
Canny John
Crepy Clément
Jaques Natasha
Serapio-Garcia Gregory
Valter Daria
Publication venue
Publication date: 23/10/2023
Field of study

Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors, including care/harm, liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary in the weight they place on these dimensions when making moral decisions, in part due to their cultural upbringing and political ideology. As large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases that are present in such corpora. This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values. We analyze known LLMs and find they exhibit particular moral foundations, and show how these relate to human moral foundations and political affiliations. We also measure the consistency of these biases, or whether they vary strongly depending on the context of how the model is prompted. Finally, we show that we can adversarially select prompts that encourage the moral to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks. These findings help illustrate the potential risks and unintended consequences of LLMs assuming a particular moral stance

arXiv.org e-Print Archive

Active learning for electrodermal activity classification

Author: Fedor Szymon
Jaques Natasha Mary
Picard Rosalind W.
Taylor Sara Ann
Xia Victoria F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2015
Field of study

To filter noise or detect features within physiological signals, it is often effective to encode expert knowledge into a model such as a machine learning classifier. However, training such a model can require much effort on the part of the researcher; this often takes the form of manually labeling portions of signal needed to represent the concept being trained. Active learning is a technique for reducing human effort by developing a classifier that can intelligently select the most relevant data samples and ask for labels for only those samples, in an iterative process. In this paper we demonstrate that active learning can reduce the labeling effort required of researchers by as much as 84% for our application, while offering equivalent or even slightly improved machine learning performance.MIT Media Lab ConsortiumRobert Wood Johnson Foundatio

DSpace@MIT

Crossref

Wavelet-based motion artifact removal for electrodermal activity

Author: Chen Weixuan
Fedor Szymon
Jaques Natasha Mary
Picard Rosalind W.
Sano Akane
Taylor Sara Ann
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2015
Field of study

Electrodermal activity (EDA) recording is a powerful, widely used tool for monitoring psychological or physiological arousal. However, analysis of EDA is hampered by its sensitivity to motion artifacts. We propose a method for removing motion artifacts from EDA, measured as skin conductance (SC), using a stationary wavelet transform (SWT). We modeled the wavelet coefficients as a Gaussian mixture distribution corresponding to the underlying skin conductance level (SCL) and skin conductance responses (SCRs). The goodness-of-fit of the model was validated on ambulatory SC data. We evaluated the proposed method in comparison with three previous approaches. Our method achieved a greater reduction of artifacts while retaining motion-artifact-free data

DSpace@MIT

Crossref