CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are only as good as their training data, and can also capture unwanted biases.
While there are tools that can help understand whether such biases exist, they
do not distinguish between correlation and causation, and might be ill-suited
for text-based models and for reasoning about high level language concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at: https://amirfeder.github.io/CausaLM/. Under review for the Computational Linguistics journal.
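The core mechanism, adversarial fine-tuning that removes a concept of interest from the representation, can be sketched with a gradient reversal layer. The following minimal PyTorch sketch is illustrative only; the class and parameter names are our own, not the authors' released code.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient pushes the shared encoder to *forget*
        # the concept that the adversarial probe below tries to predict.
        return -ctx.lambd * grad_output, None

class AdversarialConceptHead(nn.Module):
    """Probe that predicts the treated concept from the encoder output."""
    def __init__(self, hidden_dim: int, n_concept_labels: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.probe = nn.Linear(hidden_dim, n_concept_labels)

    def forward(self, encoding: torch.Tensor) -> torch.Tensor:
        # Train this head with an ordinary cross-entropy loss; the gradient
        # reversal makes the encoder adversarial to the probe, yielding a
        # representation that is (approximately) unaffected by the concept.
        return self.probe(GradReverse.apply(encoding, self.lambd))
```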
Predicting In-game Actions from Interviews of NBA Players
Sports competitions are widely researched in computer and social science,
with the goal of understanding how players act under uncertainty. While there
is an abundance of computational work on player metrics prediction based on
past performance, very few attempts to incorporate out-of-game signals have
been made. Specifically, it was previously unclear whether linguistic signals
gathered from players' interviews can add information which does not appear in
performance metrics. To bridge that gap, we define text classification tasks of
predicting deviations from mean in NBA players' in-game actions, which are
associated with strategic choices, player behavior and risk, using their choice
of language prior to the game. We collected a dataset of transcripts from key
NBA players' pre-game interviews and their in-game performance metrics,
totaling 5,226 interview-metric pairs. We design neural models for players'
action prediction based on increasingly complex aspects of the language
signals in their open-ended interviews. Our models can make their predictions
based on the textual signal alone, or on a combination with signals from
past-performance metrics. Our text-based models outperform strong baselines
trained on performance metrics only, demonstrating the importance of language
usage for action prediction. Moreover, the models that employ both textual
input and past-performance metrics produced the best results. Finally, as
neural networks are notoriously difficult to interpret, we propose a method for
gaining further insight into what our models have learned. Particularly, we
present an LDA-based analysis, where we interpret model predictions in terms of
correlated topics. We find that our best performing textual model is most
associated with topics that are intuitively related to each prediction task and
that better models yield higher correlation with more informative topics.
Comment: First two authors contributed equally. To be published in the Computational Linguistics journal. Code is available at: https://github.com/nadavo/moo
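As a rough illustration of the combined setup (not the authors' released code), a fusion model might concatenate an interview embedding with past-performance features before classifying; all dimensions and names below are hypothetical.

```python
import torch
import torch.nn as nn

class TextMetricClassifier(nn.Module):
    """Fuses an interview embedding with past-performance features to
    classify a deviation from the player's mean (above vs. below)."""
    def __init__(self, text_dim: int = 768, metric_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + metric_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # above-mean vs. below-mean action rate
        )

    def forward(self, text_vec: torch.Tensor, metric_vec: torch.Tensor) -> torch.Tensor:
        # text_vec: encoding of the pre-game interview (e.g., from a
        # pretrained encoder); metric_vec: standardized past metrics.
        return self.fuse(torch.cat([text_vec, metric_vec], dim=-1))
```

A text-only variant simply drops metric_vec, matching the paper's comparison between textual, metric-based, and combined models.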
On the Robustness of Dialogue History Representation in Conversational Question Answering: A Comprehensive Study and a New Prompt-based Method
Most work on modeling the conversation history in Conversational Question Answering (CQA) reports a single main result on a common CQA benchmark. While existing models show impressive results on CQA leaderboards, it remains unclear whether they are robust to shifts in setting (sometimes to more realistic ones), training data size (e.g., from large to small sets), and domain. In this work, we design and conduct the first large-scale robustness study of history modeling approaches for CQA. We find that high benchmark scores do not necessarily translate to strong robustness, and that various methods can perform extremely differently under different settings. Equipped with the insights from our study, we design a novel prompt-based history modeling approach and demonstrate its strong robustness across various settings. Our approach is inspired by existing methods that highlight historic answers in the passage. However, instead of highlighting by modifying the passage token embeddings, we add textual prompts directly in the passage text. Our approach is simple, easy to plug into practically any model, and highly effective; thus, we recommend it as a starting point for future model developers. We also hope that our study and insights will raise awareness of the importance of robustness-focused evaluation, in addition to obtaining high leaderboard scores, leading to better CQA systems.
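The prompt-based idea, marking historic answers with plain text rather than modified embeddings, amounts to simple string manipulation. The sketch below is a hedged illustration; the marker format and helper name are hypothetical, not the paper's exact prompts.

```python
def mark_history_in_passage(passage: str, answer_spans: list) -> str:
    """Wrap each historic answer (given as (start, end) character offsets)
    with plain-text prompts, so any reader model sees the dialogue history
    as ordinary tokens in the passage itself."""
    parts, prev = [], 0
    for turn, (start, end) in enumerate(sorted(answer_spans), start=1):
        parts.append(passage[prev:start])
        parts.append(f"<answer {turn}> {passage[start:end]} </answer {turn}>")
        prev = end
    parts.append(passage[prev:])
    return "".join(parts)
```

Because the history is injected as text, the method plugs into any encoder without architecture changes, which is what makes it easy to adopt.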
Measuring the Robustness of NLP Models to Domain Shifts
Existing research on Domain Robustness (DR) suffers from disparate setups,
limited task variety, and scarce research on recent capabilities such as
in-context learning. Furthermore, the common practice of measuring DR might not
be fully accurate. Current research focuses on challenge sets and relies solely
on the Source Drop (SD): using the source in-domain performance as a reference
point for degradation. However, we argue that the Target Drop (TD), which
measures degradation from the target in-domain performance, should be used as a
complementary point of view. To address these issues, we first curated a DR
benchmark comprising 7 diverse NLP tasks, which enabled us to measure both
the SD and the TD. We then conducted a comprehensive large-scale DR study
involving over 14,000 domain shifts across 21 fine-tuned models and few-shot
LLMs. We found that both model types suffer from drops upon domain shifts.
While fine-tuned models excel in-domain, few-shot LLMs often surpass them
cross-domain, showing better robustness. In addition, we found that a large SD
can often be explained by shifting to a harder domain rather than by a genuine
DR challenge, which highlights the importance of TD as a complementary
metric. We hope our study will shed light on the current state of DR in NLP
models and promote improved evaluation practices toward more robust models.
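The two reference points can be written down directly. This is a hedged sketch assuming scores where higher is better; the paper's exact aggregation may differ (e.g., relative rather than absolute drops).

```python
def source_drop(source_in_domain: float, cross_domain: float) -> float:
    """SD: degradation measured against the model's own in-domain score
    on the source domain it was trained (or prompted) on."""
    return source_in_domain - cross_domain

def target_drop(target_in_domain: float, cross_domain: float) -> float:
    """TD: degradation measured against the score of a comparable model
    trained in-domain on the target."""
    return target_in_domain - cross_domain
```

A large SD paired with a small TD then signals a harder target domain rather than a genuine robustness failure, which is the argument for reporting both.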
Three-dimensional electronic scaffolds for monitoring and regulation of multifunctional hybrid tissues
Recently, the integration of electronic elements with cellular scaffolds has brought forth the ability to actively monitor and control tissue function by using flexible, free-standing two-dimensional (2D) systems. However, capabilities for electrically probing complex physicochemical and biological three-dimensional (3D) microenvironments demand 3D electronic scaffolds with well-controlled geometries and functional-component distributions. This work presents the development of flexible 3D electronic scaffolds with precisely defined dimensions and microelectrode configurations, formed using a process that relies on geometric transformation of 2D precursors by compressive buckling. It demonstrates a capability to fabricate these constructs in diverse 3D architectures and/or electrode distributions aimed at achieving an enhanced level of control and regulation of tissue function relative to other approaches. In addition, this work presents the integration of these 3D electronic scaffolds within engineered 3D cardiac tissues for monitoring of tissue function, controlling tissue contraction through electrical stimulation, and initiating on-demand, local release of drugs, each through well-defined volumetric spaces. These ideas provide opportunities in fields ranging from in vitro drug development to in vivo tissue repair, and many others.