47 research outputs found
Finite Dimensional Infinite Constellations
In the setting of a Gaussian channel without power constraints, proposed by
Poltyrev, the codewords are points in an n-dimensional Euclidean space (an
infinite constellation) and the tradeoff between their density and the error
probability is considered. The capacity in this setting is the highest
achievable normalized log density (NLD) with vanishing error probability. This
capacity as well as error exponent bounds for this setting are known. In this
work we consider the optimal performance achievable in the fixed blocklength
(dimension) regime. We provide two new achievability bounds, and extend the
validity of the sphere bound to finite dimensional infinite constellations. We
also provide asymptotic analysis of the bounds: When the NLD is fixed, we
provide asymptotic expansions for the bounds that are significantly tighter
than the previously known error exponent results. When the error probability is
fixed, we show that as n grows, the gap to capacity is, to first order,
inversely proportional to the square root of n, where the proportionality
constant is given by the inverse Q-function of the allowed error probability
times the square root of 1/2. In analogy to the corresponding result in channel
coding, the dispersion of infinite constellations is 1/2 nat^2 per channel use.
All our
achievability results use lattices and therefore hold for the maximal error
probability as well. Connections to the error exponent of the power constrained
Gaussian channel and to the volume-to-noise ratio as a figure of merit are
discussed. In addition, we demonstrate the tightness of the results numerically
and compare to state-of-the-art coding schemes.
Comment: 54 pages, 13 figures. Submitted to IEEE Transactions on Information Theory.
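Stated as a formula (a restatement of the fixed-error-probability sentence above; delta(n, epsilon) denotes the highest NLD achievable at dimension n and error probability epsilon, delta* the Poltyrev capacity, and Q^{-1} the inverse Q-function, notation introduced here only for this sketch):

\[
\delta^{*} - \delta(n,\epsilon) \;\approx\; Q^{-1}(\epsilon)\,\sqrt{\frac{1}{2n}} \;=\; Q^{-1}(\epsilon)\,\sqrt{\frac{V}{n}}, \qquad V = \tfrac{1}{2}\ \mathrm{nat}^{2},
\]

i.e., the gap to capacity decays like 1/sqrt(n) to first order, with the dispersion V = 1/2 nat^2 per channel use playing the same role as the channel dispersion in power-constrained channel coding.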
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are only as good as their training data, and can also capture unwanted
biases.
While there are tools that can help understand whether such biases exist, they
do not distinguish between correlation and causation, and might be ill-suited
for text-based models and for reasoning about high level language concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning of deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at: https://amirfeder.github.io/CausaLM/
Under review for the Computational Linguistics journal.
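The core mechanism described above, adversarial auxiliary fine-tuning of a contextualized encoder, can be pictured with a minimal sketch. This is not the CausaLM training recipe itself; the gradient-reversal head, the model name bert-base-uncased, and the single adversarial label are simplifying assumptions made only for illustration:

    # Hedged sketch: an adversarial "concept-forgetting" head on a BERT encoder.
    # Minimizing the adversarial loss through the gradient-reversal layer pushes the
    # encoder toward a representation carrying little information about the treated
    # concept, which is the property the abstract describes.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; negates (and scales) the gradient in backward."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class CounterfactualEncoder(nn.Module):
        def __init__(self, model_name="bert-base-uncased", num_concept_labels=2, lambd=1.0):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(model_name)
            hidden = self.encoder.config.hidden_size
            self.concept_head = nn.Linear(hidden, num_concept_labels)  # adversarial head
            self.lambd = lambd

        def forward(self, input_ids, attention_mask):
            cls = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state[:, 0]
            concept_logits = self.concept_head(GradReverse.apply(cls, self.lambd))
            return cls, concept_logits  # cls feeds the main task; logits feed the adversary

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    batch = tokenizer(["an illustrative input sentence"], return_tensors="pt", padding=True)
    model = CounterfactualEncoder()
    cls, concept_logits = model(batch["input_ids"], batch["attention_mask"])
    # Hypothetical concept label for the single example in the batch:
    concept_loss = nn.CrossEntropyLoss()(concept_logits, torch.tensor([0]))

In a full training loop this adversarial loss would be combined with the main pre-training objective, which is where the carefully chosen auxiliary adversarial pre-training tasks described in the abstract come in.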
Predicting In-game Actions from Interviews of NBA Players
Sports competitions are widely researched in computer and social science,
with the goal of understanding how players act under uncertainty. While there
is an abundance of computational work on player metrics prediction based on
past performance, very few attempts to incorporate out-of-game signals have
been made. Specifically, it was previously unclear whether linguistic signals
gathered from players' interviews can add information which does not appear in
performance metrics. To bridge that gap, we define text classification tasks of
predicting deviations from mean in NBA players' in-game actions, which are
associated with strategic choices, player behavior and risk, using their choice
of language prior to the game. We collected a dataset of transcripts from key
NBA players' pre-game interviews and their in-game performance metrics,
totalling 5,226 interview-metric pairs. We design neural models for players'
action prediction based on increasingly complex aspects of the language
signals in their open-ended interviews. Our models can make their predictions
based on the textual signal alone, or on a combination with signals from
past-performance metrics. Our text-based models outperform strong baselines
trained on performance metrics only, demonstrating the importance of language
usage for action prediction. Moreover, the models that employ both textual
input and past-performance metrics produced the best results. Finally, as
neural networks are notoriously difficult to interpret, we propose a method for
gaining further insight into what our models have learned. Particularly, we
present an LDA-based analysis, where we interpret model predictions in terms of
correlated topics. We find that our best performing textual model is most
associated with topics that are intuitively related to each prediction task and
that better models yield higher correlation with more informative topics.
Comment: First two authors contributed equally. To be published in the
Computational Linguistics journal. Code is available at: https://github.com/nadavo/moo
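The LDA-based interpretation step mentioned above can be pictured with a small, hedged sketch (the interview snippets, predicted probabilities, and topic count below are made-up placeholders, and the authors' pipeline is more involved): fit topics on the interview texts and correlate per-document topic weights with a model's predictions for one task.

    # Hedged, illustrative sketch of correlating LDA topics with model predictions.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Hypothetical pre-game interview snippets and model outputs for one deviation task.
    interviews = [
        "we have to stay aggressive and attack the rim early",
        "it is a big game but we take it one possession at a time",
        "defense wins games so we focus on rotations and rebounding",
        "i feel good about my shot and i will keep shooting with confidence",
    ]
    predicted_prob = np.array([0.72, 0.41, 0.35, 0.66])  # hypothetical predictions

    counts = CountVectorizer(stop_words="english").fit_transform(interviews)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    doc_topics = lda.transform(counts)  # per-document topic proportions

    for k in range(doc_topics.shape[1]):
        r, _ = pearsonr(doc_topics[:, k], predicted_prob)
        print(f"topic {k}: correlation with model predictions = {r:+.2f}")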
Evaluating the Moral Beliefs Encoded in LLMs
This paper presents a case study on the design, administration,
post-processing, and evaluation of surveys on large language models (LLMs). It
comprises two components: (1) A statistical method for eliciting beliefs
encoded in LLMs. We introduce statistical measures and evaluation metrics that
quantify the probability of an LLM "making a choice", the associated
uncertainty, and the consistency of that choice. (2) We apply this method to
study what moral beliefs are encoded in different LLMs, especially in ambiguous
cases where the right choice is not obvious. We design a large-scale survey
comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white
lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a
pedestrian on the road?"). Each scenario includes a description, two possible
actions, and auxiliary labels indicating violated rules (e.g., "do not kill").
We administer the survey to 28 open- and closed-source LLMs. We find that (a)
in unambiguous scenarios, most models "choose" actions that align with
commonsense. In ambiguous cases, most models express uncertainty. (b) Some
models are uncertain about choosing the commonsense action because their
responses are sensitive to the question wording. (c) Some models reflect clear
preferences in ambiguous scenarios. Specifically, closed-source models tend to
agree with each other.
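To make the kind of measurement described in component (1) concrete, here is a minimal, hedged sketch (the responses are invented, and the paper's actual estimators and consistency metrics differ in detail): from repeated answers to one scenario asked under several paraphrased wordings, estimate how often the model makes each choice, an entropy-based uncertainty, and a simple consistency score.

    # Hedged sketch, not the paper's exact estimators.
    from collections import Counter
    import math

    # Invented answers for one high-ambiguity scenario under several wordings.
    responses = ["action_1", "action_1", "action_2", "action_1", "refusal", "action_1"]

    counts = Counter(responses)
    n = len(responses)
    p_choice = {answer: c / n for answer, c in counts.items()}

    # Uncertainty: entropy (in bits) of the empirical answer distribution.
    entropy = -sum(p * math.log2(p) for p in p_choice.values())

    # Consistency: fraction of wordings on which the modal answer was given.
    modal_answer, modal_count = counts.most_common(1)[0]
    consistency = modal_count / n

    print(p_choice, round(entropy, 2), modal_answer, round(consistency, 2))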
Causal-structure Driven Augmentations for Text OOD Generalization
The reliance of text classifiers on spurious correlations can lead to poor
generalization at deployment, raising concerns about their use in
safety-critical domains such as healthcare. In this work, we propose to use
counterfactual data augmentation, guided by knowledge of the causal structure
of the data, to simulate interventions on spurious features and to learn more
robust text classifiers. We show that this strategy is appropriate in
prediction problems where the label is spuriously correlated with an attribute.
Under the assumptions of such problems, we discuss the favorable sample
complexity of counterfactual data augmentation, compared to importance
re-weighting. Pragmatically, we match examples using auxiliary data, based on
diff-in-diff methodology, and use a large language model (LLM) to represent a
conditional probability of text. Through extensive experimentation on learning
caregiver-invariant predictors of clinical diagnoses from medical narratives
and on semi-synthetic data, we demonstrate that our method for simulating
interventions improves out-of-distribution (OOD) accuracy compared to baseline
invariant learning algorithms.
Comment: Forthcoming in NeurIPS 202
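As a loose illustration of counterfactual augmentation by matching (a deliberately simplified stand-in: the embeddings, labels, attribute, and nearest-neighbour rule below are placeholder assumptions, and the paper's diff-in-diff matching with an LLM-based model of text is considerably more involved), one can pair each example with a similar example whose spurious attribute is flipped:

    # Hedged sketch: pair each example with its nearest same-label neighbour that
    # has the opposite spurious attribute, and treat the pair as an augmentation.
    import numpy as np

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(6, 4))       # hypothetical text embeddings
    labels     = np.array([0, 0, 0, 1, 1, 1])  # e.g. diagnosis label
    attribute  = np.array([0, 1, 0, 1, 0, 1])  # spurious attribute, e.g. caregiver

    augmented_pairs = []
    for i in range(len(labels)):
        candidates = np.where((labels == labels[i]) & (attribute != attribute[i]))[0]
        if candidates.size == 0:
            continue
        dists = np.linalg.norm(embeddings[candidates] - embeddings[i], axis=1)
        j = int(candidates[np.argmin(dists)])
        augmented_pairs.append((i, j))          # (original, counterfactual-like match)

    print(augmented_pairs)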
In the Eye of the Beholder: Robust Prediction with Causal User Modeling
Accurately predicting the relevance of items to users is crucial to the
success of many social platforms. Conventional approaches train models on
logged historical data; but recommendation systems, media services, and online
marketplaces all exhibit a constant influx of new content -- making relevancy a
moving target, to which standard predictive models are not robust. In this
paper, we propose a learning framework for relevance prediction that is robust
to changes in the data distribution. Our key observation is that robustness can
be obtained by accounting for how users causally perceive the environment. We
model users as boundedly-rational decision makers whose causal beliefs are
encoded by a causal graph, and show how minimal information regarding the graph
can be used to contend with distributional changes. Experiments in multiple
settings demonstrate the effectiveness of our approach.
Comment: Accepted to NeurIPS 202
An Invariant Learning Characterization of Controlled Text Generation
Controlled generation refers to the problem of creating text that contains
stylistic or semantic attributes of interest. Many approaches reduce this
problem to training a predictor of the desired attribute. For example,
researchers hoping to deploy a large language model to produce non-toxic
content may use a toxicity classifier to filter generated text. In practice,
the generated text to classify, which is determined by user prompts, may come
from a wide range of distributions. In this paper, we show that the performance
of controlled generation may be poor if the distributions of text in response
to user prompts differ from the distribution the predictor was trained on. To
address this problem, we cast controlled generation under distribution shift as
an invariant learning problem: the most effective predictor should be invariant
across multiple text environments. We then discuss a natural solution that
arises from this characterization and propose heuristics for selecting natural
environments. We study this characterization and the proposed method
empirically using both synthetic and real data. Experiments demonstrate both
the challenge of distribution shift in controlled generation and the potential
of invariance methods in this setting.
Comment: To appear in the 2023 Conference of the Association for Computational
Linguistics (ACL 2023).
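One concrete reading of "the most effective predictor should be invariant across multiple text environments" is an IRM-style training objective; the sketch below is a hedged illustration on synthetic features (the linear model, environment construction, and penalty weight are assumptions for illustration, not the paper's formulation):

    # Hedged sketch of an IRMv1-style invariance penalty for an attribute predictor
    # trained over several text "environments" (e.g., prompts from different sources).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    d = 16
    model = nn.Linear(d, 1)  # stand-in for a text attribute predictor (e.g., toxicity)

    def irm_penalty(logits, y):
        # Gradient of the per-environment risk w.r.t. a dummy scale fixed at 1.0.
        scale = torch.ones(1, requires_grad=True)
        loss = nn.functional.binary_cross_entropy_with_logits(logits * scale, y)
        grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
        return (grad ** 2).sum()

    environments = [(torch.randn(32, d), torch.randint(0, 2, (32, 1)).float())
                    for _ in range(3)]  # synthetic (features, label) per environment

    erm_loss, penalty = 0.0, 0.0
    for x, y in environments:
        logits = model(x)
        erm_loss = erm_loss + nn.functional.binary_cross_entropy_with_logits(logits, y)
        penalty = penalty + irm_penalty(logits, y)

    total = erm_loss + 10.0 * penalty  # the penalty weight is a hyperparameter
    total.backward()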
A Linked Coptic Dictionary Online
We describe a new project publishing a freely available online dictionary for Coptic. The dictionary encompasses comprehensive cross-referencing mechanisms, including linking entries to an online scanned edition of Crum’s Coptic Dictionary, internal cross-references and etymological information, translated searchable definitions in English, French and German, and linked corpus data which provides frequencies and corpus look-up for headwords and multiword expressions. Headwords are available for linking in external projects using a REST API. We describe the challenges in encoding our dictionary using TEI XML and implementing linking mechanisms to construct a Web interface querying frequency information, which draw on NLP tools to recognize inflected forms in context. We evaluate our dictionary’s coverage using digital corpora of Coptic available online.
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
Causal explanations of the predictions of NLP systems are essential to ensure
safety and establish trust. Yet, existing methods often fall short of
explaining model predictions effectively or efficiently and are often
model-specific. In this paper, we address model-agnostic explanations,
proposing two approaches for counterfactual (CF) approximation. The first
approach is CF generation, where a large language model (LLM) is prompted to
change a specific text concept while keeping confounding concepts unchanged.
While this approach is demonstrated to be very effective, applying an LLM at
inference time is costly. We hence present a second approach based on matching,
and propose a method that is guided by an LLM at training-time and learns a
dedicated embedding space. This space is faithful to a given causal graph and
effectively serves to identify matches that approximate CFs. After showing
theoretically that approximating CFs is required in order to construct faithful
explanations, we benchmark our approaches and explain several models, including
LLMs with billions of parameters. Our empirical results demonstrate the
excellent performance of CF generation models as model-agnostic explainers.
Moreover, our matching approach, which requires far less test-time resources,
also provides effective explanations, surpassing many baselines. We also find
that Top-K techniques universally improve every tested method. Finally, we
showcase the potential of LLMs in constructing new benchmarks for model
explanation and subsequently validate our conclusions. Our work illuminates new
pathways for efficient and accurate approaches to interpreting NLP systems.
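A hedged sketch of the first approach, counterfactual generation via prompting, is given below; the prompt wording, the example inputs, and the commented-out call_llm placeholder are illustrative assumptions, not the paper's exact prompts. The idea is simply to ask an LLM to flip one target concept while explicitly holding the stated confounding concepts fixed.

    # Hedged sketch: build a counterfactual-generation prompt for an LLM.
    def build_cf_prompt(text: str, concept: str, new_value: str, confounders: list) -> str:
        held_fixed = ", ".join(confounders) if confounders else "all other aspects"
        return (
            f"Rewrite the following text so that its {concept} becomes '{new_value}', "
            f"while keeping {held_fixed} and the overall meaning unchanged. "
            "Return only the rewritten text.\n\n"
            f"Text: {text}"
        )

    prompt = build_cf_prompt(
        text="The acting was superb, though the plot dragged in places.",
        concept="sentiment toward the acting",
        new_value="negative",
        confounders=["the opinion about the plot"],
    )
    print(prompt)
    # counterfactual = call_llm(prompt)  # hypothetical LLM call; any chat API would do

The second, matching-based approach described in the abstract avoids this test-time LLM call by retrieving approximate counterfactuals from a learned embedding space instead.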