29 research outputs found
A Review of the Role of Causality in Developing Trustworthy AI Systems
State-of-the-art AI models largely lack an understanding of the cause-effect
relationship that governs human understanding of the real world. Consequently,
these models do not generalize to unseen data, often produce unfair results,
and are difficult to interpret. This has led to efforts to improve the
trustworthiness aspects of AI models. Recently, causal modeling and inference
methods have emerged as powerful tools to address these issues. This review aims to provide the reader
with an overview of causal methods that have been developed to improve the
trustworthiness of AI models. We hope that our contribution will motivate
future research on causality-based solutions for trustworthy AI.
Comment: 55 pages, 8 figures. Under review.
A Survey on Fairness in Large Language Models
Large language models (LLMs) have shown strong performance and promising
development prospects, and are widely deployed in the real world. However, LLMs
can capture social biases from unprocessed training data and propagate these
biases to downstream tasks. Unfair LLM systems have undesirable social impacts
and can cause real harm. In this paper, we provide a comprehensive review of related
research on fairness in LLMs. First, for medium-scale LLMs, we introduce
evaluation metrics and debiasing methods from the perspectives of intrinsic
bias and extrinsic bias, respectively. Then, for large-scale LLMs, we introduce
recent fairness research, including fairness evaluation, reasons for bias, and
debiasing methods. Finally, we discuss and provide insight into the challenges
and future directions for the development of fairness in LLMs.
Comment: 12 pages, 2 figures, 101 references.
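As an illustration of the kind of extrinsic-bias evaluation such a survey covers, the sketch below compares a model's scores on counterfactual prompts that differ only in a demographic term. The names (`counterfactual_gap`, `score_fn`) are hypothetical, and the dummy scorer only stands in for an LLM plus a downstream sentiment or toxicity classifier; this is not a metric taken from the paper.

```python
from itertools import combinations
from typing import Callable, Dict, List

def counterfactual_gap(template: str,
                       groups: List[str],
                       score_fn: Callable[[str], float]) -> Dict[str, float]:
    """Measure extrinsic bias as the score gap between prompts that
    differ only in the demographic term filled into `template`."""
    scores = {g: score_fn(template.format(group=g)) for g in groups}
    return {f"{a} vs {b}": abs(scores[a] - scores[b])
            for a, b in combinations(groups, 2)}

# Toy usage with a dummy scorer; in practice score_fn would query an LLM
# and score its output with a sentiment/toxicity classifier.
dummy_score = lambda text: float(len(text)) / 100.0
print(counterfactual_gap("The {group} engineer wrote the report.",
                         ["female", "male"], dummy_score))
```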
Probing Classifiers are Unreliable for Concept Removal and Detection
Neural network models trained on text data have been found to encode
undesirable linguistic or sensitive concepts in their representation. Removing
such concepts is non-trivial because of a complex relationship between the
concept, text input, and the learnt representation. Recent work has proposed
post-hoc and adversarial methods to remove such unwanted concepts from a
model's representation. Through an extensive theoretical and empirical
analysis, we show that these methods can be counter-productive: they are unable
to remove the concepts entirely, and in the worst case may end up destroying
all task-relevant features. The reason is the methods' reliance on a probing
classifier as a proxy for the concept. Even under the most favorable conditions
for learning a probing classifier, when a concept's relevant features in
representation space alone can provide 100% accuracy, we prove that a probing
classifier is likely to rely on non-concept features, and thus post-hoc or
adversarial methods will fail to remove the concept correctly. These
theoretical implications are confirmed by experiments on models trained on
synthetic, Multi-NLI, and Twitter datasets. For sensitive applications of
concept removal such as fairness, we recommend caution against using these
methods and propose a spuriousness metric to gauge the quality of the final
classifier.
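For context, the post-hoc removal pipeline that the paper analyzes typically resembles the minimal sketch below: fit a linear probe for the concept, then project representations onto the null space of the probe's weight vector (a single INLP-style step). The paper's warning is that the probe may latch onto non-concept features, so this projection can also discard task-relevant information. The code is an illustrative reconstruction under that assumption, not the authors' implementation, and it does not include their spuriousness metric.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_and_project(X: np.ndarray, concept: np.ndarray) -> np.ndarray:
    """Fit a linear probe for the concept, then remove the probe's weight
    direction from the representations (one INLP-style projection step)."""
    probe = LogisticRegression(max_iter=1000).fit(X, concept)
    w = probe.coef_ / np.linalg.norm(probe.coef_)   # (1, d) unit-norm direction
    return X - (X @ w.T) @ w                        # project onto its null space

# Toy data: 200 samples, 16-dim representations, binary concept label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
concept = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
X_clean = probe_and_project(X, concept)
```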
Sustaining Fairness via Incremental Learning
Machine learning systems are often deployed to make critical decisions such as
credit lending and hiring. While making decisions, such systems often
encode the user's demographic information (such as gender or age) in their
intermediate representations. This can lead to decisions that are biased
towards specific demographic groups. Prior work has focused on debiasing intermediate
representations to ensure fair decisions. However, these approaches fail to
remain fair with changes in the task or demographic distribution. To ensure
fairness in the wild, it is important for a system to adapt to such changes as
it accesses new data in an incremental fashion. In this work, we propose to
address this issue by introducing the problem of learning fair representations
in an incremental learning setting. To this end, we present Fairness-aware
Incremental Representation Learning (FaIRL), a representation learning system
that can sustain fairness while incrementally learning new tasks. FaIRL is able
to achieve fairness and learn new tasks by controlling the rate-distortion
function of the learned representations. Our empirical evaluations show that
FaIRL is able to make fair decisions while achieving high performance on the
target task, outperforming several baselines.
Comment: Accepted at AAAI 2023.
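The rate-distortion control mentioned in the abstract can be made concrete with the log-determinant coding-rate estimate used in maximal-coding-rate-reduction style representation objectives. The sketch below is a generic illustration of that quantity, not FaIRL's actual training code; the function name and the suggested objective in the comments are assumptions for exposition.

```python
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Approximate coding rate R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z)
    for a batch of representations Z with shape (n, d)."""
    n, d = Z.shape
    cov = Z.T @ Z * (d / (n * eps ** 2))
    return 0.5 * torch.logdet(torch.eye(d) + cov)

# A fairness-aware objective might, for example, maximize the coding rate of
# the whole batch while minimizing it within each protected-group partition
# (illustrative only).
Z = torch.nn.functional.normalize(torch.randn(64, 32), dim=1)
print(coding_rate(Z))
```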
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of six core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification, covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy.
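To make one of the taxonomy's challenges concrete, alignment between modalities is often approached with contrastive objectives. The sketch below is a minimal, assumed illustration of a symmetric InfoNCE loss over paired text/image embeddings, not a method proposed in the survey.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb: torch.Tensor,
                               image_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning paired text/image embeddings of shape (n, d)."""
    t = F.normalize(text_emb, dim=1)
    v = F.normalize(image_emb, dim=1)
    logits = t @ v.T / temperature        # (n, n) cross-modal similarity matrix
    targets = torch.arange(t.size(0))     # i-th text is paired with i-th image
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random embeddings standing in for modality encoders.
loss = contrastive_alignment_loss(torch.randn(8, 128), torch.randn(8, 128))
```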
Towards Faithful Model Explanation in NLP: A Survey
End-to-end neural NLP architectures are notoriously difficult to understand,
which has given rise to numerous efforts towards model explainability in recent
years. An essential principle of model explanation is Faithfulness, i.e., an
explanation should accurately represent the reasoning process behind the
model's prediction. This survey first discusses the definition and evaluation
of Faithfulness, as well as its significance for explainability. We then
introduce the recent advances in faithful explanation by grouping approaches
into five categories: similarity methods, analysis of model-internal
structures, backpropagation-based methods, counterfactual intervention, and
self-explanatory models. Each category will be illustrated with its
representative studies, advantages, and shortcomings. Finally, we discuss all
the above methods in terms of their common virtues and limitations, and reflect
on future work directions towards faithful explainability. For researchers
interested in studying interpretability, this survey will offer an accessible
and comprehensive overview of the area, laying the basis for further
exploration. For users hoping to better understand their own models, this
survey will be an introductory manual helping with choosing the most suitable
explanation method(s).
Comment: 62 pages.
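As a concrete instance of the survey's backpropagation-based category, the sketch below computes input-times-gradient saliency for a toy classifier over token embeddings. The model and function names are illustrative assumptions, not code from the survey.

```python
import torch

def input_x_gradient(model: torch.nn.Module,
                     embeddings: torch.Tensor,
                     target_class: int) -> torch.Tensor:
    """Token saliency via input * gradient for a model that maps token
    embeddings of shape (seq_len, dim) to class logits."""
    embeddings = embeddings.clone().requires_grad_(True)
    logits = model(embeddings)
    logits[target_class].backward()                  # gradient w.r.t. embeddings
    return (embeddings * embeddings.grad).sum(dim=-1)  # one score per token

class ToyClassifier(torch.nn.Module):
    """Mean-pool token embeddings, then apply a linear classifier."""
    def __init__(self, dim: int = 16, num_classes: int = 2):
        super().__init__()
        self.linear = torch.nn.Linear(dim, num_classes)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:  # emb: (seq_len, dim)
        return self.linear(emb.mean(dim=0))                # (num_classes,)

model = ToyClassifier()
saliency = input_x_gradient(model, torch.randn(5, 16), target_class=1)
```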