Direct policy search and uncertain policy evaluation
Reinforcement learning based on direct search in policy space requires few assumptions about the environment. Hence it is applicable in certain situations where most traditional reinforcement learning algorithms based on dynamic programming are not, especially in partially observable, deterministic worlds. In realistic settings, however, reliable policy evaluations are complicated by numerous sources of uncertainty, such as stochasticity in policy and environment. Given a limited lifetime, how much time should a direct policy searcher spend on policy evaluations to obtain reliable statistics? Despite the fundamental nature of this question, it has not received much attention yet. Our efficient approach based on the success-story algorithm (SSA) is radical in the sense that it never stops evaluating any previous policy modification except those it undoes for lack of empirical evidence that they have contributed to lifelong reward accelerations. Here we identify SSA’s fundamental advantages over traditional direct policy search (such as stochastic hill-climbing) on problems involving several sources of stochasticity and uncertainty.
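The SSA idea sketched in this abstract can be pictured as a stack of checkpoints, one per surviving policy modification, where modifications not followed by a faster rate of lifelong reward intake get undone. The sketch below is a minimal pure-Python illustration under the assumption that the policy is a plain dict of parameters and that `stack` holds `(time, cumulative_reward, params_before_modification)` tuples with a sentinel `(0, 0.0, {})` at the bottom; it illustrates the idea, not the authors' implementation.

```python
# Minimal, illustrative sketch of a success-story-style check (not the authors' code).
# Assumption: `stack` holds one entry per surviving policy modification,
# (time_of_modification, cumulative_reward_then, params_before_modification),
# with a sentinel (0, 0.0, {}) at the bottom representing the start of life.

def success_story_check(stack, t_now, reward_now, policy):
    """Undo policy modifications not followed by a long-term reward acceleration.

    Surviving checkpoints must earn reward at a strictly increasing rate the more
    recent they are; violating entries are popped from the top and their saved
    parameters restored until the condition holds again.
    """
    while len(stack) >= 2:
        t_prev, r_prev, _ = stack[-2]
        t_last, r_last, params_before = stack[-1]
        rate_since_prev = (reward_now - r_prev) / max(t_now - t_prev, 1)
        rate_since_last = (reward_now - r_last) / max(t_now - t_last, 1)
        if rate_since_last > rate_since_prev:
            break                          # newest surviving modification still pays off
        policy.update(params_before)       # undo it: restore pre-modification parameters
        stack.pop()
    return policy
```

Because the check is run against lifelong cumulative reward, every earlier modification keeps being re-evaluated at each call, which is the "never stops evaluating" property the abstract refers to.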
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Language is increasingly being used to define rich visual recognition
problems with supporting image collections sourced from the web. Structured
prediction models are used in these tasks to take advantage of correlations
between co-occurring labels and visual input but risk inadvertently encoding
social biases found in web corpora. In this work, we study data and models
associated with multilabel object classification and visual semantic role
labeling. We find that (a) datasets for these tasks contain significant gender
bias and (b) models trained on these datasets further amplify existing bias.
For example, the activity cooking is over 33% more likely to involve females
than males in a training set, and a trained model further amplifies the
disparity to 68% at test time. We propose to inject corpus-level constraints
for calibrating existing structured prediction models and design an algorithm
based on Lagrangian relaxation for collective inference. Our method results in
almost no performance loss for the underlying recognition task but decreases
the magnitude of bias amplification by 47.5% and 40.5% for multilabel
classification and visual semantic role labeling, respectively.Comment: 11 pages, published in EMNLP 201
Safer-Instruct: Aligning Language Models with Automated Preference Data
Reinforcement Learning from Human Feedback (RLHF) is a vital strategy for
enhancing model safety in language models. However, annotating preference data
for RLHF is a resource-intensive and creativity-demanding process, while
automatic generation methods face limitations in data diversity and quality. In
response, we present Safer-Instruct, a novel pipeline for semi-automatically
constructing large-scale preference datasets. Our approach leverages reversed
instruction tuning, instruction induction, and expert model evaluation to
efficiently generate high-quality preference data without human annotators. We
evaluate Safer-Instruct using LLaMA for instruction induction and GPT-4 as an
expert model, generating approximately 10K preference samples. Finetuning an
Alpaca model on this dataset demonstrates improved harmlessness while
maintaining competitive performance on conversation and downstream tasks.
Safer-Instruct addresses the challenges in preference data acquisition,
advancing the development of safer and more responsible AI systems. Our code
and data are available at https://github.com/uscnlp-lime/safer-instruct
Comment: 11 pages
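The pipeline described above can be pictured as a small loop over existing responses: induce an instruction for each response, ask an expert model for a preferred answer, and keep the pair if it passes an expert-model filter. The sketch below is a hypothetical skeleton of that flow; the callables (`induce_instruction`, `expert_answer`, `passes_filter`) are placeholders, not Safer-Instruct's actual API.

```python
# High-level sketch of a reversed instruction-tuning style preference-data pipeline.
# The model-calling callables are hypothetical placeholders supplied by the caller.

from typing import Callable, Iterable

def build_preference_data(
    responses: Iterable[str],
    induce_instruction: Callable[[str], str],   # e.g. a LLaMA-based instruction-induction prompt
    expert_answer: Callable[[str], str],        # e.g. a GPT-4 call producing a safer answer
    passes_filter: Callable[[str, str], bool],  # expert-model check that the pair is usable
):
    """Turn raw responses into (instruction, chosen, rejected) tuples without human annotation."""
    dataset = []
    for rejected in responses:
        instruction = induce_instruction(rejected)   # "reversed" direction: response -> instruction
        chosen = expert_answer(instruction)
        if passes_filter(instruction, chosen):
            dataset.append(
                {"instruction": instruction, "chosen": chosen, "rejected": rejected}
            )
    return dataset
```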
SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models
A common limitation of diagnostic tests for detecting social biases in NLP
models is that they may only detect stereotypic associations that are
pre-specified by the designer of the test. Since enumerating all possible
problematic associations is infeasible, it is likely these tests fail to detect
biases that are present in a model but not pre-specified by the designer. To
address this limitation, we propose SODAPOP (SOcial bias Discovery from Answers
about PeOPle) in social commonsense question-answering. Our pipeline generates
modified instances from the Social IQa dataset (Sap et al., 2019) by (1)
substituting names associated with different demographic groups, and (2)
generating many distractor answers from a masked language model. By using a
social commonsense model to score the generated distractors, we are able to
uncover the model's stereotypic associations between demographic groups and an
open set of words. We also test SODAPOP on debiased models and show the
limitations of multiple state-of-the-art debiasing algorithms.
Comment: EACL 2023
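Conceptually, the probe holds the question and the distractor answers fixed while swapping in names associated with different demographic groups, then compares the QA model's scores. The sketch below illustrates that loop; `score_answer` is a hypothetical stand-in for the social commonsense model, and the name lists per group are supplied by the caller.

```python
# Illustrative sketch of a name-substitution probe: same question, same
# model-generated distractor answers, different demographically associated names.
# `score_answer(context_question, answer) -> float` is a hypothetical scorer.

from statistics import mean
from typing import Callable, Dict, List

def probe_name_sensitivity(
    template: str,                      # e.g. "[NAME] walked into the store. What will [NAME] do next?"
    names_by_group: Dict[str, List[str]],
    distractors: List[str],             # answers sampled from a masked language model
    score_answer: Callable[[str, str], float],
):
    """Mean model score for each (group, distractor) pair after name substitution."""
    results = {}
    for group, names in names_by_group.items():
        for answer in distractors:
            scores = [
                score_answer(template.replace("[NAME]", name), answer)
                for name in names
            ]
            results[(group, answer)] = mean(scores)
    return results
```

Comparing the per-group score distributions for the same distractor word is what surfaces the stereotypic associations the abstract describes.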