Aligning Language Models with Human Preferences via a Bayesian Approach
In the quest to advance human-centric natural language generation (NLG)
systems, ensuring alignment between NLG models and human preferences is
crucial. For this alignment, current popular methods leverage a reinforcement
learning (RL) approach with a reward model trained on feedback from humans.
However, inherent disagreements due to the subjective nature of human
preferences pose a significant challenge for training the reward model,
resulting in a deterioration of the NLG performance. To tackle this issue,
previous approaches typically rely on majority voting or averaging to
consolidate multiple inconsistent preferences into a merged one. Although
straightforward to understand and execute, such methods fail to capture the
nuanced degrees of disagreement among humans and may represent only a narrow
subset of individuals, thereby failing to quantitatively reflect how
universally a preference is held. To address
this challenge, this paper proposes a novel approach that employs a Bayesian
framework to account for the distribution of disagreements among human
preferences when training the preference model, which we name d-PM. In
addition, since the RL strategy's training process is inefficient and complex,
we further propose training the NLG model via a contrastive learning strategy
with the preference scores derived from the d-PM model. Extensive experiments
on two human-centric NLG tasks, i.e.,
emotional support conversation and integrity "Rule-of-Thumb" generation, show
that our method consistently exceeds previous SOTA models in both automatic and
human evaluations. Comment: NeurIPS 202
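The contrast between vote-merging and distribution-aware aggregation can be illustrated with a toy sketch. This is not the paper's Bayesian d-PM model; it only shows, under simplified assumptions, what majority voting discards and what a distributional view of annotator disagreement retains:

```python
from collections import Counter

def majority_vote(labels):
    """Baseline: collapse disagreeing annotations into one merged label."""
    return Counter(labels).most_common(1)[0][0]

def preference_distribution(labels):
    """Keep the full distribution of disagreement instead of collapsing it.
    The spread across annotators is itself a signal about how universal
    the preference is."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Three annotators compare responses "A" and "B"; one disagrees.
annotations = ["A", "A", "B"]
merged = majority_vote(annotations)          # "A" -- the dissent is discarded
dist = preference_distribution(annotations)  # {"A": 2/3, "B": 1/3}
```

A preference model trained on soft targets like `dist` (rather than the merged hard label) can then supply graded scores to downstream training, which is the role the d-PM scores play in the paper's contrastive objective.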
Enhancing the Rationale-Input Alignment for Self-explaining Rationalization
Rationalization empowers deep learning models with self-explaining
capabilities through a cooperative game, where a generator selects a
semantically consistent subset of the input as a rationale, and a subsequent
predictor makes predictions based on the selected rationale. In this paper, we
discover that rationalization is prone to a problem named \emph{rationale
shift}, which arises from the algorithmic bias of the cooperative game.
Rationale shift refers to a situation in which the semantics of the selected
rationale deviate from those of the original input, yet the predictor still
produces accurate predictions from the deviated rationale, feeding misleading
signals back to the generator and compromising it.
To address this issue, we first demonstrate the importance of the alignment
between the rationale and the full input through both empirical observations
and theoretical analysis. Subsequently, we introduce a novel approach called
DAR (\textbf{D}iscriminatively \textbf{A}ligned \textbf{R}ationalization),
which utilizes an auxiliary module pretrained on the full input to
discriminatively align the selected rationale and the original input. We
theoretically illustrate how DAR accomplishes the desired alignment, thereby
overcoming the rationale shift problem. The experiments on two widely used
real-world benchmarks show that the proposed method significantly improves the
explanation quality (measured by the overlap between the model-selected
explanation and the human-annotated rationale) as compared to state-of-the-art
techniques. Additionally, results on two synthetic settings further validate
the effectiveness of DAR in addressing the rationale shift problem. Comment: Accepted at ICDE 202
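The core alignment idea can be sketched in a few lines. This is a toy stand-in, not the paper's DAR module: it assumes a frozen auxiliary encoder (pretrained on full inputs) has already produced embeddings for the full input and the selected rationale, and simply penalizes semantic drift between them:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def alignment_penalty(full_input_emb, rationale_emb):
    """Penalize rationales whose embedding drifts away from the full
    input's embedding; a rationale that preserves the input's semantics
    incurs (near-)zero penalty."""
    return 1.0 - cosine(full_input_emb, rationale_emb)

full = np.array([1.0, 0.0])
aligned = np.array([1.0, 0.0])    # same semantics as the input
shifted = np.array([0.0, 1.0])    # semantics deviated from the input
```

Adding such a penalty to the generator's objective discourages the rationale shift described above; the actual DAR module learns this discriminatively rather than with a fixed cosine score.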
DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning
Online Class-Incremental (OCI) learning has sparked new approaches to expand
the previously trained model knowledge from sequentially arriving data streams
with new classes. Unfortunately, OCI learning can suffer from catastrophic
forgetting (CF), as the decision boundaries for old classes can become
inaccurate when perturbed by new ones. Existing literature has applied data
augmentation (DA) to alleviate model forgetting, but the role of DA in OCI has
not yet been well understood. In this paper, we theoretically
show that augmented samples with lower correlation to the original data are
more effective in preventing forgetting. However, aggressive augmentation may
also reduce the consistency between data and corresponding labels, which
motivates us to exploit proper DA to boost the OCI performance and prevent the
CF problem. We propose the Enhanced Mixup (EnMix) method that mixes the
augmented samples and their labels simultaneously, which is shown to enhance
the sample diversity while maintaining strong consistency with corresponding
labels. Further, to solve the class imbalance problem, we design an Adaptive
Mixup (AdpMix) method to calibrate the decision boundaries by mixing samples
from both old and new classes and dynamically adjusting the label mixing ratio.
Our approach is demonstrated to be effective on several benchmark datasets
through extensive experiments, and it is shown to be compatible with other
replay-based techniques. Comment: 10 pages, 7 figures and 3 table
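The idea of mixing augmented samples and their labels with the same coefficient can be sketched as a standard mixup-style interpolation. This is a minimal illustration of the mechanism, not the paper's EnMix or AdpMix implementation; the Beta-distributed mixing coefficient and the shared-lambda design are assumptions carried over from generic mixup:

```python
import numpy as np

rng = np.random.default_rng(0)

def enhanced_mixup(x1, y1, x2, y2, alpha=0.2):
    """Mix two (augmented) samples and their one-hot labels with the same
    coefficient lam, so the mixed sample stays consistent with its mixed
    label -- the consistency the abstract argues aggressive DA can break."""
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

An adaptive variant in the spirit of AdpMix would draw one sample from the old-class replay buffer and one from the new-class stream, and adjust the label mixing ratio dynamically rather than sampling it from a fixed Beta distribution.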
FR: Folded Rationalization with a Unified Encoder
Conventional rationalization works generally employ a two-phase model in which a generator
selects the most important pieces, followed by a predictor that makes
predictions based on the selected pieces. However, such a two-phase model may
incur a degeneration problem in which the predictor overfits to the noise
generated by a not-yet-well-trained generator and, in turn, leads the generator
to converge to a sub-optimal model that tends to select meaningless pieces. To
tackle this challenge, we propose Folded Rationalization (FR) that folds the
two phases of the rationale model into one from the perspective of text
semantic extraction. The key idea of FR is to share a unified encoder between
the generator and the predictor, giving the predictor access to valuable
information that the generator blocks in the traditional two-phase model,
which in turn yields a better generator. Empirically, we
show that FR improves the F1 score by up to 10.3% as compared to
state-of-the-art methods. Comment: Accepted at NeurIPS 202
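The shared-encoder idea can be sketched with a toy generator and predictor that read from one embedding table. All names, the norm-based token scoring, and the linear predictor are illustrative assumptions, not FR's actual architecture; the sketch only shows both phases consuming the same encoder states:

```python
import numpy as np

# One embedding table acts as the unified encoder for BOTH players.
table = {
    "the":   np.array([0.1, 0.0]),
    "good":  np.array([2.0, 0.0]),
    "movie": np.array([0.1, 0.1]),
    "great": np.array([0.0, 2.0]),
}

def encode(tokens, table):
    """Shared encoder: the generator and predictor see the same states."""
    return np.stack([table[t] for t in tokens])

def select_rationale(tokens, table, k=2):
    """Toy generator: score tokens by embedding norm, keep the top-k
    (in original order) as the rationale."""
    scores = np.linalg.norm(encode(tokens, table), axis=1)
    keep = sorted(np.argsort(scores)[-k:])
    return [tokens[i] for i in keep]

def predict(rationale, table, w):
    """Toy predictor: classify from the mean of shared-encoder states."""
    h = encode(rationale, table).mean(axis=0)
    return int(h @ w > 0)
```

Because `encode` is shared, gradient signal reaching the predictor also shapes the representations the generator selects from, which is the "folding" of the two phases that the abstract describes.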