Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification
Intent classification (IC) plays an important role in task-oriented dialogue
systems as it identifies user intents from given utterances. However, models
trained on limited annotations for IC often suffer from a lack of
generalization to unseen intent classes. We propose a novel pre-training method
for text encoders that uses contrastive learning with intent pseudo-labels to
produce embeddings that are well-suited for IC tasks. By applying this
pre-training strategy, we also introduce the pre-trained intent-aware encoder
(PIE). Specifically, we first train a tagger to identify key phrases within
utterances that are crucial for interpreting intents. We then use these
extracted phrases to create examples for pre-training a text encoder in a
contrastive manner. As a result, our PIE model achieves up to 5.4% and 4.0%
higher accuracy than the previous state-of-the-art pre-trained sentence encoder
in the N-way zero- and one-shot settings, respectively, on four IC datasets.
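The heart of this recipe is a supervised contrastive objective over intent
pseudo-labels. Below is a minimal sketch of one standard formulation of such a
loss, assuming utterance embeddings from an arbitrary sentence encoder; the
function name, temperature, and masking details are illustrative rather than
PIE's exact training objective.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings: torch.Tensor, pseudo_labels: torch.Tensor,
                 temperature: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss: pull together utterances that share an
    intent pseudo-label, push apart the rest (illustrative sketch)."""
    z = F.normalize(embeddings, dim=1)             # (batch, dim) unit vectors
    sim = z @ z.T / temperature                    # scaled cosine similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))      # drop self-similarity
    # Positive pairs share a pseudo-label; the diagonal is excluded.
    pos = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~eye
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # Average the positive log-probabilities per anchor; zero out
    # non-positive entries so the -inf diagonal never contributes.
    pos_count = pos.sum(dim=1)
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_count.clamp(min=1)
    # Only anchors that actually have an in-batch positive contribute.
    return per_anchor[pos_count > 0].mean()
```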
Mixture of Soft Prompts for Controllable Data Generation
Large language models (LLMs) effectively generate fluent text when the target
output follows natural language patterns. However, structured prediction tasks
confine the output format to a limited ontology, causing even very large models
to struggle since they were never trained with such restrictions in mind. The
difficulty of using LLMs for direct prediction is exacerbated in few-shot
learning scenarios, which commonly arise due to domain shift and resource
limitations. We flip the problem on its head by leveraging the LLM as a tool
for data augmentation rather than direct prediction. Our proposed Mixture of
Soft Prompts (MSP) serves as a parameter-efficient procedure for generating
data in a controlled manner. Denoising mechanisms are further applied to
improve the quality of synthesized data. Automatic metrics show our method is
capable of producing diverse and natural text, while preserving label
semantics. Moreover, MSP achieves state-of-the-art results on three benchmarks
when compared against strong baselines. Our method offers an alternate
data-centric approach for applying LLMs to complex prediction tasks.
Comment: 19 pages, 13 Tables, 2 Figures. Accepted at EMNLP 2023.
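Soft prompts are small blocks of trainable vectors prepended to a frozen LM's
input embeddings, and mixing several of them is one way to condition
generation on target attributes. Below is a minimal sketch of that mixing
step, under assumed shapes and a softmax weighting that is illustrative
rather than MSP's exact design.

```python
import torch
import torch.nn as nn

class SoftPromptMixture(nn.Module):
    """Mix per-attribute soft prompts and prepend them to token embeddings
    (illustrative sketch, not MSP's exact architecture)."""

    def __init__(self, num_prompts: int, prompt_len: int, hidden: int):
        super().__init__()
        # One trainable prompt per attribute (e.g. per target label).
        self.prompts = nn.Parameter(
            torch.randn(num_prompts, prompt_len, hidden) * 0.02)

    def forward(self, input_embeds: torch.Tensor,
                weights: torch.Tensor) -> torch.Tensor:
        # weights: (batch, num_prompts) attribute scores; normalize to a
        # convex combination and mix the prompts per example.
        mixed = torch.einsum("bp,pld->bld",
                             weights.softmax(dim=-1), self.prompts)
        # Prepend the mixed prompt; the result can be fed to a frozen LM
        # via its inputs_embeds argument while only the prompts train.
        return torch.cat([mixed, input_embeds], dim=1)
```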
Transfer-Free Data-Efficient Multilingual Slot Labeling
Slot labeling (SL) is a core component of task-oriented dialogue (ToD)
systems, where slots and corresponding values are usually language-, task- and
domain-specific. Therefore, extending the system to any new
language-domain-task configuration requires (re)running an expensive and
resource-intensive data annotation process. To mitigate the inherent data
scarcity issue, current research on multilingual ToD assumes that sufficient
English-language annotated data are always available for particular tasks and
domains, and thus operates in a standard cross-lingual transfer setup. In this
work, we depart from this often unrealistic assumption. We examine challenging
scenarios where such transfer-enabling English annotated data cannot be
guaranteed, and focus on bootstrapping multilingual data-efficient slot
labelers in transfer-free scenarios directly in the target languages without
any English-ready data. We propose a two-stage slot labeling approach (termed
TWOSL) which transforms standard multilingual sentence encoders into effective
slot labelers. In Stage 1, relying on SL-adapted contrastive learning with only
a handful of SL-annotated examples, we turn sentence encoders into
task-specific span encoders. In Stage 2, we recast SL from a token
classification into a simpler, less data-intensive span classification task.
Our results on two standard multilingual ToD datasets and across diverse
languages confirm the effectiveness and robustness of TWOSL. It is especially
effective for the most challenging transfer-free few-shot setups, paving the
way for quick and data-efficient bootstrapping of multilingual slot labelers
for ToD systems.
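To make Stage 2 concrete, here is a minimal sketch of span classification
with a Stage 1-style span encoder: enumerate short candidate spans, embed
them, and classify each into a slot type or a null label. The encode
callable, the span enumeration, and the logistic-regression head are
simplifying assumptions, not TWOSL's exact pipeline.

```python
from sklearn.linear_model import LogisticRegression

def candidate_spans(tokens, max_len=4):
    """All contiguous spans of up to max_len tokens."""
    return [" ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

def train_span_classifier(encode, spans, slot_labels):
    """spans: annotated span strings; slot_labels: slot types, with 'O'
    marking spans that fill no slot. encode maps strings to embeddings."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(encode(spans), slot_labels)
    return clf

def label_utterance(encode, clf, utterance):
    """Return the (span, slot type) pairs predicted for one utterance."""
    spans = candidate_spans(utterance.split())
    preds = clf.predict(encode(spans))
    return [(s, p) for s, p in zip(spans, preds) if p != "O"]
```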
Boosting Few-Shot Text Classification via Distribution Estimation
Distribution estimation has been demonstrated as one of the most effective
approaches in dealing with few-shot image classification, as the low-level
patterns and underlying representations can be easily transferred across
different tasks in the computer vision domain. However, directly applying this
approach to few-shot text classification is challenging, since leveraging the
statistics of known classes with sufficient samples to calibrate the
distributions of novel classes may cause negative effects due to the serious
category differences in the text domain. To alleviate this issue, we propose two
simple yet effective strategies to estimate the distributions of the novel
classes by utilizing unlabeled query samples, thus avoiding the potential
negative transfer issue. Specifically, we first assume a class or sample
follows the Gaussian distribution, and use the original support set and the
nearest few query samples to estimate the corresponding mean and covariance.
Then, we augment the labeled samples by sampling from the estimated
distribution, which can provide sufficient supervision for training the
classification model. Extensive experiments on eight few-shot text
classification datasets show that the proposed method outperforms
state-of-the-art baselines significantly.
Comment: Accepted to AAAI 2023.
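Below is a minimal sketch of the estimation-and-augmentation step for one
novel class, assuming pre-computed text embeddings; the choices of k,
n_samples, and the covariance shrinkage eps are illustrative hyperparameters
rather than the paper's exact settings.

```python
import numpy as np

def estimate_and_sample(support, queries, k=5, n_samples=100, eps=1e-2):
    """support: (n_support, dim) embeddings of one novel class;
    queries: (n_query, dim) embeddings of unlabeled query samples."""
    proto = support.mean(axis=0)
    # Pull in the k query samples nearest to the class prototype.
    nearest = queries[np.argsort(np.linalg.norm(queries - proto, axis=1))[:k]]
    pooled = np.vstack([support, nearest])
    mean = pooled.mean(axis=0)
    # Shrink the covariance toward the identity so it stays well-conditioned
    # when only a handful of samples are pooled.
    cov = np.cov(pooled, rowvar=False) + eps * np.eye(pooled.shape[1])
    # Draw synthetic labeled features to train the downstream classifier.
    return np.random.multivariate_normal(mean, cov, size=n_samples)
```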