13 research outputs found
Improving Conversational Recommender System via Contextual and Time-Aware Modeling with Less Domain-Specific Knowledge
Conversational Recommender Systems (CRS) have become an emerging research
topic seeking to make recommendations through interactive conversations; such
systems generally consist of a generation module and a recommendation module.
Prior work on CRS tends to incorporate external, domain-specific knowledge such
as item reviews to enhance performance. However, collecting and annotating such
external information requires substantial human effort and degrades
generalizability, and too much extra knowledge makes it hard to balance the
different knowledge sources. Therefore, we propose to fully discover and
extract internal knowledge from the context. We capture both entity-level and
contextual-level representations to jointly model user preferences for the
recommendation, where a time-aware attention mechanism is designed to
emphasize recently mentioned items in the entity-level representations. We
further use pre-trained BART to initialize the generation module, alleviating
data scarcity and enhancing context modeling. In addition to conducting
experiments on a popular dataset (ReDial), we include a multi-domain dataset
(OpenDialKG) to demonstrate the effectiveness of our model. Experiments on
both datasets show that our model achieves better performance on most
evaluation metrics with less external knowledge and generalizes well to other
domains. Additional analyses on the recommendation and generation tasks
demonstrate the effectiveness of our model in different scenarios.
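The abstract does not spell out the time-aware attention; below is a minimal sketch of one plausible formulation, in which attention logits over mentioned items are biased by an exponential recency term so that recently appeared items dominate the entity-level representation. The decay rate and all tensor names are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def time_aware_attention(query, item_embs, turn_ids, current_turn, lam=0.1):
    """Attend over items mentioned in the dialogue, up-weighting recent ones.

    query:        (d,) context-derived query vector
    item_embs:    (n, d) embeddings of mentioned items
    turn_ids:     (n,) turn index at which each item appeared
    current_turn: index of the current turn
    lam:          recency decay rate (illustrative value)
    """
    relevance = item_embs @ query                        # (n,) content scores
    recency = -lam * (current_turn - turn_ids).float()   # older mentions decay
    weights = F.softmax(relevance + recency, dim=0)      # recency-biased weights
    return weights @ item_embs                           # (d,) entity-level repr.

# Toy usage: the item mentioned at turn 9 gets the largest recency boost.
q = torch.randn(8)
items = torch.randn(5, 8)
turns = torch.tensor([1, 2, 4, 7, 9])
user_repr = time_aware_attention(q, items, turns, current_turn=10)
print(user_repr.shape)  # torch.Size([8])
```

Placing the recency bias inside the softmax keeps the weighting differentiable, so content relevance and recency can trade off smoothly during training.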
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Training a high-performance end-to-end (E2E) speech processing model requires
an enormous amount of labeled speech data, especially in the era of
data-centric artificial intelligence. However, labeled speech data are usually
scarcer and more expensive to collect than textual data. We propose
Latent Synthesis (LaSyn), an efficient textual data utilization framework for
E2E speech processing models. We train a latent synthesizer to convert textual
data into an intermediate latent representation of a pre-trained speech model.
These pseudo acoustic representations of textual data augment acoustic data for
model training. We evaluate LaSyn on low-resource automatic speech recognition
(ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an
E2E baseline trained on LibriSpeech train-clean-100, with relative word error
rate reductions of over 22.3% on different test sets. For SLU, LaSyn improves
our E2E baseline by an absolute 4.1% in intent classification accuracy and
3.8% in slot-filling SLU-F1 on SLURP, and by an absolute 4.49% and 2.25% in
exact match (EM) and EM-Tree accuracy on STOP, respectively. With fewer
parameters, the results of LaSyn are competitive with published
state-of-the-art work. The results
demonstrate the quality of the augmented training data.
Comment: 15 pages, 8 figures, 8 tables. Accepted to EMNLP 2023 Findings.
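The abstract leaves the latent synthesizer's architecture unspecified; the sketch below is a hypothetical minimal version of the idea, mapping token ids into a latent space of the same dimensionality as a pre-trained speech encoder so that pseudo-acoustic and real acoustic features can be mixed in one training batch. Module choices and sizes are assumptions.

```python
import torch
import torch.nn as nn

class LatentSynthesizer(nn.Module):
    """Hypothetical text-to-latent module: maps token ids into a latent
    space shaped like that of a (frozen) pre-trained speech encoder."""

    def __init__(self, vocab_size=1000, d_latent=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_latent)
        self.encoder = nn.GRU(d_latent, d_latent, batch_first=True)

    def forward(self, token_ids):          # (B, T_text)
        x = self.embed(token_ids)          # (B, T_text, d_latent)
        latents, _ = self.encoder(x)       # pseudo-acoustic representations
        return latents

# Pseudo-acoustic latents from text can then be interleaved with real
# acoustic latents in the same batch, both feeding the shared ASR/SLU head.
synth = LatentSynthesizer()
text_batch = torch.randint(0, 1000, (4, 12))
pseudo_acoustic = synth(text_batch)
print(pseudo_acoustic.shape)  # torch.Size([4, 12, 256])
```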
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
Managing long sequences has become an important and necessary capability for
large language models (LLMs). However, how to comprehensively and
systematically evaluate the long-sequence capability of LLMs remains an open
question, partly because conventional, widely used benchmarks consist mainly
of short sequences. In this paper, we propose M4LE, a Multi-ability,
Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation.
M4LE is based on a diverse NLP task pool comprising 36 NLP datasets, 11 task
types and 12 domains. To alleviate the scarcity of tasks with naturally long
sequences and to incorporate multiple-ability assessment, we propose an
automatic approach (requiring only negligible human annotation) to convert
short-sequence
tasks into a unified long-sequence scenario where LLMs have to identify single
or multiple relevant spans in long contexts based on explicit or semantic
hints. Specifically, the scenario includes five different types of abilities:
(1) explicit single-span; (2) semantic single-span; (3) explicit multiple-span;
(4) semantic multiple-span; and (5) global context understanding. The
resulting samples in M4LE are evenly distributed across input lengths from 1k
to 8k. We conducted
a systematic evaluation on 11 well-established LLMs, especially those optimized
for long-sequence inputs. Our results reveal that: 1) Current LLMs struggle to
understand long context, particularly when tasks require multiple-span
attention. 2) Semantic retrieval tasks are more difficult, even for competent LLMs. 3)
Models fine-tuned on longer text with position interpolation have comparable
performance to those using Neural Tangent Kernel (NTK) aware scaling methods
without fine-tuning. We make our benchmark publicly available to encourage
future research in this challenging area.
Comment: Code and data are available at https://github.com/KwanWaiChung/M4LE.
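The conversion recipe suggests a simple construction; here is a hedged illustration in which several short-task samples are concatenated into one long context and an instruction (the explicit or semantic hint) tells the model which span(s) to use. All template strings are invented for the example.

```python
import random

def build_long_context_sample(short_samples, target_ids, hint):
    """Assemble one long-context sample from short-task samples.

    short_samples: list of (sample_id, text) pairs from a short-sequence task
    target_ids:    ids of the sample(s) the question is actually about
    hint:          explicit or semantic instruction for locating the span(s)
    """
    random.shuffle(short_samples)  # targets end up at arbitrary positions
    context = "\n\n".join(f"[{sid}] {text}" for sid, text in short_samples)
    return f"{hint}\n\n{context}", target_ids

samples = [(i, f"Passage {i}: a short task instance.") for i in range(20)]
prompt, answers = build_long_context_sample(
    samples,
    target_ids=[3, 11],
    hint="Answer using only the passages that discuss topic X.")
print(len(prompt), answers)
```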
Aligning Large Language Models with Human: A Survey
Large Language Models (LLMs) trained on extensive textual corpora have
emerged as leading solutions for a broad array of Natural Language Processing
(NLP) tasks. Despite their notable performance, these models are prone to
certain limitations, such as misunderstanding human instructions, generating
potentially biased content, or producing factually incorrect (hallucinated)
information.
Hence, aligning LLMs with human expectations has become an active area of
interest within the research community. This survey presents a comprehensive
overview of these alignment technologies, including the following aspects. (1)
Data collection: the methods for effectively collecting high-quality
instructions for LLM alignment, including the use of NLP benchmarks, human
annotation, and strong LLMs. (2) Training methodologies: a detailed
review of the prevailing training methods employed for LLM alignment. Our
exploration encompasses Supervised Fine-tuning, both Online and Offline human
preference training, along with parameter-efficient training mechanisms. (3)
Model Evaluation: the methods for evaluating the effectiveness of these
human-aligned LLMs, presenting a multifaceted approach towards their
assessment. In conclusion, we collate and distill our findings, shedding light
on several promising future research avenues in the field. This survey,
therefore, serves as a valuable resource for anyone invested in understanding
and advancing the alignment of LLMs to better suit human-oriented tasks and
expectations. An associated GitHub link collecting the latest papers is
available at https://github.com/GaryYufei/AlignLLMHumanSurvey.
Comment: work in progress.
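Of the training methods the survey covers, supervised fine-tuning is the most common starting point; below is a minimal sketch of its objective, with the loss masked to response tokens so the model is not trained to imitate the prompt. Shapes and names are illustrative, not drawn from any specific method in the survey.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels, response_mask):
    """Instruction-tuning loss: next-token cross-entropy on response tokens.

    logits:        (B, T, V) model outputs
    labels:        (B, T) input token ids (shifted internally)
    response_mask: (B, T) with 1 on response positions, 0 on prompt positions
    """
    per_token = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for t+1
        labels[:, 1:].reshape(-1),                    # shifted targets
        reduction="none",
    )
    mask = response_mask[:, 1:].reshape(-1).float()   # ignore prompt tokens
    return (per_token * mask).sum() / mask.sum()

B, T, V = 2, 16, 100
loss = sft_loss(torch.randn(B, T, V), torch.randint(0, V, (B, T)),
                torch.ones(B, T))
print(loss.item())
```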
DINO-VITS: Data-Efficient Noise-Robust Zero-Shot Voice Cloning via Multi-Tasking with Self-Supervised Speaker Verification Loss
Recent progress in self-supervised representation learning has opened up new
opportunities for training from unlabeled data and has been a growing trend in
voice conversion. However, unsupervised training of voice cloning remains a
challenging task. In this paper, we propose a semi-supervised zero-shot voice
cloning approach that adapts a HuBERT-based voice conversion system to the
voice cloning task, and we show the robustness of such a system to noise both
in the training data (we add noise, at signal-to-noise ratios down to 0 dB, to
35% of the training data with no significant degradation of evaluation
metrics) and in the target speaker reference audio at inference.
Moreover, such a method does not require any type of denoising or
noise-labeling of training data. Finally, we introduce a novel multi-tasking
approach by incorporating self-supervised DINO loss into joint training of a
CAM++ based speaker verification system and a unit-based VITS cloning system.
We show that it significantly improves the quality of generated audio over
baselines, especially for noisy target speaker references.
Comment: Submitted to ICASSP 2024.
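The joint training described above pairs a VITS cloning loss with a DINO-style self-distillation loss on speaker embeddings; the sketch below shows that combination. Temperatures, centering, and the loss weight follow the usual DINO recipe and are used here as assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    """DINO-style self-distillation between two views of the same speaker's
    audio, applied to speaker-verification embeddings (illustrative)."""
    t = F.softmax((teacher_logits - center) / tau_t, dim=-1).detach()
    s = F.log_softmax(student_logits / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

def joint_loss(vits_loss, student_logits, teacher_logits, center, w=1.0):
    """Multi-task objective: cloning loss plus weighted DINO loss.
    The weight w is an assumption, not the paper's value."""
    return vits_loss + w * dino_loss(student_logits, teacher_logits, center)

D = 64
loss = joint_loss(torch.tensor(2.5), torch.randn(8, D), torch.randn(8, D),
                  torch.zeros(D))
print(loss.item())
```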
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
The ability to follow instructions is crucial for Large Language Models
(LLMs) to handle various real-world applications. Existing benchmarks primarily
focus on evaluating pure response quality, rather than assessing whether the
response follows constraints stated in the instruction. To fill this research
gap, in this paper, we propose FollowBench, a Multi-level Fine-grained
Constraints Following Benchmark for LLMs. FollowBench comprehensively includes
five different types (i.e., Content, Situation, Style, Format, and Example) of
fine-grained constraints. To enable precise estimation of constraint following
across diverse difficulty levels, we introduce a Multi-level mechanism that
incrementally adds a single constraint to the initial instruction at each
successive level. To assess whether LLMs' outputs satisfy every
individual constraint, we propose to prompt strong LLMs with
constraint-evolution paths to handle challenging open-ended instructions. By
evaluating ten popular closed-source and open-source LLMs on FollowBench, we
highlight the weaknesses of LLMs in instruction following and point towards
potential avenues for future work. The data and code are publicly available at
https://github.com/YJiangcm/FollowBench.
Comment: 19 pages, 9 figures, 14 tables.
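The Multi-level mechanism implies a straightforward construction; a hypothetical sketch follows, in which level k is the initial instruction plus the first k constraints. The example constraints are invented.

```python
def build_levels(initial_instruction, constraints):
    """Level k = initial instruction plus the first k constraints,
    mirroring the 'one added constraint per level' mechanism."""
    levels = []
    for k in range(1, len(constraints) + 1):
        added = " ".join(constraints[:k])
        levels.append(f"{initial_instruction} {added}")
    return levels

levels = build_levels(
    "Write a short product description for a coffee grinder.",
    ["Use at most 50 words.",       # Format-type constraint (example)
     "Adopt a playful tone.",       # Style-type constraint (example)
     "Mention burr grinding."])     # Content-type constraint (example)
for i, lv in enumerate(levels, 1):
    print(f"Level {i}: {lv}")
```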
SELF: Self-Evolution with Language Feedback
Large Language Models (LLMs) have demonstrated remarkable versatility across
various domains. To further advance LLMs, we propose 'SELF' (Self-Evolution
with Language Feedback), a novel approach that enables LLMs to self-improve
through self-reflection, akin to human learning processes. SELF initiates with
a meta-skill learning process that equips the LLMs with capabilities for
self-feedback and self-refinement. Subsequently, the model undergoes an
iterative process of self-evolution. In each iteration, it utilizes an
unlabeled dataset of instructions to generate initial responses. These
responses are enhanced through self-feedback and self-refinement. The model is
then fine-tuned on this enhanced data, improving progressively across
iterations of self-evolution. Moreover, the SELF
framework enables the model to apply self-refinement during inference, which
further improves response quality. Our experiments in mathematics and general
tasks demonstrate that SELF can enhance the capabilities of LLMs without human
intervention. The SELF framework indicates a promising direction for the
autonomous evolution of LLMs, transitioning them from passive information
receivers to active participants in their development.
Comment: 20 pages, 4 figures, 11 tables.
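The iteration the abstract describes can be summarized as a loop; here is a hedged sketch with stub components, since the abstract names the steps (generation, self-feedback, self-refinement, fine-tuning) but not their implementations. Every prompt string and helper is a placeholder.

```python
class StubLLM:
    """Trivial stand-in so the loop runs; a real system would call an LLM."""
    def generate(self, prompt: str) -> str:
        return f"response({len(prompt)} chars)"

def finetune(model, pairs):
    # Placeholder: a real implementation would run supervised fine-tuning
    # on the refined (instruction, response) pairs.
    print(f"fine-tuning on {len(pairs)} refined pairs")
    return model

def self_evolve(model, instructions, iterations=2):
    """SELF-style loop: generate -> self-feedback -> self-refine -> fine-tune."""
    for _ in range(iterations):
        data = []
        for inst in instructions:
            draft = model.generate(inst)                          # initial answer
            critique = model.generate(f"Critique this answer: {draft}")
            refined = model.generate(
                f"Rewrite the answer using the critique.\n"
                f"Answer: {draft}\nCritique: {critique}")         # refinement
            data.append((inst, refined))
        model = finetune(model, data)                             # iterate
    return model

model = self_evolve(StubLLM(), ["Solve 2+2.", "Name a prime greater than 10."])
```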