69 research outputs found
Pre-training Multi-party Dialogue Models with Latent Discourse Inference
Multi-party dialogues are more difficult for models to understand than
one-to-one two-party dialogues, since they involve multiple interlocutors,
resulting in interweaving reply-to relations and information flows. An
effective way to overcome these obstacles is to pre-train a model that
understands the discourse structure of multi-party dialogues, namely, to whom
each utterance is replying. However, because multi-party dialogue corpora lack
explicitly annotated discourse labels, previous works fail to scale up the
pre-training process and leave the unlabeled multi-party conversational data
unused. To fully utilize the unlabeled data, we propose to treat the discourse
structures as latent variables and to jointly infer them while pre-training
the discourse-aware model with unsupervised latent-variable inference methods.
Experiments on multiple downstream tasks show that our pre-trained model
outperforms strong baselines by large margins and achieves state-of-the-art
(SOTA) results, demonstrating the effectiveness of our method. The official
implementation of this paper is available at
https://github.com/EricLee8/MPD_EMVI. Comment: Accepted by ACL 202
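
As a concrete illustration of the latent-variable idea described above, the
following minimal Python sketch runs an EM-style loop in which reply-to parents
are latent: the E-step infers a posterior over parents from the current
parameters, and the M-step maximizes the expected complete-data log-likelihood
under that posterior. The toy encodings, the bilinear link scorer, and the
squared-error reply likelihood are assumptions for illustration, not the
authors' MPD_EMVI implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_utts, dim = 6, 32
utt_emb = torch.randn(n_utts, dim)             # stand-in utterance encodings from a dialogue encoder
link_scorer = torch.nn.Bilinear(dim, dim, 1)   # scores the prior "utterance i replies to utterance j"
reply_head = torch.nn.Linear(dim, dim)         # stand-in head for p(utterance i | parent j)
opt = torch.optim.Adam(
    list(link_scorer.parameters()) + list(reply_head.parameters()), lr=1e-2
)

def joint_log_scores():
    """log p(parent = j) + log p(utterance i | parent j) for every pair with j < i."""
    prior = torch.full((n_utts, n_utts), float("-inf"))
    loglik = torch.zeros(n_utts, n_utts)
    for i in range(1, n_utts):
        for j in range(i):
            prior[i, j] = link_scorer(utt_emb[i], utt_emb[j]).squeeze()
            # negative squared error as a stand-in for a real response-modeling likelihood
            loglik[i, j] = -((utt_emb[i] - reply_head(utt_emb[j])) ** 2).mean()
    return F.log_softmax(prior[1:], dim=-1) + loglik[1:]

for _ in range(100):
    joint = joint_log_scores()
    posterior = F.softmax(joint, dim=-1).detach()   # E-step: posterior over latent reply-to links
    # M-step: maximize the expected complete-data log-likelihood under the posterior
    loss = -(posterior * joint.masked_fill(posterior == 0, 0.0)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("inferred reply-to parents of utterances 1..5:", posterior.argmax(dim=-1).tolist())
```

In a real pre-training setup the reply-likelihood term would come from the
dialogue model's own response-modeling objective rather than a toy squared
error, but the alternation between inferring the latent structure and updating
the model is the same.
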
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving
Large Language Models (LLMs) have driven substantial progress in artificial
intelligence in recent years, exhibiting impressive capabilities across a wide
range of tasks, including mathematical problem-solving. Inspired by the success
of subgoal-based methods, we propose a novel framework called
\textbf{SE}quential sub\textbf{G}oal \textbf{O}ptimization (SEGO) to enhance
LLMs' ability to solve mathematical problems. By establishing a connection
between the subgoal breakdown process and the probability of solving problems,
SEGO aims to identify better subgoals with theoretical guarantees. Addressing
the challenge of identifying suitable subgoals in a large solution space, our
framework generates problem-specific subgoals and adjusts them according to
carefully designed criteria. Incorporating these optimized subgoals into the
policy model training leads to significant improvements in problem-solving
performance. We validate SEGO's efficacy through experiments on two benchmarks,
GSM8K and MATH, where our approach outperforms existing methods, highlighting
the potential of SEGO in AI-driven mathematical problem-solving.
Data and code associated with this paper will be available at
https://github.com/zhaoxlpku/SEGO. Comment: Preprint
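
The following hypothetical sketch shows the general shape of the subgoal search
described above: propose candidate subgoals, estimate how much each raises the
probability of solving the problem, and keep the best one as a training signal
for the policy model. The functions propose_subgoal and estimate_solve_prob,
the scoring loop, and the toy problem are placeholders, not SEGO's actual
components or its theoretical guarantees.

```python
import random

random.seed(0)

def propose_subgoal(problem: str, k: int) -> list[str]:
    """Placeholder subgoal generator (would be an LLM in practice)."""
    return [f"intermediate step #{i} for: {problem}" for i in range(k)]

def estimate_solve_prob(problem: str, subgoal: str, n_samples: int = 16) -> float:
    """Placeholder Monte-Carlo estimate of P(solve | problem, subgoal)."""
    return sum(random.random() < 0.4 for _ in range(n_samples)) / n_samples

def sequential_subgoal_search(problem: str, rounds: int = 3, k: int = 4):
    best_subgoal, best_score = None, 0.0
    for _ in range(rounds):
        for cand in propose_subgoal(problem, k):
            score = estimate_solve_prob(problem, cand)
            if score > best_score:  # keep subgoals that raise the estimated solve probability
                best_subgoal, best_score = cand, score
        # in a full system the proposer would be conditioned on the current best subgoal
    return best_subgoal, best_score

subgoal, score = sequential_subgoal_search("Find x such that 3x + 5 = 20")
print(f"selected subgoal: {subgoal!r} (estimated solve prob {score:.2f})")
```
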
MALA: Cross-Domain Dialogue Generation with Action Learning
Response generation for task-oriented dialogues involves two basic
components: dialogue planning and surface realization. These two components,
however, pursue different objectives, i.e., task completion and language
quality. To deal with this discrepancy, conditioned response generation has
been introduced, in which the generation process is factorized into action
decision and language generation via explicit action representations. To
obtain action representations, recent studies learn latent actions in an
unsupervised manner based on utterance lexical similarity. Such an action
learning approach is sensitive to the diversity of language surface forms,
which may impair both task completion and language quality. To address this
issue, we propose
multi-stage adaptive latent action learning (MALA) that learns semantic latent
actions by distinguishing the effects of utterances on dialogue progress. We
model the utterance effect using the transition of dialogue states caused by
the utterance and develop a semantic similarity measurement that estimates
whether utterances have similar effects. To learn semantic actions on domains
without dialogue states, MALA extends the semantic similarity measurement
across domains progressively, i.e., from aligning shared actions to learning
domain-specific actions. Experiments on the multi-domain datasets SMD and
MultiWOZ show that our proposed model achieves consistent improvements over
the baseline models in terms of both task completion and language quality.
Comment: 9 pages, 3 figures
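
To make the state-transition view of utterance effects concrete, here is a
small, hypothetical sketch: each utterance is represented by the set of
slot-value changes it causes, similarity is measured as Jaccard overlap of
those change sets, and utterances with similar effects are greedily grouped
into latent actions. The similarity measure, the threshold, and the example
turns are illustrative assumptions, not the paper's exact formulation.

```python
def state_delta(state_before: dict, state_after: dict) -> frozenset:
    """Slot-value pairs added or changed by the utterance."""
    return frozenset(
        (slot, val) for slot, val in state_after.items() if state_before.get(slot) != val
    )

def effect_similarity(delta_a: frozenset, delta_b: frozenset) -> float:
    """Jaccard overlap between two sets of slot-value changes."""
    if not delta_a and not delta_b:
        return 1.0
    return len(delta_a & delta_b) / len(delta_a | delta_b)

# (utterance, state before, state after) -- invented annotated turns
turns = [
    ("I'd like an Italian place downtown.", {}, {"food": "italian", "area": "centre"}),
    ("Somewhere in the centre serving Italian food, please.", {}, {"food": "italian", "area": "centre"}),
    ("Book a table for 4 at 7pm.", {"food": "italian"}, {"food": "italian", "people": "4", "time": "19:00"}),
]

# greedy grouping: utterances with mutually similar effects share one latent action
actions: list[list[int]] = []
deltas = [state_delta(before, after) for _, before, after in turns]
for idx, delta in enumerate(deltas):
    for group in actions:
        if effect_similarity(delta, deltas[group[0]]) >= 0.5:
            group.append(idx)
            break
    else:
        actions.append([idx])

for action_id, members in enumerate(actions):
    print(f"latent action {action_id}: {[turns[i][0] for i in members]}")
```
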
Research progress in the effect of nutritional intervention on cognitive impairment related to Alzheimer's disease
Alzheimer's disease (AD) is an age-related neurodegenerative disease with insidious onset and slow progression. The progression of AD from brain pathological changes alone to clinically identifiable cognitive changes is affected by a variety of environmental factors inside and outside the organism and can last for decades. Cognitive impairment is an important clinical feature of AD that impairs the quality of life of the elderly in their later years, and the available drugs for the treatment of AD have failed to cure the disease, indicating the importance of early prevention of AD-related cognitive impairment. Most current research on the relationship between nutrition and AD takes nutritional intervention as a preventive measure against AD-related cognitive impairment. Dietary supplementation or restriction affects AD-related cognitive impairment through multiple pathways. It is worth noting that the gut microbiome, as an important mediator of the effect of diet on the host, can influence cognitive function through the "microbiota-gut-brain axis". The antioxidant and anti-inflammatory properties of some foods are also beneficial for improving cognitive function. In this paper, relevant studies from recent years are analyzed to discuss the effects of certain single nutrients (vitamins, polyphenols, and long-chain polyunsaturated fatty acids) and overall dietary patterns (the Mediterranean diet, the Dietary Approaches to Stop Hypertension (DASH) diet, the Mediterranean-DASH Intervention for Neurodegenerative Delay (MIND) diet, and the ketogenic diet) on cognitive function, so as to provide ideas and references for the prevention and treatment of AD-related cognitive impairment.
TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design
High-quality instruction-tuning data is critical to improving LLM
capabilities. Existing data collection methods are limited by unrealistically
high manual labeling costs or by the hallucinations that arise from relying
solely on LLM generation. To address these problems, this paper presents a
scalable method to automatically collect high-quality instruction-tuning data
by training language models to automatically design tasks based on
human-written texts. Intuitively, grounding in human-written text helps the
model attenuate hallucinations during task generation. Unlike instruction
back-translation-based methods that directly take the given text as a
response, we require the model to generate the \textit{instruction},
\textit{input}, and \textit{output} simultaneously to filter out noise. The
results of automated and manual evaluation experiments demonstrate the quality
of our dataset. Comment: Work in progress
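
A minimal, hypothetical sketch of the text-grounded generation-and-filtering
idea follows: the model is asked to emit instruction, input, and output
together for a given passage, and examples whose output is not supported by the
passage are dropped. The prompt wording, the call_llm stand-in, and the crude
lexical grounding check are assumptions for illustration, not the TeGit
pipeline.

```python
import json

PROMPT = """Design one task grounded in the passage below.
Return JSON with keys "instruction", "input", and "output".
Passage:
{passage}
"""

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned, well-formed response."""
    return json.dumps({
        "instruction": "Summarize the main claim of the passage in one sentence.",
        "input": "",
        "output": "The passage argues that grounding task design in human-written text reduces hallucination.",
    })

def grounded(example: dict, passage: str) -> bool:
    """Crude grounding check: most content words of the output should appear in the passage."""
    words = [w.lower().strip(".,") for w in example["output"].split() if len(w) > 4]
    hits = sum(w in passage.lower() for w in words)
    return bool(words) and hits / len(words) >= 0.5

passage = ("Grounding task design in human-written text reduces hallucination, "
           "because the model must stay consistent with the source passage.")
example = json.loads(call_llm(PROMPT.format(passage=passage)))
if grounded(example, passage):
    print("kept:", example)
else:
    print("filtered out noisy example")
```
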
Knowledge Fusion of Large Language Models
While training large language models (LLMs) from scratch can generate models
with distinct functionalities and strengths, it comes at significant costs and
may result in redundant capabilities. Alternatively, a cost-effective and
compelling approach is to merge existing pre-trained LLMs into a more potent
model. However, due to the varying architectures of these LLMs, directly
blending their weights is impractical. In this paper, we introduce the notion
of knowledge fusion for LLMs, aimed at combining the capabilities of existing
LLMs and transferring them into a single LLM. By leveraging the generative
distributions of source LLMs, we externalize their collective knowledge and
unique strengths, thereby potentially elevating the capabilities of the target
model beyond those of any individual source LLM. We validate our approach using
three popular LLMs with different architectures--Llama-2, MPT, and
OpenLLaMA--across various benchmarks and tasks. Our findings confirm that the
fusion of LLMs can improve the performance of the target model across a range
of capabilities such as reasoning, commonsense, and code generation. Our code,
model weights, and data are publicly available at
\url{https://github.com/fanqiwan/FuseLLM}. Comment: Accepted to ICLR 202
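
As a toy illustration of fusing generative distributions, the sketch below
keeps, for each token position, the source distribution that assigns the gold
token the highest probability and uses it as a soft target alongside the usual
language-modeling loss. Vocabulary alignment across different tokenizers is
assumed away, the random logits stand in for real model outputs, and the fusion
rule is one simple choice; this illustrates the general idea rather than
reproducing the released FuseLLM code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, seq_len, n_sources = 100, 8, 3
gold = torch.randint(0, vocab, (seq_len,))                       # gold next tokens
source_logits = torch.randn(n_sources, seq_len, vocab)           # stand-in source-model outputs
target_logits = torch.randn(seq_len, vocab, requires_grad=True)  # stand-in target-model outputs

# Fuse: per token position, keep the source distribution with the lowest
# cross-entropy on the gold token.
source_logp = F.log_softmax(source_logits, dim=-1)
ce = -source_logp[:, torch.arange(seq_len), gold]                # (n_sources, seq_len)
best = ce.argmin(dim=0)                                          # best source per position
fused = source_logp[best, torch.arange(seq_len)].exp()           # fused soft targets

# Training objective: language-modeling loss plus KL toward the fused distribution.
lm_loss = F.cross_entropy(target_logits, gold)
fusion_loss = F.kl_div(F.log_softmax(target_logits, dim=-1), fused, reduction="batchmean")
loss = lm_loss + fusion_loss
loss.backward()
print(f"lm_loss={lm_loss.item():.3f}  fusion_loss={fusion_loss.item():.3f}")
```
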