105 research outputs found
MVP: Multi-task Supervised Pre-training for Natural Language Generation
Pre-trained language models (PLMs) have achieved remarkable success in
natural language generation (NLG) tasks. Up to now, most NLG-oriented PLMs are
pre-trained in an unsupervised manner on large-scale general corpora.
Meanwhile, an increasing number of models pre-trained with labeled data
(i.e. "supervised pre-training") showcase superior performance compared to
unsupervised pre-trained models. Motivated by the success of supervised
pre-training, we propose Multi-task superVised Pre-training (MVP) for natural
language generation. We collect a large-scale natural language generation
corpus, MVPCorpus, from datasets spanning diverse NLG tasks. Then we
unify these examples into a general text-to-text format to pre-train the text
generation model MVP in a supervised manner. For each task, we further
pre-train specific soft prompts to stimulate the model's capacity to perform a
specific task. Our MVP model can be viewed as an application of recent
instruction tuning to relatively small PLMs. Extensive experiments have
demonstrated the effectiveness and generality of our MVP model on a number of
NLG tasks, where it achieves state-of-the-art performance on most datasets,
outperforming both BART and Flan-T5.
Comment: Accepted by ACL 2023
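The unification of diverse NLG examples into one text-to-text format can be sketched as follows; the task names, prompt templates, and field names are illustrative assumptions, not MVPCorpus's actual schema:

```python
# Sketch: unify heterogeneous NLG examples into one text-to-text format.
# Task names, prompts, and field names are illustrative, not MVPCorpus's
# actual schema.

def to_text2text(task, example):
    """Map a task-specific example to a (source, target) text pair."""
    if task == "summarization":
        return ("Summarize: " + example["document"], example["summary"])
    if task == "data-to-text":
        # Flatten structured triples into a linear string.
        triples = " ; ".join("|".join(t) for t in example["triples"])
        return ("Describe: " + triples, example["text"])
    if task == "question-generation":
        source = ("Generate a question: " + example["answer"]
                  + " [SEP] " + example["passage"])
        return (source, example["question"])
    raise ValueError(f"unknown task: {task}")

# Every task now feeds the same encoder-decoder training loop.
src, tgt = to_text2text("data-to-text",
                        {"triples": [("Paris", "capital_of", "France")],
                         "text": "Paris is the capital of France."})
```

Once every example is a (source, target) string pair, a single seq2seq model can be trained over all tasks at once.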
Constrained Reinforcement Learning for Dynamic Material Handling
As one of the core parts of flexible manufacturing systems, material handling
involves storage and transportation of materials between workstations with
automated vehicles. Improvements in material handling can boost the
overall efficiency of the manufacturing system. However, the occurrence of
dynamic events during the optimisation of task arrangements poses a challenge
that requires adaptability and effectiveness. In this paper, we address the
scheduling of automated guided vehicles for dynamic material handling.
Motivated by some real-world scenarios, unknown new tasks and unexpected
vehicle breakdowns are regarded as dynamic events in our problem. We formulate
the problem as a constrained Markov decision process which takes into account
tardiness and available vehicles as cumulative and instantaneous constraints,
respectively. An adaptive constrained reinforcement learning algorithm that
combines Lagrangian relaxation and invalid action masking, named RCPOM, is
proposed to address the problem with two hybrid constraints. Moreover, a
gym-like dynamic material handling simulator, named DMH-GYM, is developed and
equipped with diverse problem instances, which can be used as benchmarks for
dynamic material handling. Experimental results on the problem instances
demonstrate the outstanding performance of our proposed approach compared with
eight state-of-the-art constrained and non-constrained reinforcement learning
algorithms, and widely used dispatching rules for material handling.
Comment: Accepted by the 2023 International Joint Conference on Neural
Networks (IJCNN)
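The two constraint-handling ingredients combined above can be sketched in isolation; the function names and hyperparameters below are illustrative, not the exact RCPOM algorithm:

```python
import math

# Sketch of the two constraint-handling ingredients combined above.
# Function names and hyperparameters are illustrative, not the exact
# RCPOM algorithm.

def masked_greedy_action(q_values, valid_mask):
    """Invalid action masking (instantaneous constraint): never pick an
    action whose mask entry is False, e.g. a broken-down vehicle."""
    best, best_action = -math.inf, None
    for action, (q, ok) in enumerate(zip(q_values, valid_mask)):
        if ok and q > best:
            best, best_action = q, action
    return best_action

def lagrangian_reward(reward, cost, lam):
    """Lagrangian relaxation (cumulative constraint): fold a cost signal
    such as tardiness into the scalar training reward."""
    return reward - lam * cost

def update_multiplier(lam, avg_cost, budget, lr=0.01):
    """Dual ascent: raise the multiplier while the constraint is
    violated, and keep it non-negative."""
    return max(0.0, lam + lr * (avg_cost - budget))
```

Masking handles the hard per-step constraint exactly, while the multiplier trades off reward against the cumulative tardiness constraint during training.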
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Large language models (LLMs) have achieved impressive proficiency on NLP
tasks of ordinary length. Recently, multiple studies have worked on
extending the context length and enhancing the long text modeling capabilities
of LLMs. To comprehensively evaluate the long context ability of LLMs, we
propose BAMBOO, a multi-task long context benchmark. BAMBOO has been designed
with four principles: comprehensive capacity evaluation, avoidance of data
contamination, accurate automatic evaluation, and different length levels. It
consists of 10 datasets from 5 different long text understanding tasks, i.e.
question answering, hallucination detection, text sorting, language modeling,
and code completion, to cover core capacities and various domains of LLMs. We
conduct experiments with five long context models on BAMBOO and further discuss
four key research questions of long text. We also qualitatively analyze current
long context models and point out future directions for enhancing long text
modeling capacities. We release our data, prompts, and code at
https://github.com/RUCAIBox/BAMBOO
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Large language models (LLMs), such as ChatGPT, are prone to generating
hallucinations, i.e., content that conflicts with the source or cannot be
verified against factual knowledge. To understand what types of content and to
which extent LLMs are apt to hallucinate, we introduce the Hallucination
Evaluation benchmark for Large Language Models (HaluEval), a large collection
of generated and human-annotated hallucinated samples for evaluating the
performance of LLMs in recognizing hallucination. To generate these samples, we
propose a ChatGPT-based two-step framework, i.e., sampling-then-filtering.
In addition, we hire human labelers to annotate the hallucinations in
ChatGPT responses. The empirical results suggest that ChatGPT is likely to
generate hallucinated content on specific topics by fabricating unverifiable
information in a notable fraction of responses. Moreover, existing LLMs face
great challenges in recognizing the hallucinations in texts. However, our
experiments also show that providing external knowledge or adding reasoning
steps can help LLMs recognize hallucinations. Our benchmark can be accessed at
https://github.com/RUCAIBox/HaluEval.
Comment: Accepted to EMNLP 2023 Main Conference (Long Paper)
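The two-step sampling-then-filtering pipeline can be sketched as follows; the generator and the plausibility scorer are hypothetical stand-ins for the ChatGPT-based components described above:

```python
import random

# Sketch of the two-step sampling-then-filtering pipeline. The generator
# and the plausibility scorer are hypothetical stand-ins for the
# ChatGPT-based components described above.

def generate_candidates(question, n=4, seed=0):
    """Step 1 (sampling): draw several candidate hallucinated answers."""
    rng = random.Random(seed)
    return [f"answer-{rng.randint(0, 99)} to: {question}" for _ in range(n)]

def filter_hardest(candidates, plausibility):
    """Step 2 (filtering): keep the candidate that is hardest to spot,
    i.e. the most plausible-looking hallucination."""
    return max(candidates, key=plausibility)

candidates = generate_candidates("Who wrote Hamlet?")
hardest = filter_hardest(candidates, plausibility=len)  # toy scorer
```

The filtering step is what makes the benchmark challenging: only the most plausible hallucination per query survives into the dataset.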
Learning to Imagine: Visually-Augmented Natural Language Generation
People often imagine relevant scenes to aid in the writing process. In this
work, we aim to utilize visual information for composition in the same manner
as humans. We propose a method, LIVE, that makes pre-trained language models
(PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
First, we imagine the scene based on the text: we use a diffusion model to
synthesize high-quality images conditioned on the input texts. Second, we use
CLIP to determine, in a posterior manner, whether the text evokes the
imagination. Finally, our imagination is dynamic: we conduct synthesis for each
sentence rather than generate only one image for an entire paragraph.
Technically, we propose a novel plug-and-play fusion layer to obtain
visually-augmented representations for each text. Our vision-text fusion layer
is compatible with Transformer-based architectures. We have conducted extensive
experiments on four generation tasks using BART and T5, and the automatic
results and human evaluation demonstrate the effectiveness of our proposed
method. We will release the code, model, and data at:
https://github.com/RUCAIBox/LIVE.
Comment: Accepted by ACL 2023
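A gated plug-and-play fusion of text and image features can be sketched as follows; the gating formulation is an illustrative stand-in for the paper's vision-text fusion layer, not its actual parameterization:

```python
# Sketch of a gated plug-and-play fusion of text and image features.
# The gating formulation is an illustrative stand-in for the paper's
# vision-text fusion layer.

def gated_fuse(text_vec, image_vec, gate):
    """Blend each text dimension with the matching image dimension.
    gate in [0, 1]: 0 leaves the text representation untouched (the
    plug-and-play property), 1 fully trusts the imagined image."""
    assert len(text_vec) == len(image_vec)
    return [(1 - gate) * t + gate * v for t, v in zip(text_vec, image_vec)]

fused = gated_fuse([1.0, 0.0], [0.0, 1.0], gate=0.25)
```

Because gate = 0 recovers the original text representation, such a layer can be attached to a pre-trained model without disturbing its behavior at initialization.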
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care
The COVID-19 pandemic has placed a heavy burden on healthcare systems
worldwide and caused huge social disruption and economic loss. Many deep
learning models have been proposed to conduct clinical predictive tasks such as
mortality prediction for COVID-19 patients in intensive care units using
Electronic Health Record (EHR) data. Despite their initial success in certain
clinical applications, there is currently a lack of benchmarking results to
achieve a fair comparison so that we can select the optimal model for clinical
use. Furthermore, there is a discrepancy between the formulation of traditional
prediction tasks and real-world clinical practice in intensive care. To fill
these gaps, we propose two clinical prediction tasks, Outcome-specific
length-of-stay prediction and Early mortality prediction for COVID-19 patients
in intensive care units. The two tasks are adapted from the naive
length-of-stay and mortality prediction tasks to accommodate the clinical
practice for COVID-19 patients. We propose fair, detailed, open-source
data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models
on two tasks, including 5 machine learning models, 6 basic deep learning models
and 6 deep learning predictive models specifically designed for EHR data. We
provide benchmarking results using data from two real-world COVID-19 EHR
datasets. One dataset is publicly available without any inquiry, and the
other can be accessed on request. We provide fair, reproducible
benchmarking results for two tasks. We deploy all experiment results and models
on an online platform. We also allow clinicians and researchers to upload their
data to the platform and get quick prediction results using our trained models.
We hope our efforts can further facilitate deep learning and machine learning
research for COVID-19 predictive modeling.
Comment: Junyi Gao, Yinghao Zhu and Wenqing Wang contributed equally
CAMP: Co-Attention Memory Networks for Diagnosis Prediction in Healthcare
Diagnosis prediction, which aims to predict future health information of
patients from historical electronic health records (EHRs), is a core research
task in personalized healthcare. Although some RNN-based methods have been
proposed to model sequential EHR data, these methods have two major issues.
First, they cannot capture fine-grained progression patterns of patient health
conditions. Second, they do not consider the mutual effect between important
context (e.g., patient demographics) and historical diagnoses. To tackle these
challenges, we propose a model called Co-Attention Memory networks for
diagnosis Prediction (CAMP), which tightly integrates historical records,
fine-grained patient conditions, and demographics with a three-way interaction
architecture built on co-attention. Our model augments RNNs with a memory
network to enrich the representation capacity. The memory network enables
analysis of fine-grained patient conditions by explicitly incorporating a
taxonomy of diseases into an array of memory slots. We instantiate the
READ/WRITE operations of the memory network so that the memory cooperates
effectively with the patient demographics through a co-attention mechanism.
Experiments on real-world datasets demonstrate that CAMP consistently performs
better than state-of-the-art methods.
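The memory READ step can be sketched as soft attention over disease-taxonomy slots; the query construction, shapes, and scoring function below are illustrative, not CAMP's exact operations:

```python
import math

# Sketch of a memory READ step: soft attention over disease-taxonomy
# slots, with a query that mixes visit state and demographics. Shapes
# and the dot-product scoring are illustrative, not CAMP's exact ops.

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_read(query, memory_slots):
    """Attend over memory slots and return their weighted combination."""
    scores = [sum(q * s for q, s in zip(query, slot)) for slot in memory_slots]
    weights = softmax(scores)
    dim = len(memory_slots[0])
    return [sum(w * slot[d] for w, slot in zip(weights, memory_slots))
            for d in range(dim)]
```

In a co-attention setting, the query would itself be conditioned on the demographics representation, so that READ weights reflect both the visit history and the patient context.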
Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network
Personalized review generation (PRG) aims to automatically produce review
text reflecting user preference, which is a challenging natural language
generation task. Most previous studies do not explicitly model factual
descriptions of products and tend to generate uninformative content. Moreover,
they mainly focus on word-level generation and cannot accurately reflect more
abstract user preferences across multiple aspects. To address the above issues,
we propose a novel knowledge-enhanced PRG model based on capsule graph neural
network (Caps-GNN). We first construct a heterogeneous knowledge graph (HKG)
for utilizing rich item attributes. We adopt Caps-GNN to learn graph capsules
for encoding underlying characteristics from the HKG. Our generation process
contains two major steps, namely aspect sequence generation and sentence
generation. First, based on graph capsules, we adaptively learn aspect capsules
for inferring the aspect sequence. Then, conditioned on the inferred aspect
label, we design a graph-based copy mechanism to generate sentences by
incorporating related entities or words from the HKG. To our knowledge, we are
the first to utilize a knowledge graph for the PRG task. The incorporated KG
information is able to enhance user preference at both aspect and word levels.
Extensive experiments on three real-world datasets have demonstrated the
effectiveness of our model on the PRG task.
Comment: Accepted by CIKM 2020 (Long Paper)
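The graph-based copy mechanism can be sketched as a mixture of a generation distribution over the vocabulary and a copy distribution over KG entities; the mixing gate and the toy distributions below are illustrative:

```python
# Sketch of a copy mechanism: the final word distribution mixes a
# generation distribution over the vocabulary with a copy distribution
# over KG entities. The mixing gate p_copy is illustrative.

def copy_mix(vocab_dist, entity_dist, p_copy):
    """final(w) = (1 - p_copy) * generate(w) + p_copy * copy(w)."""
    out = {w: (1 - p_copy) * p for w, p in vocab_dist.items()}
    for entity, p in entity_dist.items():
        out[entity] = out.get(entity, 0.0) + p_copy * p
    return out

# With p_copy = 0.25, a quarter of the probability mass is routed to
# entities from the knowledge graph.
mixed = copy_mix({"good": 1.0}, {"battery_life": 1.0}, p_copy=0.25)
```

Since both inputs are probability distributions and the gate is a convex weight, the mixed output remains a valid distribution.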