Improving Query-Focused Meeting Summarization with Query-Relevant Knowledge
Query-Focused Meeting Summarization (QFMS) aims to generate a summary of a
given meeting transcript conditioned upon a query. The main challenges for QFMS
are the long input text length and sparse query-relevant information in the
meeting transcript. In this paper, we propose a knowledge-enhanced two-stage
framework called Knowledge-Aware Summarizer (KAS) to tackle the challenges. In
the first stage, we introduce knowledge-aware scores to improve the
query-relevant segment extraction. In the second stage, we incorporate
query-relevant knowledge in the summary generation. Experimental results on the
QMSum dataset show that our approach achieves state-of-the-art performance.
Further analysis demonstrates the competence of our method in generating
relevant and faithful summaries.
Comment: AACL 2023 Findings
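The two-stage idea can be sketched as follows. This is a minimal toy, not the paper's system: `overlap` is a hypothetical stand-in for the learned query-relevance and knowledge-aware scorers, and only the extraction stage is shown.

```python
def knowledge_aware_extract(segments, query, relevance, knowledge_score, top_k=3):
    """Stage 1: rank transcript segments by a combined query-relevance and
    knowledge-aware score, keeping only the top-k for summary generation."""
    scored = [(relevance(s, query) + knowledge_score(s, query), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]

# Toy scorer standing in for the paper's learned models (hypothetical).
def overlap(seg, query):
    return len(set(seg.lower().split()) & set(query.lower().split()))

segments = [
    "the budget was approved after a short vote",
    "lunch options were discussed at length",
    "the budget overrun stems from hardware costs",
]
picked = knowledge_aware_extract(segments, "budget decision", overlap, overlap, top_k=2)
```

In the full framework, the selected segments would then be fed, together with query-relevant knowledge, to the second-stage generator.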
Instruct-Align: Teaching Novel Languages to LLMs through Alignment-based Cross-Lingual Instruction
Instruction-tuned large language models (LLMs) have shown remarkable
generalization capability over multiple tasks in multiple languages.
Nevertheless, their generalization varies across languages, especially for
underrepresented or even unseen languages. Prior work on adapting LLMs to new
languages finds that naively adapting instruction-tuned LLMs to new languages
results in catastrophic forgetting, which in turn causes the loss of
multitasking ability in these LLMs. To tackle
this, we propose the Instruct-Align (IA) framework, which enables
instruction-tuned LLMs to learn cross-lingual alignment between unseen and
previously learned languages via alignment-based cross-lingual
instruction-tuning. Our preliminary result on BLOOMZ-560M shows that IA
is able to learn a new language effectively with only a limited amount of
parallel data and at the same time prevent catastrophic forgetting by applying
continual instruction-tuning through experience replay. Our work contributes to
the progression of language adaptation methods for instruction-tuned LLMs and
opens up the possibility of adapting underrepresented low-resource languages
into existing instruction-tuned LLMs. Our code will be publicly released upon
acceptance.
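The experience-replay mechanism the abstract credits for preventing catastrophic forgetting can be sketched as batch construction: each training batch on the new language is mixed with samples replayed from previously learned instruction data. All names and ratios below are illustrative assumptions, not the paper's settings.

```python
import random

def replay_batches(new_data, replay_buffer, batch_size=4, replay_ratio=0.25, seed=0):
    """Continual instruction-tuning with experience replay: every batch mixes
    new-language examples with samples of previously learned instructions."""
    rng = random.Random(seed)
    n_replay = max(1, int(batch_size * replay_ratio))
    n_fresh = batch_size - n_replay
    batches = []
    for i in range(0, len(new_data), n_fresh):
        fresh = new_data[i:i + n_fresh]
        if not fresh:
            break
        replayed = rng.sample(replay_buffer, min(n_replay, len(replay_buffer)))
        batches.append(fresh + replayed)
    return batches

new = [f"new-{i}" for i in range(6)]    # new-language instruction examples
old = [f"old-{i}" for i in range(10)]   # replay buffer of earlier instructions
batches = replay_batches(new, old, batch_size=4, replay_ratio=0.25)
```

The replayed fraction keeps gradients from earlier tasks in the mix, which is the usual rationale for replay-based continual learning.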
Towards Mitigating Hallucination in Large Language Models via Self-Reflection
Large language models (LLMs) have shown promise for generative and
knowledge-intensive tasks, including question-answering (QA) tasks. However,
practical deployment still faces challenges, notably the issue of
"hallucination", where models generate plausible-sounding but unfaithful or
nonsensical information. This issue becomes particularly critical in the
medical domain due to the uncommon professional concepts and potential social
risks involved. This paper analyses the phenomenon of hallucination in medical
generative QA systems using widely adopted LLMs and datasets. Our investigation
centers on the identification and comprehension of common problematic answers,
with a specific emphasis on hallucination. To tackle this challenge, we present
an interactive self-reflection methodology that incorporates knowledge
acquisition and answer generation. Through this feedback process, our approach
steadily enhances the factuality, consistency, and entailment of the generated
answers. Consequently, we harness the interactivity and multitasking ability of
LLMs and produce progressively more precise and accurate answers. Experimental
results on both automatic and human evaluation demonstrate the superiority of
our approach in hallucination reduction compared to baselines.Comment: Accepted by the findings of EMNLP 202
AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation
Advertising posters, a form of information presentation, combine visual and
linguistic modalities. Creating a poster involves multiple steps and
necessitates design experience and creativity. This paper introduces
AutoPoster, a highly automatic and content-aware system for generating
advertising posters. With only product images and titles as inputs, AutoPoster
can automatically produce posters of varying sizes through four key stages:
image cleaning and retargeting, layout generation, tagline generation, and
style attribute prediction. To ensure visual harmony of posters, two
content-aware models are incorporated for layout and tagline generation.
Moreover, we propose a novel multi-task Style Attribute Predictor (SAP) to
jointly predict visual style attributes. In addition, to our knowledge, we
present the first poster generation dataset that includes visual attribute
annotations for over 76k posters. Qualitative and quantitative outcomes from
user studies and experiments substantiate the efficacy of our system and the
aesthetic superiority of the generated posters compared to other poster
generation methods.
Comment: Accepted for ACM MM 2023
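The four-stage flow can be illustrated as a simple staged pipeline in which each stage reads and extends a shared poster state. Every stage function here is a hypothetical stand-in for the abstract's models, kept trivial on purpose.

```python
def autoposter_pipeline(product_image, title, stages):
    """Staged poster-generation sketch mirroring the four stages the abstract
    names; each stage adds its output to the shared state dict."""
    state = {"image": product_image, "title": title}
    for name, stage in stages:
        state[name] = stage(state)
    return state

stages = [
    ("clean_image", lambda s: s["image"].strip()),                     # image cleaning & retargeting
    ("layout", lambda s: {"image_box": (0, 0), "text_box": (0, 1)}),   # layout generation
    ("tagline", lambda s: f"Get your {s['title']} today"),             # tagline generation
    ("style", lambda s: {"font": "sans", "color": "navy"}),            # style attribute prediction
]
poster = autoposter_pipeline(" product.png ", "headphones", stages)
```

Later stages can condition on earlier outputs through the shared state, which is how content-aware layout and tagline models would couple in the real system.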
CrossNER: Evaluating Cross-Domain Named Entity Recognition
Cross-domain named entity recognition (NER) models are able to cope with the
scarcity issue of NER samples in target domains. However, most of the existing
NER benchmarks lack domain-specialized entity types or do not focus on a
certain domain, leading to a less effective cross-domain evaluation. To address
these obstacles, we introduce a cross-domain NER dataset (CrossNER), a
fully-labeled collection of NER data spanning over five diverse domains with
specialized entity categories for different domains. We also provide a
domain-related corpus, since using it to continue pre-training language models
(domain-adaptive pre-training) is effective for domain adaptation. We then
conduct comprehensive experiments to explore the
effectiveness of leveraging different levels of the domain corpus and
pre-training strategies to do domain-adaptive pre-training for the cross-domain
task. Results show that focusing on the fraction of the corpus containing
domain-specialized entities and utilizing a more challenging pre-training
strategy in domain-adaptive pre-training are beneficial for NER domain
adaptation, and our proposed method can consistently outperform existing
cross-domain NER baselines. Nevertheless, experiments also illustrate the
challenge of this cross-domain NER task. We hope that our dataset and baselines
will catalyze research in the NER domain adaptation area. The code and data are
available at https://github.com/zliucr/CrossNER.
Comment: Accepted in AAAI-2021
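A minimal sketch of the masking side of domain-adaptive pre-training, in the spirit of the "more challenging pre-training strategy" above: build a masked-language-modeling example while biasing masking toward domain-specialized entity tokens. The entity vocabulary, probabilities, and bias factor are illustrative assumptions, not the paper's exact recipe.

```python
import random

def mask_for_dapt(tokens, entity_vocab, mask_prob=0.15, seed=0):
    """Build an MLM example, masking domain-specialized entity tokens with a
    higher probability than ordinary tokens."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        p = mask_prob * 3 if tok in entity_vocab else mask_prob
        if rng.random() < p:
            masked.append("[MASK]")
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position not used in the MLM loss
    return masked, labels

tokens = ["the", "gene", "TP53", "regulates", "apoptosis"]
masked, labels = mask_for_dapt(tokens, entity_vocab={"TP53", "apoptosis"})
```

Masking entities more aggressively forces the model to learn domain-specific knowledge rather than generic fillers, which matches the abstract's finding that entity-focused corpora and harder pre-training objectives help NER adaptation.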
MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
In addressing the challenge of interpretability and generalizability of
artificial music intelligence, this paper introduces a novel symbolic
representation that amalgamates both explicit and implicit musical information
across diverse traditions and granularities. Utilizing a hierarchical and-or
graph representation, the model employs nodes and edges to encapsulate a broad
spectrum of musical elements, including structures, textures, rhythms, and
harmonies. This hierarchical approach expands the representability across
various scales of music. This representation serves as the foundation for an
energy-based model, uniquely tailored to learn musical concepts through a
flexible algorithm framework relying on the minimax entropy principle.
Utilizing an adapted Metropolis-Hastings sampling technique, the model enables
fine-grained control over music generation. A comprehensive empirical
evaluation, contrasting this novel approach with existing methodologies,
manifests considerable advancements in interpretability and controllability.
This study marks a substantial contribution to the fields of music analysis,
composition, and computational musicology.
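Sampling from an energy-based model with Metropolis-Hastings can be sketched on a toy one-dimensional state (a real run would propose local edits to a music parse graph rather than nudge a number). Drawing from p(x) ∝ exp(-E(x)/T), the quadratic energy below is a hypothetical stand-in for the learned musical energy.

```python
import math
import random

def metropolis_hastings(energy, propose, init, steps=2000, temp=1.0, seed=0):
    """Metropolis-Hastings sampling from p(x) ∝ exp(-E(x)/T): propose a local
    move, accept with probability min(1, exp(-(E' - E)/T))."""
    rng = random.Random(seed)
    x, e = init, energy(init)
    for _ in range(steps):
        cand = propose(x, rng)
        e_cand = energy(cand)
        # Downhill moves always accepted; uphill moves with Boltzmann probability.
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / temp):
            x, e = cand, e_cand
    return x

# Toy energy with its minimum at x = 3: the chain concentrates near 3.
final = metropolis_hastings(
    energy=lambda x: (x - 3.0) ** 2,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    init=0.0,
)
```

Lowering the temperature sharpens the distribution around low-energy states, which is one handle for the fine-grained control over generation that the abstract describes.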
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
This paper proposes a framework for quantitatively evaluating interactive
LLMs such as ChatGPT using publicly available data sets. We carry out an
extensive technical evaluation of ChatGPT using 23 data sets covering 8
different common NLP application tasks. We evaluate the multitask, multilingual
and multimodal aspects of ChatGPT based on these data sets and a newly
designed multimodal dataset. We find that ChatGPT outperforms LLMs with
zero-shot learning on most tasks and even outperforms fine-tuned models on some
tasks. We find that it is better at understanding non-Latin script languages
than generating them. It is able to generate multimodal content from textual
prompts, via an intermediate code generation step. Moreover, we find that
ChatGPT is 63.41% accurate on average in 10 different reasoning categories
under logical reasoning, non-textual reasoning, and commonsense reasoning,
hence making it an unreliable reasoner. It is, for example, better at deductive
than inductive reasoning. ChatGPT suffers from hallucination problems like
other LLMs and it generates more extrinsic hallucinations from its parametric
memory as it does not have access to an external knowledge base. Finally, the
interactive feature of ChatGPT enables human collaboration with the underlying
LLM to improve its performance, i.e., 8% ROUGE-1 on summarization and 2% ChrF++
on machine translation, in a multi-turn "prompt engineering" fashion. We also
release the codebase for evaluation set extraction.
Comment: 45 pages, AACL 2023
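A headline figure like "63.41% accurate on average in 10 reasoning categories" is a macro-average over per-category accuracies, which can be sketched as follows. The category names and counts below are toy values, not the paper's data.

```python
def macro_average_accuracy(per_category):
    """Macro-average accuracy: average the per-category accuracies so each
    reasoning category counts equally, regardless of its size."""
    accs = [correct / total for correct, total in per_category.values()]
    return sum(accs) / len(accs)

toy = {"deductive": (8, 10), "inductive": (5, 10), "commonsense": (7, 10)}
avg = macro_average_accuracy(toy)
```

Macro-averaging keeps a large, easy category from dominating the headline number, which matters when categories (e.g., deductive vs. inductive) differ sharply in difficulty, as the abstract reports.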