SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection
Multi-modal intent detection aims to utilize various modalities to understand
the user's intentions, which is essential for the deployment of dialogue
systems in real-world scenarios. The two core challenges for multi-modal intent
detection are (1) how to effectively align and fuse different features of
modalities and (2) the limited labeled multi-modal intent training data. In
this work, we introduce a shallow-to-deep interaction framework with data
augmentation (SDIF-DA) to address the above challenges. Firstly, SDIF-DA
leverages a shallow-to-deep interaction module to progressively and effectively
align and fuse features across text, video, and audio modalities. Secondly, we
propose a ChatGPT-based data augmentation approach to automatically augment
sufficient training data. Experimental results demonstrate that SDIF-DA
effectively aligns and fuses multi-modal features, achieving state-of-the-art
performance. In addition, extensive analyses show that the introduced data
augmentation approach can successfully distill knowledge from the large
language model.
Comment: Accepted by ICASSP 2024
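For intuition only, a minimal PyTorch sketch of the shallow-to-deep idea is shown below; the encoders, feature dimensions, attention-based deep stage, and intent-class count are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ShallowToDeepFusion(nn.Module):
        # Hypothetical sketch: a shallow per-modality projection followed by a
        # deep cross-modal stage that attends over the stacked modality tokens.
        def __init__(self, text_dim=768, video_dim=512, audio_dim=128, hidden=256):
            super().__init__()
            # Shallow stage: project each modality into a shared space.
            self.proj_text = nn.Linear(text_dim, hidden)
            self.proj_video = nn.Linear(video_dim, hidden)
            self.proj_audio = nn.Linear(audio_dim, hidden)
            # Deep stage: self-attention across the three modality tokens.
            self.deep = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
            self.classifier = nn.Linear(hidden, 20)  # number of intent classes is assumed

        def forward(self, text_feat, video_feat, audio_feat):
            shallow = torch.stack([self.proj_text(text_feat),
                                   self.proj_video(video_feat),
                                   self.proj_audio(audio_feat)], dim=1)  # (B, 3, hidden)
            deep = self.deep(shallow)                 # cross-modal interaction
            return self.classifier(deep.mean(dim=1))  # pooled intent logits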
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Developing Large Language Models (LLMs) with robust long-context capabilities
has recently been a major research focus, resulting in the emergence of long-context
LLMs proficient in Chinese. However, the evaluation of these models remains
underdeveloped due to a lack of benchmarks. To address this gap, we present
CLongEval, a comprehensive Chinese benchmark for evaluating long-context LLMs.
CLongEval is characterized by three key features: (1) Sufficient data volume,
comprising 7 distinct tasks and 7,267 examples; (2) Broad applicability,
accommodating models with context window sizes from 1K to 100K; (3) High
quality, with over 2,000 manually annotated question-answer pairs in addition
to the automatically constructed labels. With CLongEval, we undertake a
comprehensive assessment of 6 open-source long-context LLMs and 2 leading
commercial counterparts that feature both long-context abilities and
proficiency in Chinese. We also provide in-depth analysis based on the
empirical results, trying to shed light on the critical capabilities that
present challenges in long-context settings. The dataset, evaluation scripts,
and model outputs will be released.
Comment: 19 pages, 4 figures
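As a hedged illustration of how such a benchmark might be consumed once released (the file name, JSONL schema, and tokenizer below are assumptions, not part of the benchmark):

    import json

    def load_examples(path, max_context_tokens, count_tokens):
        # Keep only examples whose prompt fits the evaluated model's context window.
        kept = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                ex = json.loads(line)  # assumed fields: "prompt", "answer"
                if count_tokens(ex["prompt"]) <= max_context_tokens:
                    kept.append(ex)
        return kept

    # Example usage with a hypothetical file and a whitespace token count
    # as a crude stand-in for a real tokenizer.
    examples = load_examples("clongeval.jsonl", 100_000, lambda s: len(s.split()))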
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Chain-of-thought (CoT) is capable of eliciting models to explicitly generate
reasoning paths, thus promoting reasoning accuracy and attracting increasing
attention. Specifically, zero-shot CoT achieves remarkable improvements in a
wide range of reasoning tasks by simply instructing the LLM with the prompt
"Let's think step by step!". Despite the success of zero-shot CoT, the existing
zero-shot prompting techniques remain limited to a single language, making it
challenging to generalize to other languages and hindering global development.
In this work, we introduce cross-lingual prompting (CLP), aiming to improve
zero-shot CoT reasoning across languages. Specifically, CLP consists of two
main components: (1) cross-lingual alignment prompting and (2) task-specific
solver prompting. The cross-lingual alignment prompting is responsible for
aligning representations across different languages, whereas the task-specific
solver prompting is used to generate the final chain of thoughts and results
for the reasoning task. We further introduce cross-lingual
self-consistent prompting (CLSP) to ensemble different reasoning paths across
languages. Our experimental evaluations on several benchmarks demonstrate that
CLP and CLSP significantly outperform the existing prompting methods and
achieve state-of-the-art performance. We hope this work will inspire further
breakthroughs in cross-lingual CoT.
Comment: Accepted at EMNLP 2023 Main Conference
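A hedged Python sketch of the two-stage idea follows; the prompt wording, the llm() callable, and the answer handling are placeholders rather than the paper's exact prompts.

    from collections import Counter

    def clp_answer(question, source_lang, llm):
        # Stage 1: cross-lingual alignment prompting (restate the request in English).
        aligned = llm(f"Act as a multilingual expert. Restate the following "
                      f"{source_lang} request in English:\n{question}")
        # Stage 2: task-specific solver prompting (zero-shot chain of thought).
        return llm(f"{aligned}\nLet's think step by step, then give the final answer.")

    def clsp_answer(question, source_lang, llm, paths=5):
        # Cross-lingual self-consistency: sample several reasoning paths and
        # majority-vote on the returned answers (answer extraction is elided here).
        answers = [clp_answer(question, source_lang, llm) for _ in range(paths)]
        return Counter(answers).most_common(1)[0][0]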
Improving Few-shot and Zero-shot Entity Linking with Coarse-to-Fine Lexicon-based Retriever
Few-shot and zero-shot entity linking focus on the tail and emerging
entities, which are more challenging but closer to real-world scenarios. The
mainstream method is the "retrieve and rerank" two-stage framework. In this
paper, we propose a coarse-to-fine lexicon-based retriever to retrieve entity
candidates in an effective manner, which operates in two layers. The first
layer retrieves coarse-grained candidates by leveraging entity names, while the
second layer narrows down the search to fine-grained candidates within the
coarse-grained ones. In addition, this second layer utilizes entity
descriptions to effectively disambiguate tail or new entities that share names
with existing popular entities. Experimental results indicate that our approach
can obtain superior performance without requiring extensive finetuning in the
retrieval stage. Notably, our approach ranks 1st in NLPCC 2023 Shared Task
6 on Chinese Few-shot and Zero-shot Entity Linking.
Comment: Accepted to NLPCC 2023
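To make the two-layer idea concrete, here is a simplified sketch; the token-overlap scoring and the entity schema are assumptions standing in for the paper's actual lexicon-based scoring.

    def coarse_to_fine_retrieve(mention, context, entities, top_k=10):
        # `entities` is assumed to be a list of {"name": ..., "description": ...}.
        # Layer 1 (coarse): keep entities whose name shares tokens with the mention.
        mention_tokens = set(mention.lower().split())
        coarse = [e for e in entities
                  if mention_tokens & set(e["name"].lower().split())]
        # Layer 2 (fine): re-rank coarse candidates by description overlap with the
        # mention's context, which helps disambiguate same-name (tail or new) entities.
        ctx_tokens = set(context.lower().split())
        fine = sorted(coarse,
                      key=lambda e: len(ctx_tokens & set(e["description"].lower().split())),
                      reverse=True)
        return fine[:top_k]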
MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
Multi-modal sarcasm detection has attracted much recent attention.
Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder
the development of reliable multi-modal sarcasm detection systems: (1) There are
some spurious cues in MMSD, which lead models to learn biased patterns; (2) The
negative samples in MMSD are not always reasonable. To solve the aforementioned
issues, we introduce MMSD2.0, a correction dataset that fixes the shortcomings
of MMSD, by removing the spurious cues and re-annotating the unreasonable
samples. Meanwhile, we present a novel framework called multi-view CLIP that is
capable of leveraging multi-grained cues from multiple perspectives (i.e.,
text, image, and text-image interaction view) for multi-modal sarcasm
detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for
building reliable multi-modal sarcasm detection systems and multi-view CLIP can
significantly outperform the previous best baselines.
Comment: Accepted by ACL 2023 Findings
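As a rough, non-authoritative sketch of a multi-view fusion head over pre-computed CLIP features (the bilinear interaction view and layer sizes are assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    class MultiViewSarcasmHead(nn.Module):
        # Hypothetical fusion head over CLIP text/image features from three views:
        # text, image, and a text-image interaction view.
        def __init__(self, clip_dim=512, hidden=256):
            super().__init__()
            self.text_view = nn.Linear(clip_dim, hidden)
            self.image_view = nn.Linear(clip_dim, hidden)
            # A simple bilinear layer stands in for the text-image interaction view.
            self.interaction_view = nn.Bilinear(clip_dim, clip_dim, hidden)
            self.classifier = nn.Linear(3 * hidden, 2)  # sarcastic vs. non-sarcastic

        def forward(self, text_feat, image_feat):
            views = torch.cat([self.text_view(text_feat),
                               self.image_view(image_feat),
                               self.interaction_view(text_feat, image_feat)], dim=-1)
            return self.classifier(views)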