The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages
Instruction-tuned large language models (LLMs), such as ChatGPT, demonstrate
remarkable performance in a wide range of tasks. Despite numerous recent
studies that examine the performance of instruction-tuned LLMs on various NLP
benchmarks, there remains a lack of comprehensive investigation into their
ability to understand cross-lingual sociopragmatic meaning (SM), i.e., meaning
embedded within social and interactive contexts. This deficiency arises partly
from SM not being adequately represented in any of the existing benchmarks. To
address this gap, we present SPARROW, an extensive multilingual benchmark
specifically designed for SM understanding. SPARROW comprises 169 datasets
covering 13 task types across six primary categories (e.g., anti-social
language detection, emotion recognition). SPARROW datasets encompass 64
different languages originating from 12 language families representing 16
writing scripts. We evaluate the performance of various multilingual pretrained
language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT)
on SPARROW through fine-tuning, zero-shot, and/or few-shot learning. Our
comprehensive analysis reveals that existing open-source instruction-tuned LLMs
still struggle to understand SM across various languages, performing close to a
random baseline in some cases. We also find that although ChatGPT outperforms
many LLMs, it still falls behind task-specific fine-tuned models by 12.19
SPARROW score points. Our benchmark is available at:
https://github.com/UBC-NLP/SPARROW
Comment: Accepted by EMNLP 2023 main conference.
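The abstract does not define how the SPARROW score is computed, but GLUE-style benchmarks typically report an unweighted mean of per-dataset metrics. The minimal sketch below illustrates that kind of aggregation, assuming macro-F1 per classification dataset; the aggregation rule, metric choice, and toy data are illustrative assumptions, not the benchmark's actual specification.

```python
# A sketch of GLUE-style benchmark aggregation. NOTE: the abstract does not
# define the SPARROW score; the unweighted mean over per-dataset macro-F1
# used here is an assumption, and the toy data below is purely illustrative.
from statistics import mean
from sklearn.metrics import f1_score

def dataset_score(y_true, y_pred):
    """Macro-F1 (as a percentage) for a single classification dataset."""
    return f1_score(y_true, y_pred, average="macro") * 100

def aggregate_score(per_dataset_results):
    """Unweighted mean over all datasets (assumed aggregation rule)."""
    return mean(dataset_score(t, p) for t, p in per_dataset_results)

# Toy usage: gold labels vs. model predictions for two tiny datasets.
results = [
    (["joy", "anger", "joy"], ["joy", "joy", "joy"]),  # emotion recognition
    ([1, 0, 1, 1], [1, 0, 0, 1]),                      # anti-social language
]
print(f"aggregate score: {aggregate_score(results):.2f}")
```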
When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust
Large language models (LLMs) have significantly advanced the field of natural
language processing, with GPT models at the forefront. While their remarkable
performance spans a range of tasks, adapting LLMs for real-world business
scenarios still poses challenges warranting further investigation. This paper
presents an empirical analysis aimed at bridging the gap in adapting LLMs to
practical use cases. To do so, we select insurance question answering (QA) as a
case study because of the reasoning it demands. Based on this task, we design a
new model that relies on LLMs empowered with additional knowledge extracted
from insurance policy rulebooks and DBpedia. The additional knowledge helps the
LLMs understand new insurance concepts for domain adaptation. Preliminary
results on two QA datasets show that knowledge
enhancement significantly improves the reasoning ability of GPT-3.5 (55.80% and
57.83% accuracy on the two datasets). The analysis also indicates that existing
public knowledge bases, e.g., DBpedia, are beneficial for knowledge enhancement. Our
findings reveal that the inherent complexity of business scenarios often
necessitates the incorporation of domain-specific knowledge and external
resources for effective problem-solving.
Comment: Ongoing work to adapt LLMs for business scenarios.
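As a concrete illustration of the knowledge-enhancement idea, the sketch below retrieves a concept definition from DBpedia's public SPARQL endpoint and prepends it to a QA prompt. The SPARQL query, prompt template, and helper names are illustrative assumptions; the paper's actual pipeline, including extraction from policy rulebooks, is not reproduced here.

```python
# A sketch of the knowledge-enhancement idea: fetch a concept definition
# from DBpedia's public SPARQL endpoint and prepend it to the QA prompt.
# NOTE: the query, prompt template, and helper names are assumptions; the
# paper's extraction from insurance policy rulebooks is not reproduced.
from SPARQLWrapper import SPARQLWrapper, JSON

def dbpedia_abstract(resource: str, lang: str = "en") -> str:
    """Return the abstract of a DBpedia resource, e.g. 'Insurance'."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?abs WHERE {{
          dbr:{resource} dbo:abstract ?abs .
          FILTER (lang(?abs) = "{lang}")
        }}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return rows[0]["abs"]["value"] if rows else ""

def build_prompt(question: str, concepts: list[str]) -> str:
    """Prepend retrieved background knowledge to an insurance QA question."""
    knowledge = "\n".join(dbpedia_abstract(c) for c in concepts)
    return (
        "Use the background knowledge to answer the insurance question.\n\n"
        f"Background:\n{knowledge}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("Does a standard policy cover flood damage?", ["Insurance"]))
```

The resulting prompt can then be sent to GPT-3.5 (or any instruction-tuned LLM) in place of the bare question, which is the enhancement the paper's preliminary results evaluate.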