Multi-view Semantic Matching of Question Retrieval Using Fine-grained Semantic Representations
As a key task in question answering, question retrieval has attracted much
attention from both academia and industry. Previous solutions mainly rely on
translation models, topic models, and deep learning techniques. Distinct from
these, we propose to construct fine-grained semantic representations of a
question from a learned importance score assigned to each keyword, so that we
can achieve fine-grained question matching with semantic representations of
different lengths. Accordingly, we propose a multi-view semantic matching model
that reuses the important keywords across multiple semantic representations.
As a key step in constructing fine-grained semantic representations, we are the
first to use a cross-task weakly supervised extraction model that applies
question-question labelled signals to supervise the keyword extraction process
(i.e., to learn keyword importance). The extraction model integrates deep
semantic representations and lexical matching information with statistical
features to estimate the importance of keywords. We conduct extensive
experiments on three public datasets, and the results show that our proposed
model significantly outperforms state-of-the-art solutions.
Comment: 10 pages
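The multi-view idea above can be sketched roughly as follows. The importance scores, the view sizes, and the Jaccard overlap standing in for a learned semantic similarity are all illustrative assumptions, not the paper's actual model:

```python
def multi_view_match(q1_scores, q2_scores, views=(2, 4, 6)):
    """Compare two questions at several keyword granularities.

    q*_scores: dict mapping keyword -> learned importance score
    views: keep the top-k keywords for each view (k varies per view)
    """
    sims = []
    for k in views:
        # Each view reuses the most important keywords, at a different length.
        top1 = {w for w, _ in sorted(q1_scores.items(), key=lambda x: -x[1])[:k]}
        top2 = {w for w, _ in sorted(q2_scores.items(), key=lambda x: -x[1])[:k]}
        # Jaccard overlap is a toy stand-in for the learned matching function.
        union = top1 | top2
        sims.append(len(top1 & top2) / len(union) if union else 0.0)
    # Aggregate the per-view scores into one matching score.
    return sum(sims) / len(sims)
```

In the real model the per-keyword scores come from the weakly supervised extraction model rather than being given by hand.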
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction
Large language models with instruction-following capabilities open the door
to a wider group of users. However, when it comes to information extraction - a
classic task in natural language processing - most task-specific systems cannot
align well with long-tail ad hoc extraction use cases for non-expert users. To
address this, we propose a novel paradigm, termed On-Demand Information
Extraction, to fulfill the personalized demands of real-world users. Our task
aims to follow the instructions to extract the desired content from the
associated text and present it in a structured tabular format. The table
headers can either be user-specified or inferred contextually by the model. To
facilitate research in this emerging area, we present a benchmark named
InstructIE, comprising both automatically generated training data and a
human-annotated test set. Building on InstructIE, we further develop an
On-Demand Information Extractor, ODIE. Comprehensive evaluations on our
benchmark reveal that ODIE substantially outperforms existing open-source
models of similar size. Our code and dataset are released at
https://github.com/yzjiao/On-Demand-IE.
Comment: EMNLP 202
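As a rough sketch of the task's input/output shape, the snippet below formats an extraction request (with user-specified or model-inferred headers) and parses a tabular answer. The prompt wording, function names, and markdown-table convention are illustrative assumptions, not ODIE's actual format:

```python
def build_prompt(instruction, text, headers=None):
    """Format an on-demand extraction request.

    headers may be user-given, or omitted for the model to infer.
    """
    header_hint = (
        "Use these column headers: " + " | ".join(headers)
        if headers else
        "Infer appropriate column headers from the instruction and text."
    )
    return f"{instruction}\n{header_hint}\nText: {text}\nAnswer with a markdown table."

def parse_table(raw):
    """Parse a markdown table from model output into (header, rows)."""
    lines = [l.strip() for l in raw.splitlines() if l.strip().startswith("|")]
    cells = [[c.strip() for c in l.strip("|").split("|")] for l in lines]
    # Drop the |---|---| separator row, keep content rows.
    header = cells[0]
    rows = [r for r in cells[1:] if not set("".join(r)) <= set("-: ")]
    return header, rows
```

A real system would send the prompt to an instruction-tuned model and parse its reply with something like `parse_table`.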
The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages
Instruction tuned large language models (LLMs), such as ChatGPT, demonstrate
remarkable performance in a wide range of tasks. Despite numerous recent
studies that examine the performance of instruction-tuned LLMs on various NLP
benchmarks, there remains a lack of comprehensive investigation into their
ability to understand cross-lingual sociopragmatic meaning (SM), i.e., meaning
embedded within social and interactive contexts. This deficiency arises partly
from SM not being adequately represented in any of the existing benchmarks. To
address this gap, we present SPARROW, an extensive multilingual benchmark
specifically designed for SM understanding. SPARROW comprises 169 datasets
covering 13 task types across six primary categories (e.g., anti-social
language detection, emotion recognition). SPARROW datasets encompass 64
different languages originating from 12 language families representing 16
writing scripts. We evaluate the performance of various multilingual pretrained
language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT)
on SPARROW through fine-tuning, zero-shot, and/or few-shot learning. Our
comprehensive analysis reveals that existing open-source instruction-tuned LLMs
still struggle to understand SM across various languages, performing close to a
random baseline in some cases. We also find that although ChatGPT outperforms
many LLMs, it still falls behind task-specific finetuned models by a gap of
12.19 in SPARROW score. Our benchmark is available at:
https://github.com/UBC-NLP/SPARROW
Comment: Accepted by EMNLP 2023 Main conference
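A zero-shot evaluation of the kind described can be looped as below. The prompt template and the stub model are assumptions for illustration; SPARROW's actual protocol and scoring may differ:

```python
def zero_shot_eval(model, examples, labels):
    """Score a classify-by-prompting model on (text, gold_label) pairs.

    `model` is any callable mapping a prompt string to a label string;
    in practice it would wrap an LLM such as BLOOMZ or ChatGPT.
    """
    correct = 0
    for text, gold in examples:
        # Zero-shot: the prompt names the label set but gives no examples.
        prompt = (
            f"Classify the text into one of {sorted(labels)}.\n"
            f"Text: {text}\nLabel:"
        )
        correct += model(prompt).strip() == gold
    return correct / len(examples)
```

Swapping the stub for a real model call, and iterating over the 169 datasets, gives the kind of per-task accuracy a benchmark score aggregates.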
Recent Advances in Social Data and Artificial Intelligence 2019
The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to cyberspace.