The TechQA Dataset
We introduce TechQA, a domain-adaptation question answering dataset for the
technical support domain. The TechQA corpus highlights two real-world issues
from the automated customer support domain. First, it contains actual questions
posed by users on a technical forum, rather than questions generated
specifically for a competition or a task. Second, it has a real-world size --
600 training, 310 dev, and 490 evaluation question/answer pairs -- thus
reflecting the cost of creating large labeled datasets with actual data.
Consequently, TechQA is meant to stimulate research in domain adaptation rather
than being a resource to build QA systems from scratch. The dataset was
obtained by crawling the IBM Developer and IBM DeveloperWorks forums for
questions with accepted answers that appear in a published IBM Technote---a
technical document that addresses a specific technical issue. We also release a
collection of the 801,998 Technotes publicly available as of April 4, 2019, as a
companion resource that might be used for pretraining to learn representations
of the IT domain language.
Comment: Long version of conference paper to be submitted
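To illustrate the shape of the data, here is a minimal sketch of a TechQA-style question/answer record; the field names and values are illustrative assumptions rather than the official schema.

```python
# Hypothetical TechQA-style record; field names are assumptions, not the
# official schema released with the dataset.
import json

example = {
    "QUESTION_TITLE": "MQ channel fails to start after upgrade",  # forum question title
    "QUESTION_TEXT": "After upgrading, the channel stops with an error...",
    "DOCUMENT_ID": "swg21234567",   # Technote containing the accepted answer
    "ANSWER": "Apply the latest fix pack and restart the channel.",
    "ANSWERABLE": "Y",              # some questions have no answer in the corpus
}

print(json.dumps(example, indent=2))
```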
Domain-Specific Fast Pre-training Technique using Document-Level Metadata and Taxonomy
As the demand for sophisticated Natural Language Processing (NLP) models
continues to grow, so does the need for efficient pre-training techniques.
Current NLP models undergo resource-intensive pre-training. In response, we
introduce a Fast Pre-training Technique using Document-Level Metadata
and Taxonomy, a novel approach designed to significantly reduce computational
demands. The technique leverages document metadata and a domain-specific taxonomy as
supervision signals. It involves continual pre-training of an open-domain
transformer encoder using sentence-level embeddings, followed by fine-tuning
using token-level embeddings. We evaluate the technique on six tasks across nine
datasets spanning three distinct domains. Remarkably, it achieves compute
reductions of approximately 1,000x, 4,500x, and 500x compared to competitive
approaches in the Customer Support, Scientific, and Legal domains,
respectively. Importantly, these efficiency gains do not compromise performance
relative to competitive baselines. Furthermore, the reduced pre-training data
mitigates catastrophic forgetting, ensuring consistent performance in
open-domain scenarios. The technique offers a promising solution for
resource-efficient pre-training, with potential applications spanning various
domains.
Comment: 38 pages, 7 figures
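To make the two-phase recipe concrete, the sketch below shows one plausible form of the first phase: continual pre-training of an open-domain encoder where mean-pooled sentence embeddings are supervised by document taxonomy labels. The BERT checkpoint, pooling choice, linear head, and label count are assumptions, not details from the paper.

```python
# Continual pre-training sketch: sentence-level embeddings supervised by
# taxonomy labels. All architectural choices here are illustrative.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
num_taxonomy_labels = 50  # size of the domain taxonomy (assumed)
head = nn.Linear(encoder.config.hidden_size, num_taxonomy_labels)
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=2e-5
)
loss_fn = nn.CrossEntropyLoss()

def train_step(sentences, taxonomy_labels):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    # Mean-pool token states into one sentence-level embedding per input.
    hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    sent_emb = (hidden * mask).sum(1) / mask.sum(1)
    # The taxonomy label of the source document supervises the sentence embedding.
    loss = loss_fn(head(sent_emb), torch.tensor(taxonomy_labels))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(train_step(["How do I reset my VPN token?"], [3]))
```

The second phase would then fine-tune the adapted encoder with ordinary token-level objectives on the downstream task.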
Improving Low-Resource Question Answering using Active Learning in Multiple Stages
Neural approaches have become very popular in the domain of Question
Answering; however, they require a large amount of annotated data. Furthermore,
they often yield very good performance, but only in the domain they were trained
on. In this work we propose a novel approach that combines data augmentation
via question-answer generation with Active Learning to improve performance in
low-resource settings, where the target domains are diverse in terms of
difficulty and similarity to the source domain. We also investigate Active
Learning for question answering at different stages, reducing the overall
human annotation effort. For this purpose, we consider target domains in
realistic settings, with an extremely small number of annotated samples but with
many unlabeled documents, which we assume can be obtained with little effort.
Additionally, we assume a sufficient amount of labeled data from the source
domain is available. We perform extensive experiments to find the best setup
for incorporating domain experts. Our findings show that our novel approach,
where humans are incorporated as early as possible in the process, boosts
performance in the low-resource, domain-specific setting, allowing for
low-labeling-effort question answering systems in new, specialized domains.
They further demonstrate how human annotation affects the performance of QA
depending on the stage at which it is performed.
Comment: 16 pages, 8 figures
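As one concrete reading of the setup, the sketch below implements a pool-based active learning round with uncertainty sampling, where the least-confident documents go to human annotators first. The confidence scorer and the annotation budget are stand-ins, not the paper's actual components.

```python
# Pool-based active learning round with uncertainty sampling (illustrative).
import random

def model_confidence(document):
    # Placeholder for a QA model's confidence in its own answer,
    # e.g. the probability of the predicted answer span.
    return random.random()

def active_learning_round(unlabeled_docs, budget, annotate):
    # Rank unlabeled documents from least to most confident and send
    # the most uncertain ones to the human annotator first.
    ranked = sorted(unlabeled_docs, key=model_confidence)
    newly_labeled = [annotate(doc) for doc in ranked[:budget]]
    remaining = ranked[budget:]
    return newly_labeled, remaining

docs = [f"doc-{i}" for i in range(100)]
labeled, pool = active_learning_round(docs, budget=10,
                                      annotate=lambda d: (d, "gold answer"))
print(len(labeled), len(pool))
```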
Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges
Large Language Models (LLMs) have demonstrated impressive zero-shot
performance on a wide range of NLP tasks, showing the ability to reason
and apply commonsense. A relevant application is to use them for creating
high-quality synthetic datasets for downstream tasks. In this work, we probe whether
GPT-4 can be used to augment existing extractive reading comprehension
datasets. Automating data annotation processes has the potential to save the
large amounts of time, money, and effort that go into manually labelling
datasets. In this paper, we evaluate the performance of GPT-4 as a replacement
for human annotators on low-resource reading comprehension tasks, by comparing
performance after fine-tuning, along with the cost associated with annotation.
This work serves as the first analysis of LLMs as synthetic data augmenters for
QA systems, highlighting the unique opportunities and challenges. Additionally,
we release augmented versions of low-resource datasets, which will allow the
research community to create further benchmarks for evaluation of generated
datasets.
Comment: 5 pages, 1 figure, 3 tables
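A minimal sketch of the augmentation idea, using the OpenAI chat completions API to draft extractive QA pairs for a passage; the prompt wording and output handling are assumptions, and generated answers would still need to be verified as exact spans of the passage before use.

```python
# Drafting extractive QA pairs with GPT-4 (illustrative prompt).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_qa_pairs(passage: str) -> str:
    prompt = (
        "Write two question/answer pairs about the passage below. "
        "Each answer must be an exact substring of the passage.\n\n"
        f"Passage: {passage}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(generate_qa_pairs("The Hubble Space Telescope was launched in 1990."))
```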
MPMQA: Multimodal Question Answering on Product Manuals
Visual contents, such as illustrations and images, play an important role in product
manual understanding. Existing Product Manual Question Answering (PMQA)
datasets tend to ignore visual contents and only retain textual parts. In this
work, to emphasize the importance of multimodal contents, we propose a
Multimodal Product Manual Question Answering (MPMQA) task. For each question,
MPMQA requires the model not only to process multimodal contents but also to
provide multimodal answers. To support MPMQA, a large-scale dataset PM209 is
constructed with human annotations, which contains 209 product manuals from 27
well-known consumer electronic brands. Human annotations include 6 types of
semantic regions for manual contents and 22,021 question-answer pairs.
Notably, each answer consists of a textual sentence and related visual
regions from manuals. Taking into account the length of product manuals and the
fact that a question is always related to a small number of pages, MPMQA can be
naturally split into two subtasks: retrieving the most related pages and then
generating multimodal answers. We further propose a unified model that can
perform both subtasks together and achieves performance comparable to
multiple task-specific models. The PM209 dataset is available at
https://github.com/AIM3-RUC/MPMQA
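The retrieve-then-generate decomposition can be sketched as below, with a simple lexical-overlap retriever and a placeholder answer stage standing in for the paper's unified multimodal model.

```python
# Two-stage MPMQA skeleton: retrieve related pages, then produce a
# multimodal answer. Both stages are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    text: str
    regions: list  # annotated visual regions on the page

def score_page(question: str, page: Page) -> float:
    # Stand-in lexical overlap score; a real system would use a
    # learned multimodal retriever.
    q_tokens = set(question.lower().split())
    return len(q_tokens & set(page.text.lower().split()))

def answer(question: str, manual: list[Page], top_k: int = 2):
    # Stage 1: retrieve the few pages most related to the question.
    pages = sorted(manual, key=lambda p: score_page(question, p), reverse=True)[:top_k]
    # Stage 2: generate a textual answer plus the supporting visual regions.
    best = pages[0]
    return best.text, best.regions  # placeholder "multimodal answer"

manual = [Page(1, "press the reset button for five seconds", [("image", "fig. 3")]),
          Page(2, "battery installation and charging", [])]
print(answer("how do I reset the device?", manual))
```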
Exploring the Utility of Dutch Question Answering Datasets for Human Resource Contact Centres
We explore the use case of question answering (QA) by a contact centre for 130,000 Dutch government employees in the domain of questions about human resources (HR). HR questions can be answered using personnel files or general documentation, with the latter being the focus of the current research. We created a Dutch HR QA dataset with over 300 questions in the format of the SQuAD 2.0 dataset, which distinguishes between answerable and unanswerable questions. We applied various BERT-based models, either directly or after fine-tuning on the new dataset. The F1-scores reached 0.47 for unanswerable questions and 1.0 for answerable questions depending on the topic; however, large variations in scores were observed. We conclude that more data are needed to further improve performance on this task.
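A minimal sketch of the kind of evaluation described, running a SQuAD 2.0-style extractive model with unanswerable-question handling on a Dutch HR example; the multilingual checkpoint named here is an assumption, not necessarily one of the models used in the study.

```python
# SQuAD 2.0-style extractive QA with "no answer" handling (illustrative).
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="deepset/xlm-roberta-base-squad2",  # assumed multilingual SQuAD 2.0 model
)

result = qa(
    question="Hoeveel vakantiedagen krijg ik per jaar?",
    context="Medewerkers hebben recht op 24 vakantiedagen per kalenderjaar.",
    handle_impossible_answer=True,  # allow the model to predict "no answer"
)
# An empty answer string signals the model judged the question unanswerable.
print(result["answer"] or "<unanswerable>", result["score"])
```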
Leveraging Feedback in Conversational Question Answering Systems
172 pages.
The goal of this thesis is to exploit the interaction that deployed systems have with humans, using human feedback as a learning and adaptation signal for those systems. We focus on the domain shift that conversational systems undergo once deployed. To this end, we study the case of explicit binary feedback, as it is the easiest feedback signal for humans to provide. To improve systems after deployment, we first built DoQA, a dataset of question-answering conversations. The dataset contains 2,437 dialogues collected via crowdsourcing. Compared to previous work, DoQA reflects real information needs, and its conversations are more natural and coherent. After creating the dataset, we designed an algorithm called feedback-weighted learning (FWL), which is able to improve a pretrained supervised system using only binary feedback. Finally, we analyse the limits of this algorithm in cases where the collected feedback is noisy and adapt FWL to cope with the noisy scenario. The negative results we obtain in this case show the challenge of modelling noisy feedback from users, which remains an open research question.
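The intuition behind feedback-weighted learning can be sketched as follows: continue training the deployed model on its own predictions, weighted by the binary feedback each prediction received. The exact FWL weighting in the thesis differs from this simplification.

```python
# Simplified feedback-weighted update: reinforce predictions that users accepted.
import torch
from torch import nn

model = nn.Linear(8, 2)  # stand-in for a pretrained QA model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def fwl_step(inputs, predicted_labels, feedback):
    # feedback[i] is 1 if the user accepted prediction i, 0 otherwise.
    logits = model(inputs)
    per_example = nn.functional.cross_entropy(logits, predicted_labels,
                                              reduction="none")
    # Reinforce predictions with positive feedback; ignore the rest.
    loss = (per_example * feedback.float()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.randn(4, 8)
preds = torch.tensor([0, 1, 1, 0])
fb = torch.tensor([1, 0, 1, 1])
print(fwl_step(x, preds, fb))
```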
Mitigating Data Scarcity for Neural Language Models
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm, achieving new benchmarks and state-of-the-art performances. These models often rely heavily on annotated data, which may not always be available. Data scarcity is commonly found in specialized domains, such as the medical domain, or in low-resource languages that are underexplored by AI research. In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques for neural language models. In both research directions, we implement neural network algorithms and evaluate their impact on assisting neural language models in downstream NLP tasks. Specifically, for data augmentation, we explore two techniques: 1) creating positive training data by moving an answer span around its original context and 2) using text simplification techniques to introduce a variety of writing styles to the original training data. Our results indicate that these simple and effective solutions improve the performance of neural language models considerably in low-resource NLP domains and tasks. For neural ensemble learning, we use a multi-label neural classifier to select the best prediction outcome from a variety of individual pretrained neural language models trained for a low-resource medical text simplification task.
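A minimal sketch of the first augmentation technique, creating an extra positive example by relocating the answer span within its context; whitespace tokenization and uniform random placement are simplifying assumptions.

```python
# Answer-span relocation augmentation: move the answer to a new position
# in its context to create an additional positive training example.
import random

def relocate_answer(context: str, answer: str):
    # Remove the answer span, then re-insert it at a random word boundary.
    before, _, after = context.partition(answer)
    words = (before + after).split()
    pos = random.randrange(len(words) + 1)
    new_context = " ".join(words[:pos] + [answer] + words[pos:])
    return new_context, new_context.index(answer)  # new context and answer offset

ctx = "The Eiffel Tower was completed in 1889 in Paris ."
print(relocate_answer(ctx, "1889"))
```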