Probabilistic Linguistic Knowledge and Token-level Text Augmentation
This paper investigates the effectiveness of token-level text augmentation
and the role of probabilistic linguistic knowledge within a
linguistically-motivated evaluation context. Two text augmentation programs,
REDA and REDA_NG, were developed, both implementing five token-level text
editing operations: Synonym Replacement (SR), Random Swap (RS), Random
Insertion (RI), Random Deletion (RD), and Random Mix (RM). REDA_NG
leverages pretrained n-gram language models to select the most likely
augmented texts from REDA's output. Comprehensive and fine-grained experiments
were conducted on a binary question matching classification task in both
Chinese and English. The results strongly refute the general effectiveness of
the five token-level text augmentation techniques under investigation, whether
applied together or separately, and irrespective of various common
classification model types used, including transformers. Furthermore, the role
of probabilistic linguistic knowledge is found to be minimal.
Comment: 20 pages; 3 figures; 8 tables
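The token-level editing operations named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' REDA code: the function names, the deletion probability `p`, and the `synonyms` dictionary are all assumptions made for illustration, and only three of the five operations (RS, RD, SR) are shown.

```python
import random

def random_swap(tokens, rng):
    """RS: swap the tokens at two randomly chosen positions."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, rng, p=0.2):
    """RD: drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]

def synonym_replacement(tokens, rng, synonyms):
    """SR: replace one token that has an entry in the (hypothetical) synonym dict."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if synonyms.get(t)]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(synonyms[out[i]])
    return out

if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same edits
    question = ["how", "do", "i", "reset", "my", "password"]
    print(random_swap(question, rng))
    print(random_deletion(question, rng))
    print(synonym_replacement(question, rng, {"reset": ["restore"]}))
```

Passing a seeded `random.Random` instance keeps each augmentation reproducible. A selection step like the one the abstract attributes to REDA_NG would score several such outputs with a language model and keep the most likely ones; that step is not sketched here.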
Consistency Analysis of ChatGPT
ChatGPT, a question-and-answer dialogue system based on a large language
model, has gained huge popularity since its introduction. Its positive aspects
have been reported through many media platforms, and some analyses even showed
that ChatGPT achieved a decent grade in professional exams, including the law,
medical, and finance domains, adding extra support to the claim that AI can now
assist and even replace humans in industrial fields. Others, however, doubt
its reliability and trustworthiness. In this paper, we investigate ChatGPT's
trustworthiness regarding logically consistent behaviours. Our findings suggest
that, although ChatGPT seems to achieve an improved language understanding
ability, it still frequently fails to generate logically correct predictions.
Hence, while it is true that ChatGPT is an impressive and promising new
technique, we conclude that its usage in real-world applications without
thorough human inspection requires further consideration, especially for
risk-sensitive areas.
Comment: 11 pages
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Visual question answering requires a system to provide an accurate natural
language answer given an image and a natural language question. However, it is
widely recognized that previous generic VQA methods often exhibit a tendency to
memorize biases present in the training data rather than learning proper
behaviors, such as grounding images before predicting answers. Therefore, these
methods usually achieve high in-distribution but poor out-of-distribution
performance. In recent years, various datasets and debiasing methods have been
proposed to evaluate and enhance VQA robustness, respectively. This paper
provides the first comprehensive survey focused on this emerging topic.
Specifically, we first provide an overview of the development process of
datasets from in-distribution and out-of-distribution perspectives. Then, we
examine the evaluation metrics employed by these datasets. Thirdly, we propose
a typology that presents the development process, similarities and differences,
robustness comparison, and technical features of existing debiasing methods.
Furthermore, we analyze and discuss the robustness of representative
vision-and-language pre-training models on VQA. Finally, through a thorough
review of the available literature and experimental analysis, we discuss the
key areas for future research from various viewpoints.
Comment: IEEE TPAMI (Under Review)