6 research outputs found
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
This paper tackles the problem of reading comprehension over long narratives
where documents easily span over thousands of tokens. We propose a curriculum
learning (CL) based Pointer-Generator framework for reading/sampling over large
documents, enabling diverse training of the neural model based on the notion of
alternating contextual difficulty. This can be interpreted as a form of domain
randomization and/or generative pretraining during training. To this end, the
usage of the Pointer-Generator softens the requirement of having the answer
within the context, enabling us to construct diverse training samples for
learning. Additionally, we propose a new Introspective Alignment Layer (IAL),
which reasons over decomposed alignments using block-based self-attention. We
evaluate our proposed method on the NarrativeQA reading comprehension
benchmark, achieving state-of-the-art performance with relative improvements
over existing baselines on both BLEU-4 and Rouge-L. Extensive ablations
confirm the effectiveness of our proposed IAL and CL components.
Comment: Accepted to ACL 201
Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation
Current state-of-the-art neural dialogue systems are mainly data-driven and
are trained on human-generated responses. However, due to the subjectivity and
open-ended nature of human conversations, the complexity of training dialogues
varies greatly. The noise and uneven complexity of query-response pairs impede
the learning efficiency and effects of the neural dialogue generation models.
Moreover, there is as yet no unified measure of dialogue complexity, which
spans multiple attributes: specificity, repetitiveness, relevance, etc.
Inspired by human
behaviors of learning to converse, where children learn from easy dialogues to
complex ones and dynamically adjust their learning progress, in this paper, we
first analyze five dialogue attributes to measure dialogue complexity from
multiple perspectives on three publicly available corpora. Then, we propose an
adaptive multi-curricula learning framework to schedule a committee of the
organized curricula. The framework is established upon the reinforcement
learning paradigm, which automatically chooses different curricula at the
evolving learning process according to the learning status of the neural
dialogue generation model. Extensive experiments conducted on five
state-of-the-art models demonstrate its learning efficiency and effectiveness
with respect to 13 automatic evaluation metrics and human judgments.
Comment: Accepted to AAAI 202
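The scheduling idea in the abstract above, choosing among a committee of curricula according to the model's learning status, can be illustrated with a small epsilon-greedy bandit. This is a hedged sketch of the general mechanism, not the paper's algorithm; the class name `BanditCurriculumScheduler` and the reward signal (e.g. drop in validation loss) are illustrative assumptions.

```python
import random


class BanditCurriculumScheduler:
    """Epsilon-greedy bandit over curricula: mostly pick the curriculum whose
    observed reward (e.g. recent validation-loss improvement) is highest,
    while occasionally exploring the others."""

    def __init__(self, curricula, epsilon=0.1, seed=0):
        self.curricula = list(curricula)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in self.curricula}   # times each arm was pulled
        self.values = {c: 0.0 for c in self.curricula}  # running mean reward

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.curricula)
        return max(self.curricula, key=lambda c: self.values[c])

    def update(self, curriculum, reward):
        # Incremental update of the running mean reward for this curriculum.
        self.counts[curriculum] += 1
        n = self.counts[curriculum]
        self.values[curriculum] += (reward - self.values[curriculum]) / n
```

In a training loop, one would call `choose()` each round to pick which curriculum's batch to train on, then feed the resulting validation improvement back via `update()`.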
Simple Model Also Works: A Novel Emotion Recognition Network in Textual Conversation Based on Curriculum Learning Strategy
Emotion Recognition in Conversation (ERC) has emerged as a research hotspot
in domains such as conversational robots and question-answer systems. How to
efficiently and adequately retrieve contextual emotional cues has been one of
the key challenges in the ERC task. Existing efforts do not fully model the
context and employ complex network structures, resulting in excessive
computational resource overhead without substantial performance improvement. In
this paper, we propose a novel Emotion Recognition Network based on Curriculum
Learning strategy (ERNetCL). The proposed ERNetCL primarily consists of
Temporal Encoder (TE), Spatial Encoder (SE), and Curriculum Learning (CL) loss.
We utilize TE and SE to combine the strengths of previous methods in a
simple manner, efficiently capturing temporal and spatial contextual
information in the conversation. To simulate the way humans learn curriculum
from easy to hard, we apply the idea of CL to the ERC task to progressively
optimize the network parameters of ERNetCL. At the beginning of training, we
assign lower learning weights to difficult samples; as training progresses,
these weights are gradually raised. Extensive experiments on four datasets
show that our proposed method is effective and substantially outperforms
other baseline models.
Comment: 12 pages, 9 figure
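The weighting scheme described above, where hard samples start with low loss weights that ramp up over epochs, can be sketched as a linear schedule. This is a minimal illustration under assumed conventions (difficulty in [0, 1], a hypothetical `ramp_epochs` horizon), not the paper's exact CL loss.

```python
def curriculum_weight(difficulty, epoch, ramp_epochs):
    """Per-sample loss weight in [0, 1]: easy samples (difficulty near 0)
    start near full weight; hard samples (difficulty near 1) start near
    zero and ramp linearly to full weight by `ramp_epochs`."""
    progress = min(1.0, epoch / ramp_epochs)
    return (1.0 - difficulty) + difficulty * progress


def weighted_loss(losses, difficulties, epoch, ramp_epochs=10):
    """Mean of per-sample losses scaled by their curriculum weights."""
    weights = [curriculum_weight(d, epoch, ramp_epochs) for d in difficulties]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

At epoch 0 a maximally hard sample contributes nothing to the loss; by the end of the ramp every sample contributes at full weight.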
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
Visual Grounding (VG) is a crucial topic in the field of vision and language,
which involves locating a specific region described by expressions within an
image. To reduce the reliance on manually labeled data, unsupervised methods
have been developed to locate regions using pseudo-labels. However, the
performance of existing unsupervised methods is highly dependent on the quality
of pseudo-labels and these methods always encounter issues with limited
diversity. In order to utilize vision and language pre-trained models to
address the grounding problem, and reasonably take advantage of pseudo-labels,
we propose CLIP-VG, a novel method that can conduct self-paced curriculum
adapting of CLIP with pseudo-language labels. We propose a simple yet efficient
end-to-end network architecture to realize the transfer of CLIP to the visual
grounding. Based on the CLIP-based architecture, we further propose
single-source and multi-source curriculum adapting algorithms, which can
progressively find more reliable pseudo-labels to learn an optimal model,
thereby achieving a balance between reliability and diversity for the
pseudo-language labels. Our method outperforms the current state-of-the-art
unsupervised method by a significant margin on RefCOCO/+/g datasets in both
single-source and multi-source scenarios, with improvements ranging from 6.78%
to 10.67% and 11.39% to 14.87%, respectively. Furthermore, our approach even
outperforms existing weakly supervised methods. The code and models are
available at https://github.com/linhuixiao/CLIP-VG.
Comment: Accepted by IEEE Transactions on Multimedia (2023). Paper page:
https://ieeexplore.ieee.org/abstract/document/10269126
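The self-paced idea in the abstract above, progressively admitting more reliable pseudo-labels into training, can be illustrated with a threshold schedule that relaxes each round. This is a generic sketch of self-paced sample selection, not CLIP-VG's actual algorithm; the function names and the reliability `score_fn` are hypothetical.

```python
def select_pseudo_labels(samples, scores, threshold):
    """Keep pseudo-labelled samples whose reliability score clears the bar."""
    return [s for s, sc in zip(samples, scores) if sc >= threshold]


def self_paced_rounds(samples, score_fn, thresholds):
    """Each round lowers the reliability threshold, admitting more (but
    noisier) pseudo-labels into the training pool. Returns one pool per
    round; in practice the model would be re-adapted between rounds so
    that scores improve as training proceeds."""
    pools = []
    for t in thresholds:
        scores = [score_fn(s) for s in samples]
        pools.append(select_pseudo_labels(samples, scores, t))
    return pools
```

A decreasing threshold schedule trades reliability for diversity: early rounds train only on the cleanest pseudo-labels, later rounds broaden coverage.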