
    Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses

    Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which, although powerful for many inputs, is detrimental when summarizing documents whose important content lies in later parts of the article. We propose two techniques to make systems sensitive to the importance of content in different parts of the article. The first technique pretrains the model on 'unbiased' data, i.e., documents whose sentences have been randomly shuffled. The second technique uses an auxiliary ROUGE-based loss that encourages the model to distribute importance scores throughout a document by mimicking sentence-level ROUGE scores on the training data. We show that these techniques significantly improve the performance of a competitive reinforcement-learning-based extractive system, with the auxiliary loss being more powerful than pretraining.
    Comment: 5 pages, accepted at EMNLP 2019
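    The two techniques lend themselves to a short sketch. Below is a minimal PyTorch illustration, assuming a hypothetical extractive model that emits one importance logit per sentence; `shuffle_document` builds an 'unbiased' pretraining example, and `rouge_guided_loss` is one plausible instantiation of the auxiliary loss (a KL divergence toward normalized sentence-level ROUGE scores; the abstract does not commit to this exact form).

```python
import random

import torch.nn.functional as F

def shuffle_document(sentences, rng=random):
    """Technique 1: produce an 'unbiased' pretraining document by
    randomly permuting the source sentences, so that position is no
    longer predictive of importance."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

def rouge_guided_loss(importance_logits, sentence_rouge):
    """Technique 2 (one plausible form): a KL divergence pulling the
    model's sentence-importance distribution toward the normalized
    sentence-level ROUGE scores computed on the training data.

    importance_logits: (batch, n_sentences) raw model scores
    sentence_rouge:    (batch, n_sentences) non-negative ROUGE scores
    """
    # Normalize ROUGE scores into a target distribution over sentences
    # (epsilon guards against all-zero rows).
    target = sentence_rouge / (sentence_rouge.sum(dim=-1, keepdim=True) + 1e-8)
    log_pred = F.log_softmax(importance_logits, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as
    # target; 'batchmean' averages the divergence over the batch.
    return F.kl_div(log_pred, target, reduction="batchmean")
```

    Shuffling destroys positional cues during pretraining, while the KL term penalizes concentrating all the importance mass on the lead sentences.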

    On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

    How best to explore in domains with sparse, delayed, and deceptive rewards is an important open problem in reinforcement learning (RL). This paper considers one such domain, the recently proposed multi-agent benchmark of Pommerman. The domain is very challenging for RL: past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate the reasons behind this failure through a thorough analysis of the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that enables safer exploration by pruning actions that would surely lead the agent to its death. We empirically demonstrate that this module can significantly improve learning.
    Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2019
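    As a concrete illustration of such a pruning module, here is a minimal one-step look-ahead sketch in Python. The grid representation, the `ACTIONS` table, and the bomb encoding `((row, col), fuse, strength)` are simplified assumptions for illustration, not the Pommerman API, and the paper's module may reason over longer horizons; the idea is the same: simulate each candidate action against a model of the environment and discard the ones that land the agent in an imminent blast.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical action table: name -> (row delta, col delta).
ACTIONS = {"stop": (0, 0), "up": (-1, 0), "down": (1, 0),
           "left": (0, -1), "right": (0, 1)}

def blast_cells(bombs, walls: Set[Cell]) -> Set[Cell]:
    """Cells flamed by bombs about to detonate (fuse <= 1), expanding
    in the four cardinal directions until a wall blocks the blast."""
    cells: Set[Cell] = set()
    for (r, c), fuse, strength in bombs:
        if fuse > 1:
            continue  # not exploding on the next tick
        cells.add((r, c))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            for k in range(1, strength + 1):
                cell = (r + dr * k, c + dc * k)
                if cell in walls:
                    break  # wall stops the flame in this direction
                cells.add(cell)
    return cells

def prune_fatal_actions(pos: Cell, bombs, walls: Set[Cell]) -> List[str]:
    """Model-based pruning (sketch): simulate one step ahead and drop
    any action that moves the agent into an imminent blast cell."""
    fatal = blast_cells(bombs, walls)
    safe = []
    for name, (dr, dc) in ACTIONS.items():
        nxt = (pos[0] + dr, pos[1] + dc)
        if nxt in walls:
            nxt = pos  # a blocked move leaves the agent in place
        if nxt not in fatal:
            safe.append(name)
    return safe
```

    An RL agent would then sample only from the returned safe set (falling back to the full action set if it comes up empty), which keeps random exploration from repeatedly terminating in avoidable deaths.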