Counterfactual Learning from Human Proofreading Feedback for Semantic Parsing
In semantic parsing for question-answering, it is often too expensive to
collect gold parses or even gold answers as supervision signals. We propose to
convert model outputs into a set of human-understandable statements which allow
non-expert users to act as proofreaders, providing error markings as learning
signals to the parser. Because model outputs were suggested by a historic
system, we operate in a counterfactual, or off-policy, learning setup. We
introduce new estimators which can effectively leverage the given feedback and
which avoid known degeneracies in counterfactual learning, while still being
applicable to stochastic gradient optimization for neural semantic parsing.
Furthermore, we discuss how our feedback collection method can be seamlessly
integrated into deployed virtual personal assistants that embed a semantic
parser. Our work is the first to show that semantic parsers can be improved
significantly by counterfactual learning from logged human feedback data.
Comment: "Learning by Instruction" Workshop at the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada. arXiv admin note: substantial text overlap with arXiv:1805.0125
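To make the off-policy setup concrete, the following is a minimal sketch, assuming a self-normalized (reweighted) counterfactual objective over logged parser outputs and binary proofreader markings; it is not the paper's exact estimator, and all function and variable names below are illustrative only.

```python
# Minimal sketch of a self-normalized counterfactual objective for logged
# bandit feedback (assumption: not the paper's exact estimator).
import torch

def reweighted_counterfactual_loss(logprobs: torch.Tensor,
                                   rewards: torch.Tensor) -> torch.Tensor:
    """logprobs: log pi_theta(y_i | x_i) of the logged outputs under the
    current parser; rewards: human error markings in [0, 1].
    Self-normalizing the probabilities guards against the known degeneracy
    of simply inflating the probability of every logged output."""
    probs = logprobs.exp()
    weights = probs / probs.sum()       # self-normalized importance weights
    return -(weights * rewards).sum()   # negate: minimizing maximizes reward

# Toy batch of three logged outputs, two marked correct by proofreaders.
logprobs = torch.tensor([-1.2, -0.7, -2.3], requires_grad=True)
rewards = torch.tensor([1.0, 0.0, 1.0])
loss = reweighted_counterfactual_loss(logprobs, rewards)
loss.backward()   # gradients can feed any stochastic gradient optimizer
```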
Adaptive Summaries: A Personalized Concept-based Summarization Approach by Learning from Users' Feedback
Exploring tremendous amounts of data efficiently to make a decision, much like
answering a complicated question, is challenging in many real-world application
scenarios. In this context, automatic summarization is of substantial importance,
as it provides a foundation for big data analytics. Traditional summarization
approaches optimize the system to produce a short, static summary intended to fit
all users, disregarding the subjective aspect of summarization, i.e., what is
deemed valuable to different users; this makes such approaches impractical in
real-world use cases. This paper proposes an
interactive concept-based summarization model, called Adaptive Summaries, that
helps users make their desired summary instead of producing a single inflexible
summary. The system gradually learns from the information users provide as they
interact with it, giving feedback in an iterative loop. For each concept, users can
accept or reject its inclusion in the summary, specify how important the concept is
from their perspective, and indicate a confidence level for their feedback. The
proposed approach can guarantee
interactive speed to keep the user engaged in the process. Furthermore, it
eliminates the need for reference summaries, which is a challenging issue for
summarization tasks. Evaluations show that Adaptive Summaries helps users make
high-quality summaries based on their preferences by maximizing the
user-desired content in the generated summaries.
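As an illustration of the feedback loop described in this abstract, here is a minimal sketch, assuming per-concept weights updated from accept/reject actions with user-supplied importance and confidence, followed by a greedy sentence-selection step; the names are hypothetical and this is not the authors' implementation.

```python
# Minimal sketch of an interactive concept-weighting loop (hypothetical names,
# not the authors' system): feedback updates concept weights, then sentences
# are greedily selected to cover the highest-weighted concepts.
from collections import defaultdict

def update_weights(weights, feedback):
    """feedback: list of (concept, action, importance, confidence),
    with action in {"accept", "reject"} and importance/confidence in [0, 1]."""
    for concept, action, importance, confidence in feedback:
        delta = importance * confidence
        weights[concept] += delta if action == "accept" else -delta
    return weights

def greedy_summary(sentences, sentence_concepts, weights, budget):
    """Pick up to `budget` sentences maximizing the weight of newly covered concepts."""
    chosen, covered = [], set()
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < budget:
        gains = {i: sum(weights[c] for c in sentence_concepts[i] - covered)
                 for i in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        chosen.append(best)
        covered |= sentence_concepts[best]
        remaining.remove(best)
    return [sentences[i] for i in sorted(chosen)]

# Toy usage: one accepted and one rejected concept, then a one-sentence summary.
weights = update_weights(defaultdict(float),
                         [("climate", "accept", 0.9, 1.0),
                          ("sports", "reject", 0.8, 0.7)])
sents = ["Climate policy shifted.", "The match ended 2-1."]
concepts = [{"climate"}, {"sports"}]
print(greedy_summary(sents, concepts, weights, budget=1))
```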
Preference-based Interactive Multi-Document Summarisation
Interactive NLP is a promising paradigm to close the gap between automatic
NLP systems and the human upper bound. Preference-based interactive learning
has been successfully applied, but the existing methods require several
thousand interaction rounds even in simulations with perfect user feedback. In
this paper, we study preference-based interactive summarisation. To reduce the
number of interaction rounds, we propose the Active Preference-based
ReInforcement Learning (APRIL) framework. APRIL uses Active Learning to query
the user, Preference Learning to learn a summary ranking function from the
preferences, and neural Reinforcement Learning to efficiently search for the
(near-)optimal summary. Our results show that users can easily provide reliable
preferences over summaries and that APRIL outperforms the state-of-the-art
preference-based interactive method in both simulation and real-user
experiments.
Comment: Submitted to the special issue on "Learning from User Interactions", Information Retrieval Journal
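As a rough illustration of the preference-learning component, the sketch below assumes a linear reward over summary features and a Bradley-Terry style pairwise loss; it is not the authors' APRIL implementation, and the simulated user and all names are hypothetical.

```python
# Minimal sketch of learning a summary ranking function from pairwise
# preferences (assumption: linear utility, Bradley-Terry loss; not APRIL itself).
import numpy as np

def preference_update(w, feats_preferred, feats_other, lr=0.1):
    """One SGD step on the pairwise loss -log sigma(w.(x_pref - x_other))."""
    diff = feats_preferred - feats_other
    p = 1.0 / (1.0 + np.exp(-w @ diff))   # probability the preference is respected
    return w + lr * (1.0 - p) * diff      # gradient ascent on the log-likelihood

# Simulated interaction loop: present two candidate summaries, observe the
# preference of a hypothetical user with utility true_w, update the ranker.
rng = np.random.default_rng(0)
w = np.zeros(4)
true_w = np.array([1.0, -0.5, 0.3, 0.0])
for _ in range(200):
    a, b = rng.normal(size=4), rng.normal(size=4)   # feature vectors of two summaries
    preferred, other = (a, b) if true_w @ a >= true_w @ b else (b, a)
    w = preference_update(w, preferred, other)
print(np.round(w, 2))   # learned weights track the simulated utility
```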