Bootstrapping Conversational Agents With Weak Supervision
Many conversational agents in the market today follow a standard bot
development framework which requires training intent classifiers to recognize
user input. The need to create a proper set of training examples is often the
bottleneck in the development process. Agent developers often have
access to historical chat logs that can provide a good quantity as well as
coverage of training examples. However, the cost of labeling them with tens to
hundreds of intents often prohibits taking full advantage of these chat logs.
In this paper, we present a framework called \textit{search, label, and
propagate} (SLP) for bootstrapping intents from existing chat logs using weak
supervision. The framework reduces hours to days of labeling effort down to
minutes of work by using a search engine to find examples, then relies on a
data programming approach to automatically expand the labels. We report on a
user study that shows positive user feedback for this new approach to building
conversational agents, and demonstrates the effectiveness of using data
programming for auto-labeling. While the system is developed for training
conversational agents, the framework has broader application in significantly
reducing labeling effort for training text classifiers.
Comment: 6 pages, 3 figures, 1 table, Accepted for publication in IAAI 201
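The auto-labeling step described above follows the data-programming paradigm: several cheap, noisy labeling functions vote on each example, and their votes are aggregated into a weak label. The sketch below illustrates the idea with keyword-based labeling functions and simple majority-vote aggregation; the intent names and keywords are hypothetical placeholders, not taken from the paper, and the paper's actual label model may aggregate votes differently.

```python
from collections import Counter

ABSTAIN = None  # labeling functions abstain when they have no opinion

# Hypothetical keyword-based labeling functions for three example intents.
def lf_billing(text):
    keywords = ("invoice", "charge", "bill")
    return "billing" if any(k in text.lower() for k in keywords) else ABSTAIN

def lf_password(text):
    return "reset_password" if "password" in text.lower() else ABSTAIN

def lf_cancel(text):
    return "cancel_account" if "cancel" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_billing, lf_password, lf_cancel]

def weak_label(text):
    """Aggregate non-abstaining votes by majority; abstain on ties."""
    votes = Counter(
        v for lf in LABELING_FUNCTIONS if (v := lf(text)) is not ABSTAIN
    )
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting labels with equal support
    return ranked[0][0]
```

Utterances the functions agree on (e.g. "I have a question about my invoice") receive a weak label automatically, while conflicting or uncovered utterances are left for a human to label, which is where the reduction from hours to minutes of effort comes from.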
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables not only fast sharing of information, but also
disinformation. One prominent case of an event leading to circulation of
disinformation on social media is the MH17 plane crash. Studies analysing the
spread of information about this event on Twitter have focused on small,
manually annotated datasets, or used proxies for data annotation. In this work,
we examine to what extent text classifiers can be used to label data for
subsequent content analysis; in particular, we focus on predicting pro-Russian
and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though
we find that a neural classifier improves over a hashtag-based baseline,
labeling pro-Russian and pro-Ukrainian content with high precision remains a
challenging problem. We provide an error analysis underlining the difficulty of
the task and identify factors that might help improve classification in future
work. Finally, we show how the classifier can facilitate the annotation task
for human annotators.
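A hashtag-based baseline of the kind the abstract compares against can be sketched as follows: a tweet is assigned to whichever side's seed hashtags it mentions more often. The hashtag sets here are illustrative placeholders, not the seed lists used in the study.

```python
import re

# Placeholder seed hashtags; the study's actual lists are not reproduced here.
PRO_RUSSIAN_TAGS = {"#pro_ru_example"}
PRO_UKRAINIAN_TAGS = {"#pro_ua_example"}

def hashtags(tweet):
    """Extract lowercased hashtags from a tweet."""
    return {t.lower() for t in re.findall(r"#\w+", tweet)}

def hashtag_baseline(tweet):
    """Label a tweet by which side's seed hashtags dominate; None if neither."""
    tags = hashtags(tweet)
    ru = len(tags & PRO_RUSSIAN_TAGS)
    ua = len(tags & PRO_UKRAINIAN_TAGS)
    if ru > ua:
        return "pro-russian"
    if ua > ru:
        return "pro-ukrainian"
    return None  # no seed hashtags, or a tie
```

Such a baseline has high precision only on tweets that carry partisan hashtags; most tweets carry none, which is one reason a trained classifier can improve coverage while still struggling to keep precision high.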