6,397 research outputs found
A retrieval-based dialogue system utilizing utterance and context embeddings
Finding semantically rich and computer-understandable representations for
textual dialogues, utterances and words is crucial for dialogue systems (or
conversational agents), as their performance mostly depends on understanding
the context of conversations. Recent research aims at finding distributed
vector representations (embeddings) for words, such that semantically similar
words are relatively close within the vector-space. Encoding the "meaning" of
text into vectors is a current trend, and text can range from words, phrases
and documents to actual human-to-human conversations. In recent research
approaches, responses have been generated utilizing a decoder architecture,
given the vector representation of the current conversation. In this paper, the
utilization of embeddings for answer retrieval is explored by using
Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
(ANN) model, to find similar conversations in a corpus and rank possible
candidates. Experimental results on the well-known Ubuntu Corpus (in English)
and a customer service chat dataset (in Dutch) show that, in combination with a
candidate selection method, retrieval-based approaches outperform generative
ones and reveal promising future research directions towards the usability of
such a system.Comment: A shorter version is accepted at ICMLA2017 conference;
acknowledgement added; typos correcte
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
In many applications, it is important to characterize the way in which two
concepts are semantically related. Knowledge graphs such as ConceptNet provide
a rich source of information for such characterizations by encoding relations
between concepts as edges in a graph. When two concepts are not directly
connected by an edge, their relationship can still be described in terms of the
paths that connect them. Unfortunately, many of these paths are uninformative
and noisy, which means that the success of applications that use such path
features crucially relies on their ability to select high-quality paths. In
existing applications, this path selection process is based on relatively
simple heuristics. In this paper we instead propose to learn to predict path
quality from crowdsourced human assessments. Since we are interested in a
generic task-independent notion of quality, we simply ask human participants to
rank paths according to their subjective assessment of the paths' naturalness,
without attempting to define naturalness or steering the participants towards
particular indicators of quality. We show that a neural network model trained
on these assessments is able to predict human judgments on unseen paths with
near optimal performance. Most notably, we find that the resulting path
selection method is substantially better than the current heuristic approaches
at identifying meaningful paths.Comment: In Proceedings of the Web Conference (WWW) 201
Hashing based Answer Selection
Answer selection is an important subtask of question answering (QA), where
deep models usually achieve better performance. Most deep models adopt
question-answer interaction mechanisms, such as attention, to get vector
representations for answers. When these interaction based deep models are
deployed for online prediction, the representations of all answers need to be
recalculated for each question. This procedure is time-consuming for deep
models with complex encoders like BERT which usually have better accuracy than
simple encoders. One possible solution is to store the matrix representation
(encoder output) of each answer in memory to avoid recalculation. But this will
bring large memory cost. In this paper, we propose a novel method, called
hashing based answer selection (HAS), to tackle this problem. HAS adopts a
hashing strategy to learn a binary matrix representation for each answer, which
can dramatically reduce the memory cost for storing the matrix representations
of answers. Hence, HAS can adopt complex encoders like BERT in the model, but
the online prediction of HAS is still fast with a low memory cost. Experimental
results on three popular answer selection datasets show that HAS can outperform
existing models to achieve state-of-the-art performance
- …