Short Text Conversation Based on Deep Neural Network and Analysis on Evaluation Measures
With the development of Natural Language Processing (NLP), automatic
question-answering systems such as Watson, Siri, and Alexa have become among the
most important NLP applications. Nowadays, enterprises try to build automatic
customer service chatbots to save human resources and provide 24-hour customer
service. Evaluation of chatbots currently relies heavily on human annotation,
which costs a great deal of time. Thus, a new Short Text Conversation subtask,
comprising Dialogue Quality (DQ) and Nugget Detection (ND), has been initiated to
automatically evaluate dialogues generated by chatbots. In this paper, we solve
the DQ and ND subtasks with deep neural networks. We propose two models for both
the DQ and ND subtasks, each built as a hierarchical structure of an embedding
layer, utterance layer, context layer, and memory layer, to hierarchically learn
dialogue representations from the word level, sentence level, and context level
up to the long-range context level. Furthermore, we apply gating and attention
mechanisms at the utterance layer and context layer to improve performance. We
also tried BERT in place of the embedding layer and utterance layer as the
sentence representation. The results show that BERT produces a better utterance
representation than multi-stack CNN for both DQ and ND subtasks and outperforms other models
proposed by other researchers. The evaluation measures proposed for these
subtasks, that is, NMD and RSNOD for DQ and JSD and RNSS for ND, are not
traditional evaluation measures such as accuracy, precision, recall, and
F1-score. Thus, we conducted a series of experiments using traditional
evaluation measures and analyze the performance and errors.
Comment: 8 pages, 5 figures
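To give a sense of how distribution-based measures differ from accuracy or F1-score, the following is a minimal Python sketch of three of the measures named above (JSD, RNSS, and NMD), each comparing a predicted probability distribution over labels with a gold distribution. The abstract does not give the formulas, so the definitions used here (Jensen-Shannon divergence with base-2 logarithms, root normalized sum of squares, and normalized match distance over cumulative probabilities) are assumptions based on how these measures are commonly defined; the function names are illustrative. RSNOD is omitted.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs (value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def rnss(p, q):
    """Root normalized sum of squares: sqrt(sum of squared differences / 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / 2)


def nmd(p, q):
    """Normalized match distance for ordinal bins.

    Sums absolute differences of cumulative probabilities and divides by
    (number of bins - 1), so errors between distant bins cost more.
    """
    cum_p = cum_q = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        distance += abs(cum_p - cum_q)
    return distance / (len(p) - 1)
```

Under these assumed definitions, JSD and RNSS treat the labels as unordered (suitable for nugget types), while NMD takes the order of the bins into account (suitable for ordinal quality scores): `nmd([1, 0, 0], [0, 1, 0])` is smaller than `nmd([1, 0, 0], [0, 0, 1])`, whereas `rnss` gives the same value for both pairs.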