    Survey on Evaluation Methods for Dialogue Systems

    In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods for that class.

    On quality ratings for spoken dialogue systems - experts vs. users

    In the field of Intelligent User Interfaces, Spoken Dialogue Systems (SDSs) play a key role, as speech represents a truly intuitive means of human communication. Deriving information about their quality can help render SDSs more user-adaptive. Work on automatic estimation of subjective quality usually relies on statistical models. To create those, manual data annotation is required, which may be performed by actual users or by experts. Both variants have their advantages and drawbacks. In this paper, we analyze the relationship between user and expert ratings by investigating models which combine the advantages of both types of ratings. We explore two novel approaches using statistical classification methods and evaluate them with a preexisting corpus providing user and expert ratings. After analyzing the results, we recommend using expert ratings instead of user ratings in general.

    Development of an Arabic conversational intelligent tutoring system for education of children with autism spectrum disorder

    Children with Autism Spectrum Disorder (ASD) are affected to different degrees in terms of their level of intellectual ability. Some people with Asperger syndrome or high-functioning autism are very intelligent academically but still have difficulties with social and communication skills. In recent years, many of these pupils have been taught within mainstream schools. However, the process of facilitating their learning and participation remains a complex and poorly understood area of education. Although many teachers in mainstream schools are firmly committed to the principles of inclusive education, they do not feel that they have the necessary training and support to provide adequately for pupils with ASD. One solution to this problem is to use a virtual tutor to supplement the education of pupils with ASD in mainstream schools. This thesis describes research to develop a novel Arabic Conversational Intelligent Tutoring System (CITS), called LANA, for children with ASD, which delivers science topics by engaging with the user in Arabic. The Visual, Auditory, and Kinaesthetic (VAK) learning style model is used in LANA to adapt to the children’s learning style by personalising the tutoring session. Developing an Arabic Conversational Agent poses many challenges. Part of the challenge in building such a system is the requirement to deal with the grammatical features and the morphological nature of the Arabic language. The proposed novel architecture for LANA uses both pattern matching (PM) and a new Arabic short text similarity (STS) measure to extract facts from the user’s responses and match them against rules in a scripted conversation in a particular domain (Science). In this research, two prototypes of an Arabic CITS, LANA-I and LANA-II, were developed. LANA-I was developed and evaluated with 24 neurotypical children to assess the effectiveness and robustness of the system engine.
LANA-II was developed to enhance LANA-I by addressing spelling mistakes and word variation arising from prefixes and suffixes. Also in LANA-II, the TEACCH method was added to the user interface to adapt the tutorial environment to the autistic students’ learning, and the knowledge base was expanded by adding a new tutorial. An evaluation methodology and experiment were designed to evaluate the enhanced components of the LANA-II architecture. The results showed a statistically significant improvement in the effectiveness of the LANA-II engine when compared to LANA-I. In addition, the results indicated a statistically significant improvement in the autistic students’ learning gain when the system adapted to their learning styles, indicating that LANA-II can be adapted to autistic children’s learning styles and enhance their learning.
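The combination of pattern matching with a similarity fallback described in this abstract can be illustrated with a minimal sketch. This is not LANA's implementation: the rule set, the wildcard patterns, the `match_rule` function, the threshold of 0.5, and the use of Jaccard token overlap in place of the thesis's Arabic STS measure are all assumptions made for illustration, and English placeholder text stands in for Arabic.

```python
from fnmatch import fnmatch

# Hypothetical scripted rules: wildcard patterns plus one prototype sentence each.
RULES = [
    {"name": "sun_is_star",
     "patterns": ["*sun*star*"],
     "prototype": "the sun is a star"},
    {"name": "moon_orbits_earth",
     "patterns": ["*moon*earth*"],
     "prototype": "the moon goes around the earth"},
]


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in, not the thesis's STS measure."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_rule(utterance, threshold=0.5):
    """Return (rule name, how it matched) for a user utterance."""
    # Normalise case and whitespace before matching.
    text = " ".join(utterance.lower().split())

    # Step 1: try the scripted wildcard patterns first.
    for rule in RULES:
        if any(fnmatch(text, pattern) for pattern in rule["patterns"]):
            return rule["name"], "pattern"

    # Step 2: fall back to similarity against each rule's prototype sentence.
    best = max(RULES, key=lambda rule: jaccard(text, rule["prototype"]))
    if jaccard(text, best["prototype"]) >= threshold:
        return best["name"], "similarity"

    return None, "no_match"
```

Under these assumptions, an utterance such as "a star is what the sun is" fails the ordered wildcard pattern but is still assigned to the `sun_is_star` rule through the similarity fallback. A real Arabic system would additionally need morphological normalisation before either step.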

    Methodology and algorithms for Urdu language processing in a conversational agent

    This thesis presents the research and development of a novel text-based, goal-oriented conversational agent (CA) for the Urdu language called UMAIR (Urdu Machine for Artificially Intelligent Recourse). A CA is a computer program that emulates a human in order to facilitate a conversation with the user. The aim is to investigate the Urdu language and its lexical and grammatical features in order to design a novel engine that handles the language-specific features of Urdu. The weakness of current CA engines is that they are not suited to implementation in other languages whose grammar rules and structure are totally different from those of English. From a historical perspective, CAs, including the design of scripting engines, scripting methodologies, resources and implementation procedures, have been implemented for the most part in English and other Western languages (e.g. German and Spanish). The development of an Urdu conversational agent has required the research and development of a new CA framework which incorporates methodologies and components to overcome the language-specific features of Urdu, such as free word order, inconsistent use of space, diacritical marks and spelling. The new CA framework was used to implement UMAIR. UMAIR is a customer service agent for the National Database and Registration Authority (NADRA), designed to answer user queries related to ID card and passport applications. UMAIR answers user queries related to the domain through discourse with the user, leading the conversation using questions and offering appropriate advice with the intention of steering the discourse to a pre-determined goal. The research and development of UMAIR led to the creation of several novel CA components, namely a new rule-based Urdu CA engine which combines pattern matching and sentence/string similarity techniques, along with new algorithms to process user utterances.
Furthermore, a CA evaluation framework has been researched and tested which addresses a research gap in the evaluation of natural language systems in general. Empirical end-user evaluation has validated the new algorithms and components implemented in UMAIR. The results show that UMAIR is effective as an Urdu CA, with the majority of conversations leading to the goal of the conversation. Moreover, the results also revealed that the components of the framework work well to mitigate the challenges of free word order and inconsistent word segmentation.
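To make the two challenges named here more concrete, the following sketch shows one simple way a string similarity score can be made tolerant of free word order and inconsistent spacing. It is only an illustration under stated assumptions, not UMAIR's algorithm: the function names `order_free_score`, `space_free_score` and `similarity` are hypothetical, Latin-script placeholder strings stand in for Urdu, and the choice of token-set overlap and `difflib.SequenceMatcher` is ours.

```python
from difflib import SequenceMatcher


def order_free_score(a, b):
    """Compare the sets of tokens, so that word order does not affect the score."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def space_free_score(a, b):
    """Compare the strings with all whitespace removed, so that
    missing or extra spaces between words do not affect the score."""
    return SequenceMatcher(None, "".join(a.split()), "".join(b.split())).ratio()


def similarity(a, b):
    """Take the more favourable of the two views of the utterance pair."""
    return max(order_free_score(a, b), space_free_score(a, b))
```

With this sketch, "id card renew" and "renew id card" score 1.0 through the order-free view, and "idcard renew" and "id card renew" score 1.0 through the space-free view. Real Urdu input would also require script-specific handling of diacritical marks and spelling variants, which this sketch does not attempt.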