
    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires; however, these tend to be very costly and time-intensive. Much work has therefore been put into finding methods that reduce the need for human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by first introducing the main technologies developed for it and then presenting the corresponding evaluation methods.

    Cognitive testing evaluation of survey questions on COVID-19

    This report documents findings from a cognitive interview evaluation of survey questions on the topic of coronavirus disease 2019 (COVID-19). The Collaborating Center for Question Design and Evaluation Research (CCQDER) at the National Center for Health Statistics (NCHS) conducted this study in support of federal surveys that have incorporated (or intend to incorporate) COVID-19 measurements into their questionnaires. The questions evaluated include items on the RANDS during COVID-19 survey (a methodological survey housed at NCHS) and other federal surveys, such as the NHIS (National Health Interview Survey) and the ECHO (Environmental Influences on Child Health Outcomes) adult primary questionnaire, a project supported by the National Institutes of Health. The findings of this study serve two purposes. First, the results serve as a validity study for COVID-19 questions, so that survey data analysts can understand what constructs the questions capture. As a validity study, the cognitive interviews provide information about the patterns of interpretation associated with these survey questions. Second, this study explored the question-response process, identifying problems respondents had in answering the questions and, by extension, possible sources of response error. These findings may be used to improve question design for future surveys.
    Willson_2020_NCHS_RANDS_COVID.pdf, 20211123

    Advancing Multi-Modal Deep Learning: Towards Language-Grounded Visual Understanding

    Using deep learning, computer vision now rivals people at object recognition and detection, opening doors to tackle new challenges in image understanding. Among these challenges, understanding and reasoning about language-grounded visual content is of fundamental importance to advancing artificial intelligence. Recently, multiple datasets and algorithms have been created as proxy tasks towards this goal, with visual question answering (VQA) being the most widely studied. In VQA, an algorithm needs to produce an answer to a natural language question about an image. However, our survey of datasets and algorithms for VQA uncovered several sources of dataset bias and sub-optimal evaluation metrics that allowed algorithms to perform well by merely exploiting superficial statistical patterns. In this dissertation, we describe new algorithms and datasets that address these issues. We developed two new datasets and evaluation metrics that enable more accurate measurement of a VQA model's abilities and also expand VQA to include new abilities, such as reading text, handling out-of-vocabulary words, and understanding data visualizations. We also created new algorithms that have helped advance the state of the art for VQA, including an algorithm that surpasses humans on two different chart question answering datasets about bar charts, line graphs, and pie charts. Finally, we provide a holistic overview of several unsolved challenges, not only in VQA but in vision and language research at large. Despite enormous progress, we find that a robust understanding and integration of vision and language is still an elusive goal, and much of the progress may be misleading due to dataset bias, superficial correlations, and flaws in standard evaluation metrics. We carefully study and categorize these issues for several vision and language tasks and outline several possible paths towards the development of safe, robust, and trustworthy AI for language-grounded visual understanding.

    Data Elicitation for Continuous Awareness of Team Climate Characteristics to Improve Organizations’ Creativity

    The creativity of a company’s employees depends on the characteristics of its working climate, e.g. autonomy or an appropriate workload. Tools for assessing these characteristics exist, but they are applied too infrequently to detect the relevant dynamics that characterize the varying challenges of agile and learning organizations. The evaluation of a first prototype, which monitors these dynamics by frequently repeating a common online employee survey, revealed features relevant to overcoming users’ reluctance to answer the same question items repeatedly.

    Three variables were identified that influence the acceptance of a repeated question: the time since it was last answered, the user’s current willingness to participate, and the user’s situation. Based on these variables, a new prototype offers users more self-determination in their rate of participation, allows dynamic repetition rates to be assigned to every question item, and exploits context information to optimize the prompting of users.

    The Effectiveness of Descriptive Evaluation Plan from the Point of View of Sixth Grade Teachers

    The present study was conducted with the aim of assessing the effectiveness of descriptive evaluation from the point of view of sixth grade teachers in the city of Karaj. The study used a descriptive survey method. The population consisted of all sixth grade teachers in district four of Karaj who had experience teaching in all elementary grades under the descriptive evaluation plan (500 individuals in total). The sample size was determined using the Morgan table, and participants were selected randomly from the population (215 individuals in total). The study addressed one main research question and three secondary questions, which were answered from the teachers’ viewpoints using a researcher-made questionnaire. Statistical tests on the questionnaire responses indicated that the descriptive evaluation process has a significant effect on teaching and learning, social education, and students’ mental health.

    DETERMINANTS AND CONSEQUENCES OF SURVEY ATTITUDES. Evaluation Dimensions of Different Survey Sponsors and Respondents’ Willingness to Answer

    This article deals with two questions: a) respondents’ evaluations of surveys by different sponsors on the dimensions of utility, reliability, and burden as determinants of the generalized attitude towards surveys, and b) answering or refusing the income question, as an indicator of cooperative behavior during the interview, as a consequence of respondents’ attitudes towards surveys. The first part of the analysis also tests whether the amount of past survey experience moderates the strength of the observed associations. An empirical analysis of data from a local survey based on a random probability sample shows that the associations between respondents’ sponsor-specific evaluations and their attitudes towards surveys grow stronger the more often respondents have taken part in surveys in the past. The perceived utility of surveys and the evaluation of scientific sponsors prove to be the strongest determinants of the generalized attitude towards surveys. Regarding the second question, the probability of refusing to answer the income question increases considerably when interviewees hold more negative attitudes towards surveys that are, as indicated by their response latencies, also cognitively accessible. It is thus concluded that respondents’ attitudes towards surveys have serious consequences for the quality of survey data.

    Gathering and using patron and librarian perceptions of question-answering success

    This paper discusses the strengths and weaknesses of patrons and reference librarians as sources of data for the evaluation of reference question-answering effectiveness, along with ways to enhance the usefulness of data from each source. It describes the Wisconsin-Ohio Reference Evaluation Program and discusses some illustrative statistics from the project, including data on relationships between patron-perceived answering success and factors such as staffing patterns, effort spent on answering questions, types and sources of questions, and collection size.