7,242 research outputs found

    From Knowledge Augmentation to Multi-tasking: Towards Human-like Dialogue Systems

    The goal of building dialogue agents that can converse with humans naturally has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent by the indistinguishability of its dialogues from humans'. It should come as no surprise that human-level dialogue systems are very challenging to build. While early efforts on rule-based systems found limited success, the emergence of deep learning has enabled great advances on this topic. In this thesis, we focus on methods that address the numerous issues underlying the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and experimented with in ways inspired by general state-of-the-art AI methodologies, but they also targeted the characteristics that dialogue systems possess. Comment: PhD thesis

    Optimizing The Design Of Multimodal User Interfaces

    Due to a current lack of principle-driven multimodal user interface design guidelines, designers may encounter difficulties when choosing the most appropriate display modality for given users or specific tasks (e.g., verbal versus spatial tasks). The development of multimodal display guidelines from both a user and task domain perspective is thus critical to the achievement of successful human-system interaction. Specifically, there is a need to determine how to design task information presentation (e.g., via which modalities) to capitalize on an individual operator's information processing capabilities and the inherent efficiencies associated with redundant sensory information, thereby alleviating information overload. The present effort addresses this issue by proposing a theoretical framework (Architecture for Multi-Modal Optimization, AMMO) from which multimodal display design guidelines and adaptive automation strategies may be derived. The foundation of the proposed framework is based on extending, at a functional working memory (WM) level, existing information processing theories and models with the latest findings in cognitive psychology, neuroscience, and other allied sciences. The utility of AMMO lies in its ability to provide designers with strategies for directing system design, as well as dynamic adaptation strategies (i.e., multimodal mitigation strategies) in support of real-time operations. In an effort to validate specific components of AMMO, a subset of AMMO-derived multimodal design guidelines was evaluated with a simulated weapons control system multitasking environment. The results of this study demonstrated significant performance improvements in user response time and accuracy when multimodal display cues were used (i.e., auditory and tactile, individually and in combination) to augment the visual display of information, thereby distributing human information processing resources across multiple sensory and WM resources. These results provide initial empirical support for validation of the overall AMMO model and a subset of the principle-driven multimodal design guidelines derived from it. The empirically-validated multimodal design guidelines may be applicable to a wide range of information-intensive computer-based multitasking environments.
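
    The adaptive mitigation idea lends itself to a small illustration. The following Python sketch is hypothetical and not taken from AMMO: it implements a rule-of-thumb modality selector that offloads a task cue from a near-saturated visual channel to the least-loaded auditory or tactile channel. The channel names and the 0.8 saturation threshold are invented for illustration.

```python
# Hypothetical sketch of a multimodal mitigation rule in the spirit of
# AMMO's distribution of cues across sensory/WM resources. The channel
# names and the 0.8 saturation threshold are invented for illustration.

def select_cue_modality(channel_load: dict[str, float]) -> str:
    """Pick a display channel for the next task cue given estimated
    working-memory load per channel, each in [0, 1]."""
    # Prefer the visual channel unless it is near saturation.
    if channel_load.get("visual", 0.0) < 0.8:
        return "visual"
    # Otherwise offload to whichever remaining channel is least loaded,
    # exploiting redundant sensory resources to relieve the visual channel.
    return min(("auditory", "tactile"), key=lambda c: channel_load.get(c, 0.0))

print(select_cue_modality({"visual": 0.9, "auditory": 0.4, "tactile": 0.2}))
# -> "tactile"
```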

    Multi‐speaker experimental designs: Methodological considerations

    Research on language use has become increasingly interested in the multimodal and interactional aspects of language – theoretical models of dialogue, such as the Communication Accommodation Theory and the Interactive Alignment Model, are examples of this. In addition, researchers have started to give more consideration to the relationship between physiological processes and language use. This article aims to contribute to the advancement of studies of physiological and/or multimodal language use in naturalistic settings. It does so by providing methodological recommendations for such multi-speaker experimental designs. It covers the topics of (a) speaker preparation and logistics, (b) experimental tasks and (c) data synchronisation and post-processing. The types of data that will be considered in further detail include audio and video, electroencephalography, respiratory data and electromagnetic articulography. This overview with recommendations is based on the answers to a questionnaire that was sent to the members of the Horizon 2020 research network ‘Conversational Brains’ and to several researchers in the field, as well as on interviews with three additional experts. Funding: H2020 Marie SkƂodowska‐Curie Actions, http://dx.doi.org/10.13039/100010665. Peer Reviewed.
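
    For the data synchronisation and post-processing step, one common technique (among several such recommendations could cover) is aligning two recordings on a shared synchronisation pulse via cross-correlation. A minimal Python sketch, assuming two mono signals at the same sampling rate that both contain the pulse:

```python
import numpy as np

def estimate_lag(reference: np.ndarray, other: np.ndarray, sr: int) -> float:
    """Estimate, in seconds, how far `other` lags behind `reference`
    by locating the peak of their full cross-correlation."""
    xcorr = np.correlate(other, reference, mode="full")
    # Shift the peak index so that 0 means "already aligned".
    lag_samples = int(np.argmax(xcorr)) - (len(reference) - 1)
    return lag_samples / sr

# Toy example: the same sync click occurs 0.5 s later in the second stream.
sr = 1000
click = np.hanning(20)
a = np.zeros(2 * sr); a[100:120] = click
b = np.zeros(2 * sr); b[600:620] = click
print(estimate_lag(a, b, sr))  # ~0.5
```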

    Effectiveness of Staged Partner Training on Conversational Interactions Involving a Person with Severe Aphasia

    A single-subject investigation measured the effects of staged partner communication training on conversational interactions between a familiar conversational partner and a participant with severe aphasia. Conversational variables were analyzed across four conditions: Condition A -- baseline; Condition B -- general aphasia communication strategies; Condition C -- augmented expression strategies; and Condition D -- augmented comprehension strategies. The instructional protocol (slideshow lecture, examples, roleplay, discussion) was implemented immediately before each experimental condition. Two 5-minute conversations per condition were videotaped, transcribed and coded for the following dependent variables: number of exchanges per topic, percentage of facilitative communication acts, communication role and function, and success of conversational exchanges. Descriptive statistical analysis showed that the partner noticeably increased and maintained his use of natural facilitative strategies immediately following Condition B. Although the partner effectively used complex communication techniques in Condition C, he did not continue to use these strategies in the final condition.
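
    As a purely illustrative aside, the coded dependent variables translate directly into a small analysis script. The sketch below is hypothetical and not from the study: the utterance codes and data layout are invented, and it computes only one of the reported variables, the percentage of facilitative communication acts per condition.

```python
from collections import Counter

# Invented codes: "F" = facilitative act, "N" = non-facilitative act.
coded_conversations = {
    "A_baseline": ["N", "N", "F", "N"],
    "B_general_strategies": ["F", "F", "N", "F"],
}

for condition, codes in coded_conversations.items():
    counts = Counter(codes)
    pct = 100 * counts["F"] / len(codes)
    print(f"{condition}: {pct:.1f}% facilitative acts")
```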

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
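
    The coupling of ASR and IR at the heart of SCR can be made concrete with a small sketch: treating transcripts of audio segments as documents and retrieving them with TF-IDF. This is an illustrative simplification, not a method from the survey; the transcripts are placeholders standing in for recogniser output, and a real system would also handle recognition errors and time-alignment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder ASR output: one transcript per audio segment.
segments = [
    "welcome to the quarterly earnings call",
    "today we discuss speech recognition in noisy meetings",
    "the recipe needs two cups of flour",
]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(segments)        # term-weighted segment index

query = vectorizer.transform(["speech recognition meeting"])
scores = cosine_similarity(query, index).ravel()
best = int(scores.argmax())
print(f"best segment: {best} (score {scores[best]:.2f})")  # -> segment 1
```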

    Development of Realistic Stimuli for the Evaluation of Listening Effort using Auditory Evoked Potentials

    Purpose – Listeners often report difficulty perceiving speech in background noise, such as when listening in a restaurant. This common complaint led to the development of the present study, in which audio recordings of connected discourse mixed with restaurant noise at different signal-to-noise ratios were made to determine the effect of restaurant noise on listening effort. Listening effort has previously been examined with psychophysiological measures, a dual-task paradigm, and qualitative measures using a variety of auditory stimuli ranging from simple tonal stimuli to complex speech stimuli, such as consonant-vowel syllables, words, and full sentences, but never in the context of a conversation. Real-life restaurant noise has also never been used in a research study. The central goal is to develop realistic stimuli using real-life conversations that can potentially be used in an electrophysiologic study to determine the effect of background noise on listening effort. Three different conversations, each focusing on a particular topic (food, animals, and locations), were developed. Each conversation contains 25 high- and 25 low-probability target words. The incorporation of high- and low-probability target words in the connected discourse allows exploration of the effect of predictability in conversations on psychophysiological recordings (P3 and N4). A framework for a potential study utilizing the realistic stimuli with a dual-task paradigm and measurement of auditory evoked potentials (P3 and N4) to evaluate the effect of background noise on listening effort is also proposed, and pilot data applying this framework to one research subject are presented. The use of real-life conversations in varying restaurant noise for the evaluation of listening effort is a novel approach and has the potential to inform clinical practice by providing an ecologically valid means to assess the difficulties experienced in difficult but realistic listening situations.
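
    The stimulus construction described above hinges on one formula: scaling the noise so that the speech-to-noise power ratio matches a target SNR in decibels. A minimal Python sketch of that mixing step (the toy arrays below are placeholders, not the study's recordings):

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`,
    then add it to `speech`."""
    noise = noise[: len(speech)]                  # trim noise to speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Toy signals standing in for connected discourse and restaurant noise.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(32000)
mixed = mix_at_snr(speech, noise, snr_db=5.0)
```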

    A Pilot Study on the Use of Nonlinguistic Concrete Materials and Drama to Aid Vocabulary Learning for Third-Grade Students

    This article reports on the effects of the use of nonlinguistic concrete materials and dramatization on student vocabulary learning in eight third-grade classrooms. It follows a preceding study, which determined that the use of nonlinguistic concrete materials and drama in K-3 classrooms for vocabulary instruction was minimal and varied across content areas. The results of the pilot study showed that the use of nonlinguistic materials significantly improved vocabulary learning for normally-progressing students (p = 0.00185), but had little or no effect on students in reading intervention classrooms. The study was quasi-experimental in nature and utilized six third-grade classrooms of normally-progressing students and two third-grade reading intervention classrooms. Each set of classrooms was randomly divided between treatment and control groups. The study did not prescribe a vocabulary instructional method other than requiring that nonlinguistic concrete materials and drama be used in the treatment groups. The concept of augmenting vocabulary lessons with these materials was based on extending the preliterate method of learning the names of objects by seeing, touching, hearing, smelling, and tasting them. Vocabulary instruction time was held constant throughout the study for both treatment and control groups.
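
    The reported significance result can be illustrated with a generic two-sample comparison. The abstract does not specify which test produced p = 0.00185, so the sketch below uses an independent-samples t-test with fabricated placeholder scores purely to show the mechanics:

```python
from scipy import stats

# Fabricated placeholder scores, purely to demonstrate the mechanics.
treatment = [18, 21, 19, 22, 20, 23, 19, 21]
control = [15, 17, 16, 18, 14, 16, 17, 15]

t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.5f}")  # compare p against a chosen alpha, e.g. 0.05
```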

    Eyewear Computing – Augmenting the Human with Head-Mounted Wearable Assistants

    The seminar was composed of workshops and tutorials on head-mounted eye tracking, egocentric vision, optics, and head-mounted displays. The seminar welcomed 30 academic and industry researchers from Europe, the US, and Asia with a diverse background, including wearable and ubiquitous computing, computer vision, developmental psychology, optics, and human-computer interaction. In contrast to several previous Dagstuhl seminars, we used an ignite talk format to reduce the time of talks to one half-day and to leave the rest of the week for hands-on sessions, group work, general discussions, and socialising. The key results of this seminar are 1) the identification of key research challenges and summaries of breakout groups on multimodal eyewear computing, egocentric vision, security and privacy issues, skill augmentation and task guidance, eyewear computing for gaming, as well as prototyping of VR applications, 2) a list of datasets and research tools for eyewear computing, 3) three small-scale datasets recorded during the seminar, 4) an article in ACM Interactions entitled “Eyewear Computers for Human-Computer Interaction”, as well as 5) two follow-up workshops on “Egocentric Perception, Interaction, and Computing” at the European Conference on Computer Vision (ECCV) as well as “Eyewear Computing” at the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp).

    Simple Model Also Works: A Novel Emotion Recognition Network in Textual Conversation Based on Curriculum Learning Strategy

    Emotion Recognition in Conversation (ERC) has emerged as a research hotspot in domains such as conversational robots and question-answer systems. How to efficiently and adequately retrieve contextual emotional cues has been one of the key challenges in the ERC task. Existing efforts do not fully model the context and employ complex network structures, resulting in excessive computational resource overhead without substantial performance improvement. In this paper, we propose a novel Emotion Recognition Network based on a Curriculum Learning strategy (ERNetCL). The proposed ERNetCL primarily consists of a Temporal Encoder (TE), a Spatial Encoder (SE), and a Curriculum Learning (CL) loss. We utilize TE and SE to combine the strengths of previous methods in a simple manner to efficiently capture temporal and spatial contextual information in the conversation. To mimic the way humans learn a curriculum from easy to hard, we apply the idea of CL to the ERC task to progressively optimize the network parameters of ERNetCL. At the beginning of training, we assign lower learning weights to difficult samples. As the epochs increase, the learning weights for these samples are gradually raised. Extensive experiments on four datasets show that our proposed method is effective and substantially outperforms other baseline models. Comment: 12 pages, 9 figures
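
    The curriculum schedule sketched in this abstract (lower weights on difficult samples early in training, gradually raised across epochs) fits in a few lines of PyTorch. The snippet below is a hedged sketch, not the paper's actual CL loss: it assumes a per-sample difficulty score in [0, 1] is available and uses a weighted cross-entropy as a stand-in.

```python
import torch
import torch.nn.functional as F

def curriculum_loss(logits, labels, difficulty, epoch, total_epochs):
    """Cross-entropy whose per-sample weights start low for hard samples
    and ramp toward 1 as training progresses.

    difficulty: tensor in [0, 1], 1 = hardest (assumed given; the abstract
    does not say how ERNetCL scores sample difficulty).
    """
    progress = epoch / total_epochs              # 0 -> 1 over training
    # Epoch 0: hard samples weighted near (1 - difficulty);
    # final epochs: every sample approaches full weight 1.
    weights = 1.0 - difficulty * (1.0 - progress)
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_sample).mean()

logits = torch.randn(4, 6)                       # batch of 4, 6 emotion classes
labels = torch.tensor([0, 3, 5, 1])
difficulty = torch.tensor([0.1, 0.9, 0.5, 0.2])
print(curriculum_loss(logits, labels, difficulty, epoch=0, total_epochs=10))
```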
    • 

    corecore