32,379 research outputs found

    Design and implementation of a user-oriented speech recognition interface: the synergy of technology and human factors

    Get PDF
    The design and implementation of a user-oriented speech recognition interface are described. The interface enables the use of speech recognition in so-called interactive voice response systems which can be accessed via a telephone connection. In the design of the interface a synergy of technology and human factors is achieved. This synergy is very important for making speech interfaces a natural and acceptable form of human-machine interaction. Important concepts such as interfaces, human factors and speech recognition are discussed. Additionally, an indication is given as to how the synergy of human factors and technology can be realised by a sketch of the interface's implementation. An explanation is also provided of how the interface might be integrated in different applications fruitfully

    She inches glass to break: conversations between friends

    Get PDF
    She inches glass to break: conversations between friends is a project that aims to manifest, through research and practice, my own feminist language within the videos I have produced in my final year of my Masters of Fine Arts. My feminist language is Australian and intersectional, invested in combating sexism, racism and in deepening language and representation around sexuality in relation to Asian women. This project discusses my video She inches glass to break (2018) in length, which created intersectional feminist dialogue in response to feminist filmmaker Ulrike Ottinger’s film Ticket of No Return (1979) and Breakfast at Tiffany’s (1961). Additionally, given this project’s investment in language, this body of work is influenced both by aspects of psychoanalysis – in which speech is central to a “therapeutic action” – and by feminist linguistics in which linguistic analysis reveals some of the mechanisms through which language constrains, coerces and represents women, men and non-binary people in oppressive ways

    Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

    Full text link
    While recent neural encoder-decoder models have shown great promise in modeling open-domain conversations, they often generate dull and generic responses. Unlike past work that has focused on diversifying the output of the decoder at word-level to alleviate this problem, we present a novel framework based on conditional variational autoencoders that captures the discourse-level diversity in the encoder. Our model uses latent variables to learn a distribution over potential conversational intents and generates diverse responses using only greedy decoders. We have further developed a novel variant that is integrated with linguistic prior knowledge for better performance. Finally, the training procedure is improved by introducing a bag-of-word loss. Our proposed models have been validated to generate significantly more diverse responses than baseline approaches and exhibit competence in discourse-level decision-making.Comment: Appeared in ACL2017 proceedings as a long paper. Correct a calculation mistake in Table 1 E-bow & A-bow and results into higher score

    Explorations in engagement for humans and robots

    Get PDF
    This paper explores the concept of engagement, the process by which individuals in an interaction start, maintain and end their perceived connection to one another. The paper reports on one aspect of engagement among human interactors--the effect of tracking faces during an interaction. It also describes the architecture of a robot that can participate in conversational, collaborative interactions with engagement gestures. Finally, the paper reports on findings of experiments with human participants who interacted with a robot when it either performed or did not perform engagement gestures. Results of the human-robot studies indicate that people become engaged with robots: they direct their attention to the robot more often in interactions where engagement gestures are present, and they find interactions more appropriate when engagement gestures are present than when they are not.Comment: 31 pages, 5 figures, 3 table

    How Helpdesk Agents Help Clients

    Get PDF
    Helpdesks are an important channel for supporting users of technical products and software. This study analyses some phenomena in telephone helpdesk calls, using conversational analysis as a methodological and theoretical framework. Helpdesk calls are characterized by the common goal of the helpdesk agent and the client to understand and solve the client's problem with a particular technical device or with computer software. Both parties cooperate in a complex manner to define and diagnose the problem, and to solve it. The paper identifies the typical structure of a helpdesk call and describes a number of strategies that the participants use to make the call successful

    Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

    Full text link
    In this paper, we propose a noise robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN and the output of the EN is used as the noise robust feature. The EN and DN are trained in turn, namely, when training the DN, noise types are selected as the training labels and when training the EN, all labels are set as the same, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and make comparison to MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition

    A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs

    Get PDF
    Backchannels (BCs) are short vocal and visual listener responses that signal attention, interest, and understanding to the speaker. Previous studies have investigated BC prediction in telephone-style dialogs from prosodic cues. In contrast, we consider spontaneous face-to-face dialogs. The additional visual modality allows speaker and listener to monitor each other's attention continuously, and we hypothesize that this affects the BC-inviting cues. In this study, we investigate how gaze, in addition to prosody, can cue BCs. Moreover, we focus on the type of BC performed, with the aim to find out whether vocal and visual BCs are invited by similar cues. In contrast to telephone-style dialogs, we do not find rising/falling pitch to be a BC-inviting cue. However, in a face-to-face setting, gaze appears to cue BCs. In addition, we find that mutual gaze occurs significantly more often during visual BCs. Moreover, vocal BCs are more likely to be timed during pauses in the speaker's speech
    • …
    corecore