
    Large scale evaluation of importance maps in automatic speech recognition

    In this paper, we propose a metric that we call the structured saliency benchmark (SSBM) to evaluate importance maps computed for automatic speech recognizers on individual utterances. These maps indicate the time-frequency points of the utterance that are most important for correct recognition of a target word. Our evaluation technique is suitable not only for standard classification tasks but also for structured prediction tasks like sequence-to-sequence models. Additionally, we use this approach to perform a large-scale comparison of the importance maps created by our previously introduced technique, which uses "bubble noise" to identify important points through correlation, with a baseline approach based on smoothed speech energy and forced alignment. Our results show that the bubble analysis approach is better at identifying important speech regions than this baseline on 100 sentences from the AMI corpus. Comment: submitted to INTERSPEECH 202
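
The abstract above mentions identifying important time-frequency points "through correlation". As a rough illustration only, and not the authors' implementation, the following Python sketch computes a per-point Pearson (point-biserial) correlation between a noise mask and recognition correctness across many noisy trials. The function name `correlation_importance_map`, the array shapes, and the convention that a mask value of 1 means the speech is audible at that point are all assumptions made for this example.

```python
import numpy as np


def correlation_importance_map(masks, correct):
    """Correlate mask audibility with recognition correctness per point.

    masks:   (n_trials, n_freq, n_time) array; 1 where speech is audible
             through the noise (hypothetical convention), 0 otherwise.
    correct: (n_trials,) array; 1 if the target word was recognized.
    Returns an (n_freq, n_time) array of Pearson correlations.
    """
    masks = np.asarray(masks, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Deviations from the per-point and overall means.
    mask_dev = masks - masks.mean(axis=0)
    correct_dev = correct - correct.mean()

    # Covariance between correctness and each time-frequency point.
    cov = np.tensordot(correct_dev, mask_dev, axes=(0, 0)) / correct.shape[0]

    # Normalize by the standard deviations; leave 0 where undefined.
    denom = masks.std(axis=0) * correct.std()
    corr = np.zeros_like(cov)
    np.divide(cov, denom, out=corr, where=denom > 0)
    return corr


if __name__ == "__main__":
    # Synthetic data: recognition succeeds exactly when point (2, 3) is audible.
    rng = np.random.default_rng(0)
    masks = rng.integers(0, 2, size=(500, 4, 5))
    correct = masks[:, 2, 3] == 1
    imap = correlation_importance_map(masks, correct)
    print(np.unravel_index(np.argmax(imap), imap.shape))
```

In this toy setup, points whose audibility tracks correct recognition receive high correlation values, which is the general intuition behind a correlation-based importance map.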

    Identifying, Evaluating and Applying Importance Maps for Speech

    Like many machine learning systems, speech models often perform well when employed on data in the same domain as their training data. However, when the inference is on out-of-domain data, performance suffers. With a fast-growing number of applications of speech models in healthcare, education, automotive, automation, etc., it is essential to ensure that speech models can generalize to out-of-domain data, especially to noisy environments in real-world scenarios. In contrast, human listeners are quite robust to noisy environments. Thus, a thorough understanding of the differences between human listeners and speech models is urgently required to enhance speech model performance in noise. These differences exist presumably because the speech model does not use the same information as humans for recognizing the speech. A possible solution is encouraging the speech model to attend to the same time-frequency regions as human listeners. In this way, speech model generalization in noise may be improved. We define those time-frequency regions that humans or machines focus on to recognize the speech as importance maps (IMs). In this research, first, we investigate how to identify speech importance maps. Second, we compare human and machine importance maps to understand how they differ and how the speech model can learn from humans to improve its performance in noise. Third, we develop a structured saliency benchmark (SSBM), a metric for evaluating IMs. Finally, we propose a new application of IMs as data augmentation for speech models, enhancing their performance and enabling them to better generalize to out-of-domain noise. Overall, our work demonstrates that we can improve speech models and achieve out-of-domain generalization to different noise environments with importance maps. 
In the future, we will expand our work to large-scale speech models and deploy different methods to identify IMs, such as those based on human responses, and use them to augment the speech data. We can also extend the technique to computer vision tasks, such as image recognition, by predicting importance maps for images and using IMs to enhance model performance on out-of-domain data.

    Understanding Collaborative Sensemaking for System Design — An Investigation of Musicians' Practice

    There is surprisingly little written in information science and technology literature about the design of tools used to support the collaboration of creators. Understanding collaborative sensemaking through the use of language has been traditionally applied to non-work domains, but this method is also well-suited for informing hypotheses about the design of collaborative systems. The presence of ubiquitous, mobile technology and the development of multi-user virtual spaces invite investigation of design which is based on naturalistic, real-world, creative group behaviors, including the collaborative work of musicians. This thesis considers the co-construction of new (musical) knowledge by small groups. Co-construction of new knowledge is critical to the definition of an information system because it emphasizes coordination and resource sharing among group members (versus individual members independently doing their own tasks and only coming together to collate their contributions as a final product). This work situates the locus of creativity on the process itself, rather than on the output (the musical result) or the individuals (members of the band). This thesis describes a way to apply quantitative observations to inform qualitative assessment of the characteristics of collaborative sensemaking in groups. Conversational data were obtained from nine face-to-face collaborative composing sessions, involving three separate bands producing 18 hours of recorded interactions. Topical characteristics of the discussion, namely objects, plans, properties and performance, as well as emergent patterns of generative, evaluative, revision, and management conversational acts within the group, were seen as indicative of knowledge construction. The findings report the use of collaborative pathways: iterative cycles of generation, evaluation and revision of temporary solutions used to move the collaboration forward. 
In addition, bracketing of temporary solutions served to help collaborators reuse content and offload attentional resources. Ambiguity in language, evaluation criteria, goal formation, and group awareness meant that existing knowledge representations were insufficient in making sense of incoming data and necessitated reformulating those representations. Further, strategic use of affective language was found to be instrumental in bridging knowledge gaps. Based on these findings, features of a collaborative system are proposed to help facilitate sensemaking routines at various stages of a creative task. This research contributes to the theoretical understanding of collaborative sensemaking during non-work, creative activities in order to inform the design of systems for supporting these activities. By studying an environment which forms a potential microcosm of virtual interaction between groups, it provides a framework for understanding and automating collaborative discussion content in terms of the features of dialogue.

    An analysis of social interaction between novice older adults when learning gesture-based skills through simple digital games

    This paper reports three exploratory empirical studies with older adults who had little or no prior experience with interactive technologies. The participants were introduced to interactive technology by playing games on touchscreens, playing in pairs with the assistance of a mentor. We focus on two principal aspects: the peer-to-peer interaction during these sessions, and the role of the mentor in progressing the sessions. In the case of peer-to-peer interaction, we looked for ways in which players supported each other during interaction to assess the role of peer interaction in this context. In the case of mentoring, we examined the efficacy of a minimalist approach where verbal encouragement, suggestions or (in the last resort) intervention are used to provide support to learners. The sessions showed that learners typically could play and learn basic manipulations independently after initial help and guidance from mentors. We also found that peer interaction, both in verbal and non-verbal communication and in cooperative action, was broadly a positive influence within sessions, suggesting that there is significant value in building confidence as well as in learning.