Search CORE

1,468 research outputs found

Proceedings of the 1st joint workshop on Smart Connected and Wearable Things 2016

Author: Kuflik Tsvi
Limonad Lior
Mühlhäuser Max
Schnelle-Walka Dirk
Publication venue
Publication date: 01/05/2016
Field of study

These are the Proceedings of the 1st joint workshop on Smart Connected and Wearable Things (SCWT'2016, Co-located with IUI 2016). The SCWT workshop integrates the SmartObjects and IoWT workshops. It focusses on the advanced interactions with smart objects in the context of the Internet-of-Things (IoT), and on the increasing popularity of wearables as advanced means to facilitate such interactions

TUbiblio

tuprints

The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes

Author: Anguera
Baby
Bagchi
Barker
Barker
Castro Martinez
DiBiase
Du
Emmanuel Vincent
Fletcher
Frigge
Fujita
Garofalo
Hermansky
Heymann
Hirsch
Hori
Jalalvand
Jon Barker
Kim
Loesch
Ma
Mestre
Mikolov
Moritz
Mostefa
Parihar
Pfeifenberger
Povey
Prudnikov
Renals
Ricard Marxer
Shinji Watanabe
Sivasankaran
Taal
Tachioka
Taghia
Veselý
Vincent
Vincent
Vu
Yoshioka
Zhao
Zhuang
Publication venue: 'Elsevier BV'
Publication date: 15/10/2016
Field of study

This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations

Crossref

INRIA a CCSD electronic archive server

White Rose Research Online

HAL-Rennes 1

Designing Sound for Social Robots: Advancing Professional Practice through Design Principles

Author: Robinson Frederic
Publication venue: UNSW, Sydney
Publication date: 01/01/2023
Field of study

Sound is one of the core modalities social robots can use to communicate with the humans around them in rich, engaging, and effective ways. While a robot's auditory communication happens predominantly through speech, a growing body of work demonstrates the various ways non-verbal robot sound can affect humans, and researchers have begun to formulate design recommendations that encourage using the medium to its full potential. However, formal strategies for successful robot sound design have so far not emerged, current frameworks and principles are largely untested and no effort has been made to survey creative robot sound design practice. In this dissertation, I combine creative practice, expert interviews, and human-robot interaction studies to advance our understanding of how designers can best ideate, create, and implement robot sound. In a first step, I map out a design space that combines established sound design frameworks with insights from interviews with robot sound design experts. I then systematically traverse this space across three robot sound design explorations, investigating (i) the effect of artificial movement sound on how robots are perceived, (ii) the benefits of applying compositional theory to robot sound design, and (iii) the role and potential of spatially distributed robot sound. Finally, I implement the designs from prior chapters into humanoid robot Diamandini, and deploy it as a case study. Based on a synthesis of the data collection and design practice conducted across the thesis, I argue that the creation of robot sound is best guided by four design perspectives: fiction (sound as a means to convey a narrative), composition (sound as its own separate listening experience), plasticity (sound as something that can vary and adapt over time), and space (spatial distribution of sound as a separate communication channel). The conclusion of the thesis presents these four perspectives and proposes eleven design principles across them which are supported by detailed examples. This work contributes an extensive body of design principles, process models, and techniques providing researchers and designers with new tools to enrich the way robots communicate with humans

UNSWorks

A survey on hardware and software solutions for multimodal wearable assistive devices targeting the visually impaired

Author: Caspo Adam
Jeon Myounghoon
Wersényi György
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2016
Field of study

The market penetration of user-centric assistive devices has rapidly increased in the past decades. Growth in computational power, accessibility, and cognitive device capabilities have been accompanied by significant reductions in weight, size, and price, as a result of which mobile and wearable equipment are becoming part of our everyday life. In this context, a key focus of development has been on rehabilitation engineering and on developing assistive technologies targeting people with various disabilities, including hearing loss, visual impairments and others. Applications range from simple health monitoring such as sport activity trackers, through medical applications including sensory (e.g. hearing) aids and real-time monitoring of life functions, to task-oriented tools such as navigational devices for the blind. This paper provides an overview of recent trends in software and hardware-based signal processing relevant to the development of wearable assistive solutions

Michigan Technological University

Computer-aided investigation of interaction mediated by an AR-enabled wearable interface

Author: Dierker Angelika
Publication venue: Universitätsbibliothek Bielefeld
Publication date: 01/01/2012
Field of study

Dierker A. Computer-aided investigation of interaction mediated by an AR-enabled wearable interface. Bielefeld: Universitätsbibliothek Bielefeld; 2012.This thesis provides an approach on facilitating the analysis of nonverbal behaviour during human-human interaction. Thereby, much of the work that researchers do starting with experiment control, data acquisition, tagging and finally the analysis of the data is alleviated. For this, software and hardware techniques are used as sensor technology, machine learning, object tracking, data processing, visualisation and Augmented Reality. These are combined into an Augmented-Reality-enabled Interception Interface (ARbInI), a modular wearable interface for two users. The interface mediates the users’ interaction thereby intercepting and influencing it. The ARbInI interface consists of two identical setups of sensors and displays, which are mutually coupled. Combining cameras and microphones with sensors, the system offers to record rich multimodal interaction cues in an efficient way. The recorded data can be analysed online and offline for interaction features (e. g. head gestures in head movements, objects in joint attention, speech times) using integrated machine-learning approaches. The classified features can be tagged in the data. For a detailed analysis, the recorded multimodal data is transferred automatically into file bundles loadable in a standard annotation tool where the data can be further tagged by hand. For statistic analyses of the complete multimodal corpus, a toolbox for use in a standard statistics program allows to directly import the corpus and to automate the analysis of multimodal and complex relationships between arbitrary data types. When using the optional multimodal Augmented Reality techniques integrated into ARbInI, the camera records exactly what the participant can see and nothing more or less. The following additional advantages can be used during the experiment: (a) the experiment can be controlled by using the auditory or visual displays thereby ensuring controlled experimental conditions, (b) the experiment can be disturbed, thus offering to investigate how problems in interaction are discovered and solved, and (c) the experiment can be enhanced by interactively comprising the behaviour of the user thereby offering to investigate how users cope with novel interaction channels. This thesis introduces criteria for the design of scenarios in which interaction analysis can benefit from the experimentation interface and presents a set of scenarios. These scenarios are applied in several empirical studies thereby collecting multimodal corpora that particularly include head gestures. The capabilities of computer-aided interaction analysis for the investigation of speech, visual attention and head movements are illustrated on this empirical data. The effects of the head-mounted display (HMD) are evaluated thoroughly in two studies. The results show that the HMD users need more head movements to achieve the same shift of gaze direction and perform less head gestures with slower velocity and fewer repetitions compared to non-HMD users. From this, a reduced willingness to perform head movements if not necessary can be concluded. Moreover, compensation strategies are established like leaning backwards to enlarge the field of view, and increasing the number of utterances or changing the reference to objects to compensate for the absence of mutual eye contact. Two studies investigate the interaction while actively inducing misunderstandings. The participants here use compensation strategies like multiple verification questions and arbitrary gaze movements. Additionally, an enhancement method that highlights the visual attention of the interaction partner is evaluated in a search task. The results show a significantly shorter reaction time and fewer errors

Publications at Bielefeld University

Paralinguistic vocal control of interactive media: how untapped elements of voice might enhance the role of non-speech voice input in the user's experience of multimedia.

Author: Al Hashimi S.
Al Hashimi S.
Publication venue
Publication date: 01/01/2007
Field of study

Much interactive media development, especially commercial development, implies the dominance of the visual modality, with sound as a limited supporting channel. The development of multimedia technologies such as augmented reality and virtual reality has further revealed a distinct partiality to visual media. Sound, however, and particularly voice, have many aspects which have yet to be adequately investigated. Exploration of these aspects may show that sound can, in some respects, be superior to graphics in creating immersive and expressive interactive experiences. With this in mind, this thesis investigates the use of non-speech voice characteristics as a complementary input mechanism in controlling multimedia applications. It presents a number of projects that employ the paralinguistic elements of voice as input to interactive media including both screen-based and physical systems. These projects are used as a means of exploring the factors that seem likely to affect users’ preferences and interaction patterns during non-speech voice control. This exploration forms the basis for an examination of potential roles for paralinguistic voice input. The research includes the conceptual and practical development of the projects and a set of evaluative studies. The work submitted for Ph.D. comprises practical projects (50 percent) and a written dissertation (50 percent). The thesis aims to advance understanding of how voice can be used both on its own and in combination with other input mechanisms in controlling multimedia applications. It offers a step forward in the attempts to integrate the paralinguistic components of voice as a complementary input mode to speech input applications in order to create a synergistic combination that might let the strengths of each mode overcome the weaknesses of the other

Middlesex University Research Repository

Features for voice activity detection: a comparative analysis

Author: Gerhard Schmidt
Markus Buck
Simon Graf
Tobias Herbig
Publication venue: Springer Nature
Publication date: 01/01/2015
Field of study

Springer - Publisher Connector

Models and Analysis of Vocal Emissions for Biomedical Applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022
Field of study

The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy

Directory of Open Access Books (DOAB)