
    Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition

    fMPE is a recently introduced discriminative training technique that uses the Minimum Phone Error (MPE) criterion to train a feature-level transformation. In this paper we investigate fMPE-trained audio/visual features for multi-stream HMM-based audio-visual speech recognition. A flexible, layer-based implementation of fMPE allows us to combine the visual information with the audio stream within the discriminative training process and dispense with the multiple-stream approach. Experiments are reported on the IBM infrared headset audio-visual database. Averaged over a 20-speaker, one-hour, speaker-independent test set, the fMPE-trained acoustic features achieve a 33% relative gain. Adding video layers on top of the audio layers gives an additional 10% gain over fMPE-trained features from the audio stream alone. The fMPE-trained visual features achieve a 14% relative gain, while decision fusion of the audio and visual streams with fMPE-trained features achieves a 29% relative gain. However, fMPE-trained models do not improve over the original models on mismatched noisy test data.
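    To make the fMPE idea concrete, the following minimal Python/NumPy sketch (illustrative names only; not the paper's layer-based audio-visual implementation) shows the basic fMPE operation of offsetting each acoustic frame by a projection of a high-dimensional Gaussian-posterior vector. In a real system the projection matrix M would be trained with the MPE criterion rather than drawn at random.

    import numpy as np

    def fmpe_transform(x, gaussians, M):
        """Apply an fMPE-style feature offset to one acoustic frame.

        x         : (d,) original feature vector (e.g. MFCCs)
        gaussians : list of (mean, var) pairs defining the posterior "layer"
        M         : (d, n_gaussians) projection matrix; trained with the MPE
                    criterion in a real system (random here, for illustration)
        """
        # Evaluate unnormalised Gaussian likelihoods of the frame.
        scores = np.array([
            np.exp(-0.5 * np.sum((x - mu) ** 2 / var))
            for mu, var in gaussians
        ])
        h = scores / scores.sum()          # high-dimensional posterior vector h_t
        return x + M @ h                   # offset features: y_t = x_t + M h_t

    # Toy usage: 3 Gaussians over 2-dimensional features.
    rng = np.random.default_rng(0)
    gaussians = [(rng.normal(size=2), np.ones(2)) for _ in range(3)]
    M = rng.normal(scale=0.1, size=(2, 3))
    print(fmpe_transform(np.array([0.5, -1.0]), gaussians, M))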

    Integration of a voice recognition system in a social robot

    Human-Robot Interaction (HRI) is one of the main fields in the study and research of robotics. Within this field, dialog systems and interaction by voice play a very important role. When speaking about natural human-robot dialog, we assume that the robot can accurately recognize the utterance the human wants to transmit verbally, and even its semantic meaning, but this is not always achieved. In this paper we describe the steps and requirements we went through in order to endow the personal social robot Maggie, developed at the University Carlos III of Madrid, with the capability of understanding natural language spoken by any human. We have analyzed the different possibilities offered by current software/hardware alternatives by testing them in real environments. We have obtained accurate data on speech recognition capabilities in different environments, using the most modern audio acquisition systems and analyzing less typical parameters such as user age, sex, intonation, volume, and language. Finally, we propose a new model to classify recognition results as accepted or rejected, based on a second ASR opinion. This new approach takes into account the pre-calculated success rate in noise intervals for each recognition framework, decreasing the false positive and false negative rates. The funds were provided by the Spanish Government through the project "Peer to Peer Robot-Human Interaction" (R2H) of MEC (Ministry of Science and Education) and the project "A new approach to social robotics" (AROS) of MICINN (Ministry of Science and Innovation). The research leading to these results has received funding from the RoboCity2030-II-CM project (S2009/DPI-1559), funded by Programas de Actividades I+D en la Comunidad de Madrid and co-funded by Structural Funds of the EU.
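    As a rough illustration of the proposed accept/reject idea, the sketch below (with hypothetical noise-band boundaries, success rates, and threshold, not the values measured for Maggie) accepts a primary recognition result when a second recognizer agrees with it, or when the pre-calculated success rate for the current noise interval is high enough.

    # Hypothetical dB bands and per-band success rates; real values would be measured.
    NOISE_BANDS = [(0, 40), (40, 55), (55, 70), (70, 120)]
    SUCCESS_RATE = {0: 0.95, 1: 0.85, 2: 0.65, 3: 0.40}

    def noise_band(noise_db):
        for i, (lo, hi) in enumerate(NOISE_BANDS):
            if lo <= noise_db < hi:
                return i
        return len(NOISE_BANDS) - 1

    def accept(primary_hyp, secondary_hyp, noise_db, threshold=0.6):
        """Accept the primary result only if the second ASR agrees with it or the
        pre-calculated success rate for the current noise level is high enough."""
        agree = primary_hyp.strip().lower() == secondary_hyp.strip().lower()
        expected_accuracy = SUCCESS_RATE[noise_band(noise_db)]
        return agree or expected_accuracy >= threshold

    print(accept("turn left", "turn left", noise_db=62))   # True: recognizers agree
    print(accept("turn left", "turn right", noise_db=75))  # False: disagree in heavy noise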

    A HoloLens Application to Aid People who are Visually Impaired in Navigation Tasks

    Day-to-day activities such as navigation and reading can be particularly challenging for people with visual impairments. Reading text on signs may be especially difficult for people who are visually impaired because signs vary in color, contrast, and size. Indoors, signage may include office, classroom, restroom, and fire evacuation signs. Outdoors, it may include street signs, bus numbers, and store signs. Depending on the level of visual impairment, just identifying where signs exist can be a challenge. Using Microsoft's HoloLens, an augmented reality device, I designed and implemented the TextSpotting application, which helps those with low vision identify and read indoor signs so that they can navigate text-heavy environments. The application can provide both visual and auditory information. In addition to developing the application, I conducted a user study to test its effectiveness. Participants were asked to find a room in an unfamiliar hallway. Those who used the TextSpotting application completed the task less quickly yet reported higher levels of ease, comfort, and confidence, indicating both the application's limitations and its potential to provide an effective means of navigating unknown environments via signage.
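    The sketch below is a rough desktop analogue of such a sign-reading pipeline, not the HoloLens/TextSpotting code itself: it grabs one camera frame, runs OCR on it, and presents any detected text both visually and aloud. It assumes OpenCV, pytesseract, and pyttsx3 are installed; all names are illustrative.

    import cv2
    import pytesseract
    import pyttsx3

    def spot_and_speak(camera_index=0):
        cap = cv2.VideoCapture(camera_index)
        tts = pyttsx3.init()
        ok, frame = cap.read()
        cap.release()
        if not ok:
            return
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)    # OCR tends to work better on grayscale
        text = pytesseract.image_to_string(gray).strip()
        if text:
            print("Detected sign text:", text)            # visual feedback
            tts.say(text)                                  # auditory feedback
            tts.runAndWait()

    if __name__ == "__main__":
        spot_and_speak()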

    Ability of head-mounted display technology to improve mobility in people with low vision: a systematic review

    Purpose: The purpose of this study was to undertake a systematic literature review on how vision enhancements, implemented using head-mounted displays (HMDs), can improve mobility, orientation, and associated aspects of visual function in people with low vision. Methods: The databases Medline, CINAHL, Scopus, and Web of Science were searched for potentially relevant studies. Publications from all years until November 2018 were identified based on predefined inclusion and exclusion criteria. The data were tabulated and synthesized to produce a systematic review. Results: The search identified 28 relevant papers describing the performance of vision enhancement techniques on mobility and associated visual tasks. Simplifying visual scenes improved obstacle detection and object recognition but decreased walking speed. Minification techniques increased the size of the visual field by 3 to 5 times and improved visual search performance; however, the impact of minification on mobility has not been studied extensively. Clinical trials with commercially available devices recorded poor results relative to conventional aids. Conclusions: The effects of current vision enhancements using HMDs are mixed: they appear to reduce mobility efficiency but improve obstacle detection and object recognition. The review highlights the lack of controlled studies with robust designs. To strengthen the evidence base, well-designed trials with larger sample sizes that represent different types of impairment and real-life scenarios are required. Future work should focus on identifying the needs of people with different types of vision impairment and providing targeted enhancements. Translational Relevance: This literature review examines the evidence regarding the ability of HMD technology to improve mobility in people with sight loss.

    Taux: a system for evaluating sound feedback in navigational tasks

    This thesis presents the design and development of an evaluation system for generating audio displays that provide feedback to persons performing navigation tasks. It first develops the need for such a system by describing existing wayfinding solutions, investigating new electronic location-based methods that have the potential to change these solutions, and examining research on relevant audio information representation techniques. An evaluation system that supports the manipulation of two basic classes of audio display is then described. Based on prior work on wayfinding with audio display, research questions are developed that investigate the viability of different audio displays. These are used to generate hypotheses and develop an experiment that evaluates four variations of audio display for wayfinding; questions are also formulated to evaluate a baseline condition that uses visual feedback. An experiment testing these hypotheses on sighted users is then described. Results from the experiment suggest that spatial audio combined with spoken hints is the best of the spatial audio approaches compared. The results also suggest that muting a varying audio signal when a subject is on course did not improve performance. The system and method are then refined, and a second experiment is conducted with improved displays and an improved experimental methodology. After adding blindfolds for sighted subjects and increasing the difficulty of the navigation tasks by reducing the arrival radius, similar comparisons were observed. Overall, the two experiments demonstrate the viability of the prototyping tool for testing and refining different audio display combinations for navigational tasks. The detailed contributions of this work and future research opportunities conclude the thesis.
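    A minimal sketch of the kind of audio display such a system manipulates is shown below (hypothetical function and parameter names, not code from the thesis): it turns the user's position and heading plus a target location into a stereo pan and gain, with an option to mute the cue while the user is on course, mirroring one of the conditions evaluated.

    import math

    def audio_cue(user_xy, user_heading_deg, target_xy, arrival_radius=2.0,
                  mute_when_on_course=False):
        """Compute a simple stereo pan and gain for a navigation beacon.

        Returns (pan, gain): pan in [-1, 1] (left..right), gain in [0, 1].
        mute_when_on_course mirrors the 'mute the varying signal when on course'
        condition; all names and thresholds here are illustrative.
        """
        dx = target_xy[0] - user_xy[0]
        dy = target_xy[1] - user_xy[1]
        dist = math.hypot(dx, dy)
        if dist <= arrival_radius:
            return 0.0, 0.0                                    # arrived: silence the beacon
        bearing = math.degrees(math.atan2(dx, dy))             # 0 deg = straight ahead (north)
        rel = (bearing - user_heading_deg + 180) % 360 - 180   # relative angle in [-180, 180)
        pan = max(-1.0, min(1.0, rel / 90.0))                  # map +/-90 deg to full pan
        on_course = abs(rel) < 10.0
        gain = 0.0 if (on_course and mute_when_on_course) else 1.0
        return pan, gain

    print(audio_cue((0, 0), 0.0, (5, 10)))                     # target ahead and to the right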

    Towards a multimodal interaction space: Categorisation and applications

    Based on the authors' extensive experience of developing interactive systems, a framework for the description and analysis of interaction has been developed. The dimensions of this multimodal interaction space have been identified as sensory modalities, modes, and levels of interaction. To illustrate and validate this framework, multimodal interaction styles are developed and interactions in the real world are studied, going from theory to practice and back again. The paper describes the framework and two recent projects, one in the field of interactive architecture and another in the field of multimodal HCI research. Both projects use multiple modalities for interaction, particularly movement-based interaction styles. © Springer-Verlag London Limited 2007
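    A minimal data-structure sketch of the three framework dimensions named above is given below; the concrete modality values and example labels are illustrative assumptions, not the authors' taxonomy.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Tuple

    class Modality(Enum):
        VISUAL = "visual"
        AUDITORY = "auditory"
        HAPTIC = "haptic"
        MOVEMENT = "movement"

    @dataclass
    class InteractionPoint:
        modalities: Tuple[Modality, ...]   # sensory modalities engaged
        mode: str                          # mode of interaction (e.g. "explicit", "implicit")
        level: str                         # level of interaction (e.g. "individual", "group")

    # Example: a movement-driven, group-level interactive installation.
    dance_floor = InteractionPoint((Modality.MOVEMENT, Modality.AUDITORY),
                                   mode="implicit", level="group")
    print(dance_floor)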

    Brain-Computer Interfaces for Non-clinical (Home, Sports, Art, Entertainment, Education, Well-being) Applications

    HCI researchers' interest in BCI is increasing because the technology industry is expanding into application areas where efficiency is not the main concern. Domestic or public-space use of information and communication technology raises awareness of the importance of affect, comfort, family, community, or playfulness, rather than efficiency. Therefore, in addition to non-clinical BCI applications that require efficiency and precision, this Research Topic also addresses the use of BCI for various types of domestic, entertainment, educational, sports, and well-being applications. These applications can relate to an individual user as well as to multiple cooperating or competing users. We also see a renewed interest of artists in making use of such devices to design interactive art installations that know about the brain activity of an individual user or the collective brain activity of a group of users, for example, an audience. Hence, this Research Topic also addresses how BCI technology influences artistic creation and practice, and the use of BCI technology to manipulate and control sound, video, and virtual and augmented reality (VR/AR).