4,517 research outputs found

    On Distant Speech Recognition for Home Automation

    No full text
    The official version of this draft is available from Springer at http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people who live alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands to improve the support and well-being of people losing their autonomy. The goal of the study is vocal order recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best-SNR channels and a priori knowledge (voice commands and distress sentences), demonstrated an increase in recognition rate without introducing false alarms.
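    The two-best-SNR channel selection mentioned above can be sketched as follows. This is a minimal illustration only, assuming a separate noise-floor recording is available per channel; the function names and the SNR estimator are hypothetical, not taken from the paper.

    ```python
    import numpy as np

    def estimate_snr_db(signal, noise_floor):
        """Rough per-channel SNR: ratio of observed signal power to an
        assumed noise-floor recording's power, in decibels."""
        signal_power = np.mean(np.square(signal))
        noise_power = np.mean(np.square(noise_floor))
        return 10.0 * np.log10(signal_power / noise_power)

    def best_two_channels(channels, noise_floors):
        """Indices of the two microphone channels with the highest SNR,
        mimicking the channel-selection step described in the abstract."""
        snrs = [estimate_snr_db(s, n) for s, n in zip(channels, noise_floors)]
        top_two = np.argsort(snrs)[-2:]
        return sorted(int(i) for i in top_two)
    ```

    In a multi-microphone flat, decoding would then be driven only by the signals from the returned channel indices.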

    Distant speech recognition for home automation: Preliminary experimental results in a smart home

    Full text link
    This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). Regarding the first task, different combinations of ASR systems, language models and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and the predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors. The techniques were assessed on real daily-living data collected in a 4-room smart home fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER, against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
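    The distance-based sentence spotting described above can be sketched with a word-level edit distance between the ASR hypothesis and each keyword pattern. This is an illustrative reconstruction under assumed details (the threshold value and pattern set here are invented); the paper's exact distance measure is not given in the abstract.

    ```python
    def word_edit_distance(a, b):
        """Word-level Levenshtein distance between two token sequences."""
        prev = list(range(len(b) + 1))
        for i, wa in enumerate(a, 1):
            curr = [i]
            for j, wb in enumerate(b, 1):
                cost = 0 if wa == wb else 1
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
            prev = curr
        return prev[-1]

    def spot_sentence(hypothesis, patterns, max_norm_dist=0.3):
        """Return the pattern closest to the ASR hypothesis if its
        length-normalised distance falls under the threshold, else None."""
        hyp = hypothesis.lower().split()
        best, best_dist = None, float("inf")
        for pattern in patterns:
            pat = pattern.lower().split()
            dist = word_edit_distance(hyp, pat) / max(len(pat), 1)
            if dist < best_dist:
                best, best_dist = pattern, dist
        return best if best_dist <= max_norm_dist else None
    ```

    Normalising by pattern length lets one threshold tolerate a roughly fixed fraction of ASR errors per command, so a noisy hypothesis can still be mapped to the intended domotic order.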

    Evaluation of a context-aware voice interface for Ambient Assisted Living: qualitative user study vs. quantitative system evaluation

    No full text
    This paper presents an experiment with seniors and people with visual impairment in a voice-controlled smart home using the SWEET-HOME system. The experiment revealed weaknesses in automatic speech recognition that must be addressed, as well as the need for better adaptation to the user and the environment. Indeed, users were disturbed by the rigid structure of the grammar and were eager to adapt it to their own preferences. Surprisingly, although no humanoid aspect was introduced into the system, the senior participants were inclined to embody it. Despite these areas for improvement, the system was assessed favourably as diminishing most participants' fears related to the loss of autonomy.

    Development of Automatic Speech Recognition Techniques for Elderly Home Support: Applications and Challenges

    Get PDF
    Voice commands may have considerable advantages in terms of usability in the AAL domain. However, efficient audio analysis in a smart home environment is a challenging task, in large part because of poor speech recognition results with elderly speakers. Dedicated speech corpora were recorded and employed to adapt generic speech recognizers to this population. Evaluation results of a first experiment allowed us to draw conclusions about distress call detection. A second experiment involved participants who played fall scenarios in a realistic smart home; 67% of the distress calls were detected online. These results show the difficulty of the task and serve as a basis to discuss the stakes and challenges of this promising technology for AAL.

    A survey on Automatic Speech Recognition systems for Portuguese language and its variations

    Get PDF
    Communication is an essential part of being human and living in society. There are many languages and variations of them, so a speaker of English in one place may be unable to communicate effectively with someone who speaks English with a different accent. Voice/speech data can be of importance in several application areas, such as health, security, biometric analysis and education. However, most studies focus on English, Arabic or Asian languages, neglecting other relevant languages such as Portuguese, which leaves many research questions wide open. It is therefore crucial to understand the field: where the main focus lies, which techniques are most used for feature extraction and classification, and so on. This paper presents a survey of automatic speech recognition components for the understudied Portuguese language and its variations. Drawing on a total of 101 papers from 2012 to 2018, it explains the trends in Portuguese-based automatic speech recognition and, as its main contribution, presents and discusses several as-yet unexplored methods in a collaborative and comprehensive way.

    Smart speakers and the news in Portuguese: consumption pattern and challenges for content producers

    Get PDF
    The voice assistants popularized by smartphones are now the driving force behind a device that has been making its way into homes in recent years: the smart speaker. These devices have been available in Brazilian Portuguese since 2018, and they are also a new platform for news distribution and consumption. How does the platform define the content delivered to the user? What challenges do content producers face? How does the user access this news? To answer these questions, we conducted a literature review and a market overview based on business reports, ran an online survey with smart speaker users, and interviewed content producers. The answers show that both the algorithms and the business model exert influence. An extra challenge for Portuguese-language content producers is the language itself: voice assistant systems still have difficulty understanding Portuguese words and expressions. This work may help content producers, especially Portuguese-speaking ones, find ways to reach their audience.

    Collective efficiency strategies: a policy instrument for the competitiveness of low-density territories

    Get PDF
    This paper motivates the focus of EU cohesion policy at large, and of territorial cooperation tools in particular, on the economic development of territories experiencing impoverishing growth associated with low population density. An innovative policy approach to help solve this problem in many Member States is put forward, based on the economic concept of “collective efficiency”. It should be understood as a proposal to improve EU cohesion policy in the next programming period. As such, the paper suggests concrete ideas to be included in the forthcoming Common Strategic Framework and the Development and Investment Partnership Contracts.

    Method for Reading Sensors and Controlling Actuators Using Audio Interfaces of Mobile Devices

    Get PDF
    This article presents a novel closed-loop control architecture based on the audio channels of several types of computing devices, such as mobile phones and tablet computers, but not restricted to them. Communication takes place over an audio interface that relies on the exchange of audio tones, allowing sensors to be read and actuators to be controlled. As an application example, the presented technique is used to build a low-cost mobile robot, but the system can also be used in a variety of mechatronics applications and sensor networks where smartphones are the basic building blocks.
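    The tone-exchange idea above can be sketched as a simple encoder/decoder pair: each command is a pure tone, and the receiver recovers the dominant frequency with an FFT. The sample rate, tone-to-command table, and tolerance below are hypothetical illustrations, not the paper's actual protocol.

    ```python
    import numpy as np

    SAMPLE_RATE = 8000  # Hz; well within the audio bandwidth of a phone jack

    # Hypothetical mapping of tone frequencies to commands.
    TONE_COMMANDS = {1000.0: "READ_SENSOR", 1500.0: "MOTOR_FORWARD", 2000.0: "MOTOR_STOP"}

    def encode_command(freq_hz, duration_s=0.1):
        """Emit a pure tone representing one command."""
        t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
        return np.sin(2.0 * np.pi * freq_hz * t)

    def decode_command(samples, tolerance_hz=50.0):
        """Find the dominant frequency via an FFT and map it back to a command."""
        spectrum = np.abs(np.fft.rfft(samples))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
        peak = freqs[int(np.argmax(spectrum))]
        nearest = min(TONE_COMMANDS, key=lambda f: abs(f - peak))
        return TONE_COMMANDS[nearest] if abs(nearest - peak) <= tolerance_hz else None
    ```

    The frequency tolerance rejects tones outside the known table, which matters on real audio channels where playback and capture clocks drift slightly.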

    Concurrent speech feedback for blind people on touchscreens

    Get PDF
    Master's thesis, Computer Engineering (Engenharia InformĂĄtica), 2023, Universidade de Lisboa, Faculdade de CiĂȘncias. Smartphone interactions are demanding. Most smartphones have few physical buttons, so users cannot rely on touch to guide them. Smartphones come with built-in accessibility mechanisms, such as screen readers, that make interaction accessible to blind users. However, some tasks remain inefficient or cumbersome: when scanning through a document, users are limited by the single sequential audio channel provided by screen readers, and ongoing tasks are interrupted by other actions. In this work, we explored alternatives to optimize smartphone interaction by blind people by leveraging simultaneous audio feedback with different configurations, such as different voices and spatialization. We researched 5 scenarios: task interruption, where concurrent speech reproduces a notification without interrupting the current task; faster information consumption, where concurrent speech announces up to 4 different contents simultaneously; text properties, where textual formatting is announced; a map scenario, where spatialization provides feedback on how close or distant the user is from a particular location; and smartphone interactions, where each gesture has a corresponding sound and, instead of reading screen elements (e.g., a button), a corresponding sound is played. We conducted a study with 10 blind participants whose smartphone experience ranged from novice to expert. During the study, we asked about participants' perceptions and preferences for each scenario, what could be improved, and in which situations these extra capabilities would be valuable to them. Our results suggest that the extra capabilities we presented are helpful for users, especially if they can be turned on and off according to the user's needs and situation. Moreover, we found that concurrent speech works best for announcing short messages while the user listens to longer content, rather than for having lengthy content announced simultaneously.
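    The map scenario's spatialized feedback can be sketched as a mapping from distance to loudness and from bearing to constant-power stereo gains. This is a speculative sketch of one plausible design; the thesis's actual spatialization method, parameter names, and ranges are assumptions here.

    ```python
    import math

    def spatialize(distance_m, angle_deg, max_distance_m=50.0):
        """Map a target's distance to loudness and its bearing
        (-90 = hard left, +90 = hard right) to constant-power stereo gains."""
        loudness = max(0.0, 1.0 - distance_m / max_distance_m)
        theta = math.radians(angle_deg + 90.0) / 2.0  # sweep 0..pi/2 across the field
        return loudness * math.cos(theta), loudness * math.sin(theta)
    ```

    A constant-power pan keeps perceived loudness stable as the cue moves across the stereo field, so only distance, not direction, changes how loud the feedback seems.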

    Deep Learning for Distant Speech Recognition

    Full text link
    Deep learning is an emerging technology considered one of the most promising directions for reaching higher levels of artificial intelligence. Among its achievements, building computers that understand speech represents a crucial step towards intelligent machines. Despite the great efforts of past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. Such disturbances severely hamper the intelligibility of the speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of the original concepts was based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201
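    The data-contamination step mentioned above is commonly implemented by convolving clean speech with a room impulse response and adding noise scaled to a target SNR. A minimal sketch of that standard recipe, not the thesis's exact pipeline:

    ```python
    import numpy as np

    def contaminate(clean, rir, noise, snr_db):
        """Simulate distant-talking speech: convolve clean speech with a room
        impulse response (RIR), then add noise scaled to hit a target SNR."""
        reverberant = np.convolve(clean, rir)[: len(clean)]
        sig_power = np.mean(np.square(reverberant))
        noise = noise[: len(reverberant)]
        noise_power = np.mean(np.square(noise))
        # Scale the noise so that 10*log10(sig_power / noise_power') == snr_db.
        scale = np.sqrt(sig_power / (noise_power * 10.0 ** (snr_db / 10.0)))
        return reverberant + scale * noise
    ```

    Sweeping `snr_db` and the RIR over many rooms and microphone positions is what turns a clean corpus into realistic simulated training data for distant-talking acoustic models.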
    • 

    corecore