On Distant Speech Recognition for Home Automation
The official version of this draft is available at Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people who live alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands to improve the support and well-being of people in loss of autonomy. The goal of the study is vocal order recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best-SNR channels and a priori knowledge (voice commands and distress sentences), demonstrated an increase in recognition rate without introducing false alarms.
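The channel-selection step mentioned in this abstract (keeping the two best-SNR channels among the ceiling microphones) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the microphone names and power estimates are invented, and the per-channel signal/noise powers are assumed to come from some voice-activity detection stage not shown here.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from power estimates."""
    return 10 * math.log10(signal_power / noise_power)

def best_channels(channels, k=2):
    """Return the k channel ids with the highest SNR.

    `channels` maps a channel id to a (signal_power, noise_power)
    pair; how those powers are estimated is outside this sketch.
    """
    ranked = sorted(channels, key=lambda c: snr_db(*channels[c]), reverse=True)
    return ranked[:k]

# Hypothetical power estimates for three ceiling microphones:
mics = {"kitchen": (4.0, 0.5), "bedroom": (2.0, 0.4), "hall": (1.0, 0.8)}
print(best_channels(mics))  # -> ['kitchen', 'bedroom']
```

The two selected streams would then be passed to the decoder; everything downstream of the selection (the DDA itself) is not shown.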
Distant speech recognition for home automation: Preliminary experimental results in a smart home
This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). For the first task, different combinations of ASR systems, language models, and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on a distance between the current ASR hypotheses and a predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors. The techniques were assessed on real daily-living data collected in a 4-room smart home fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER, against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
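The abstract does not specify the distance measure used for keyword-pattern spotting, only that hypotheses are matched against predefined patterns despite ASR errors. A minimal sketch of the idea, assuming a word-level Levenshtein distance and hypothetical voice-command patterns:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (wa != wb)))    # substitution
        prev = cur
    return prev[-1]

def spot(hypothesis, patterns, max_dist=1):
    """Return the closest command pattern, or None if every pattern is
    farther than max_dist from the (possibly erroneous) ASR output."""
    hyp = hypothesis.split()
    best = min(patterns, key=lambda p: edit_distance(hyp, p.split()))
    return best if edit_distance(hyp, best.split()) <= max_dist else None

# Hypothetical French domotic orders, in the spirit of Sweet-Home:
orders = ["allume la lumiere", "ferme les volets", "appelle les secours"]
print(spot("allume le lumiere", orders))  # tolerates one ASR word error
```

The tolerance `max_dist` trades missed commands against false alarms; the paper's actual algorithm and threshold may differ.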
Evaluation of a context-aware voice interface for Ambient Assisted Living: qualitative user study vs. quantitative system evaluation
This paper presents an experiment with seniors and people with visual impairment in a voice-controlled smart home using the SWEET-HOME system. The experiment shows some weaknesses in automatic speech recognition that must be addressed, as well as the need for better adaptation to the user and the environment. Indeed, users were disturbed by the rigid structure of the grammar and were eager to adapt it to their own preferences. Surprisingly, although no humanoid aspect was introduced in the system, the senior participants were inclined to personify it. Despite these areas for improvement, the system was favourably assessed as diminishing most participants' fears related to the loss of autonomy.
Development of Automatic Speech Recognition Techniques for Elderly Home Support: Applications and Challenges
Vocal commands may have considerable advantages in terms of usability in the AAL domain. However, efficient audio analysis in a smart home environment is a challenging task, largely because of poor speech recognition results for elderly speakers. Dedicated speech corpora were recorded and employed to adapt generic speech recognizers to this population. Evaluation results of a first experiment allowed us to draw conclusions about distress call detection. A second experiment involved participants who played fall scenarios in a realistic smart home; 67% of the distress calls were detected online. These results show the difficulty of the task and serve as a basis for discussing the stakes and challenges of this promising technology for AAL.
A survey on Automatic Speech Recognition systems for Portuguese language and its variations
Communication is an essential part of being human and living in society. There are many languages and variations of them, so one can speak English in one place and still fail to communicate effectively with someone who speaks English with a different accent. Voice/speech data can be important in several application areas, such as health, security, biometric analysis, and education. However, most studies focus on English, Arabic, or Asian languages, neglecting other relevant languages, such as Portuguese, and leaving them largely unexplored. It is therefore crucial to understand the area: where the main focus lies, which techniques are most used for feature extraction and classification, and so on. This paper presents a survey on automatic speech recognition components for the Portuguese language and its variations, as an understudied language. Drawing on a total of 101 papers from 2012 to 2018, we explain the trends in the Portuguese-based automatic speech recognition field and, as our main contribution, present and discuss several possible unexplored methods in a comprehensive way.
Smart speakers and the news in Portuguese: consumption pattern and challenges for content producers
The voice assistants popularized by smartphones are now the driving force behind a device that has been making its way into homes in recent years: smart speakers. Since 2018, these devices have been available in Brazilian Portuguese. They are also a new platform for news distribution and consumption. How does the platform define the content that will be delivered to the user? What challenges do content producers face? How does the user access this news? To answer these questions, we conducted a literature review and an assessment of the market based on business reports, ran an online survey with smart speaker users, and interviewed content producers. The answers show that both algorithms and the business model exert influence. An extra challenge for Portuguese-language content producers is the language itself: voice assistant systems still have difficulty understanding Portuguese words and expressions. This work may help content producers, especially Portuguese-speaking ones, find ways to reach their audience.
Collective efficiency strategies: a policy instrument for the competitiveness of low-density territories
This paper motivates focusing EU cohesion policy at large, and the territorial cooperation tools in particular, on the economic development of territories experiencing impoverishing growth associated with low population density. An innovative policy approach to help solve this problem in many Member States is put forward here, based on the economic concept of "collective efficiency". It should be understood as a proposal to improve EU cohesion policy in the next programming period. As such, the paper suggests concrete ideas to be included in the forthcoming Common Strategic Framework and the Development and Investment Partnership Contracts.
Method for Reading Sensors and Controlling Actuators Using Audio Interfaces of Mobile Devices
This article presents a novel closed-loop control architecture based on the audio channels of several types of computing devices, such as mobile phones and tablet computers, but not restricted to them. The communication relies on an audio interface that exchanges audio tones, allowing sensors to be read and actuators to be controlled. As an application example, the presented technique is used to build a low-cost mobile robot, but the system can also be used in a variety of mechatronics applications and sensor networks, where smartphones are the basic building blocks.
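The abstract does not detail the article's tone-coding scheme. A minimal sketch of the underlying idea, assuming one command per tone frequency and detection with the standard Goertzel algorithm; the sample rate, frequencies, and command names are illustrative assumptions:

```python
import math

RATE = 8000  # samples per second (assumed)

def tone(freq, n=400, rate=RATE):
    """Synthesize n samples of a pure sine tone at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

def goertzel_power(samples, freq, rate=RATE):
    """Energy of `samples` at `freq`, via the Goertzel recurrence."""
    k = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + k * s1 - s2, s1
    return s1 * s1 + s2 * s2 - k * s1 * s2

def decode(samples, freqs):
    """Map a received tone to the candidate frequency with most energy."""
    return max(freqs, key=lambda f: goertzel_power(samples, f))

# Hypothetical command tones: 1 kHz = 'forward', 1.5 kHz = 'stop'
commands = {1000: "forward", 1500: "stop"}
print(commands[decode(tone(1000), list(commands))])  # -> forward
```

In a real deployment the received samples would come from the device's microphone input rather than from `tone`, and the tone set would be chosen to survive the audio path (e.g., DTMF-style frequency pairs).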
Concurrent speech feedback for blind people on touchscreens
Master's thesis, Informatics Engineering, 2023, Universidade de Lisboa, Faculdade de Ciências. Smartphone interactions are demanding. Most smartphones come with few physical buttons, so users cannot rely on touch to guide them. Smartphones come with built-in accessibility mechanisms, for example screen readers, that make interaction accessible for blind users. However, some tasks are still inefficient or cumbersome: when scanning through a document, users are limited by the single sequential audio channel provided by screen readers, or when tasks are interrupted by other actions.
In this work, we explored alternatives to optimize smartphone interaction by blind people by
leveraging simultaneous audio feedback with different configurations, such as different voices and
spatialization. We researched 5 scenarios: Task interruption, where we use concurrent speech to
reproduce a notification without interrupting the current task; Faster information consumption,
where we leverage concurrent speech to announce up to 4 different contents simultaneously; Text
properties, where the textual formatting is announced; The map scenario, where spatialization
provides feedback on how close or distant a user is from a particular location; And smartphone
interactions scenario, where there is a corresponding sound for each gesture, and instead of reading
the screen elements (e.g., button), a corresponding sound is played. We conducted a study with
10 blind participants whose smartphone usage experience ranges from novice to expert. During the study, we asked about participants' perceptions and preferences for each scenario, what could be improved, and in what situations these extra capabilities are valuable to them.
Our results suggest that the extra capabilities we presented are helpful for users, especially if they can be turned on and off according to the user's needs and situation. Moreover, we find that concurrent speech works best when announcing short messages while the user listens to longer content, and less well when lengthy content is announced simultaneously.
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among the other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks. Comment: PhD Thesis, Unitn, 201
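The data-contamination methodology this thesis mentions is, in its simplest form, a convolution of clean speech with a room impulse response followed by additive noise scaled to a target SNR. A minimal sketch under that assumption, with a toy 3-tap impulse response standing in for a measured or image-method-simulated one:

```python
import math, random

def convolve(x, h):
    """Linear convolution: apply impulse response h to clean signal x."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def contaminate(clean, rir, noise, snr_db):
    """Reverberate `clean` with impulse response `rir`, then add
    `noise` scaled so the result has the requested SNR in dB."""
    rev = convolve(clean, rir)
    noise = noise[:len(rev)]
    p_sig = sum(s * s for s in rev) / len(rev)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(rev, noise)]

# Toy example: a sinusoid as "clean speech", Gaussian noise, 10 dB SNR.
random.seed(0)
clean = [math.sin(0.1 * t) for t in range(1000)]
noisy = contaminate(clean, [1.0, 0.5, 0.25],
                    [random.gauss(0, 1) for _ in range(1100)], snr_db=10)
```

Training a DNN acoustic model on such contaminated copies of a clean corpus, across many impulse responses and noise types, is the general recipe; the thesis's specific contamination pipeline is more elaborate than this sketch.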