On Distant Speech Recognition for Home Automation
The official version of this draft is available at Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people who live alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands to improve the support and well-being of people in loss of autonomy. The goal of the study is vocal order recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best-SNR channels and a priori knowledge (voice commands and distress sentences), demonstrated an increase in recognition rate without introducing false alarms.
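The channel-selection step mentioned in this abstract (keeping the two best-SNR channels among the ceiling microphones) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the microphone names and power estimates are invented, and the per-channel signal/noise powers are assumed to come from some voice-activity detection stage not shown here.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from power estimates."""
    return 10 * math.log10(signal_power / noise_power)

def best_channels(channels, k=2):
    """Return the k channel ids with the highest SNR.

    `channels` maps a channel id to a (signal_power, noise_power)
    pair; how those powers are estimated is outside this sketch.
    """
    ranked = sorted(channels, key=lambda c: snr_db(*channels[c]), reverse=True)
    return ranked[:k]

# Hypothetical power estimates for three ceiling microphones:
mics = {"kitchen": (4.0, 0.5), "bedroom": (2.0, 0.4), "hall": (1.0, 0.8)}
print(best_channels(mics))  # -> ['kitchen', 'bedroom']
```

The two selected streams would then be passed to the decoder; everything downstream of the selection (the DDA itself) is not shown.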
Distant speech recognition for home automation: Preliminary experimental results in a smart home
This paper presents a study that is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). For the first task, different combinations of ASR systems, language models, and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on a distance between the current ASR hypotheses and a predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors. The techniques were assessed on real daily-living data collected in a 4-room smart home fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER, against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
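The abstract does not specify the distance measure used for keyword-pattern spotting, only that hypotheses are matched against predefined patterns despite ASR errors. A minimal sketch of the idea, assuming a word-level Levenshtein distance and hypothetical voice-command patterns:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (wa != wb)))    # substitution
        prev = cur
    return prev[-1]

def spot(hypothesis, patterns, max_dist=1):
    """Return the closest command pattern, or None if every pattern is
    farther than max_dist from the (possibly erroneous) ASR output."""
    hyp = hypothesis.split()
    best = min(patterns, key=lambda p: edit_distance(hyp, p.split()))
    return best if edit_distance(hyp, best.split()) <= max_dist else None

# Hypothetical French domotic orders, in the spirit of Sweet-Home:
orders = ["allume la lumiere", "ferme les volets", "appelle les secours"]
print(spot("allume le lumiere", orders))  # tolerates one ASR word error
```

The tolerance `max_dist` trades missed commands against false alarms; the paper's actual algorithm and threshold may differ.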
Evaluation of a context-aware voice interface for Ambient Assisted Living: qualitative user study vs. quantitative system evaluation
This paper presents an experiment with seniors and people with visual impairment in a voice-controlled smart home using the SWEET-HOME system. The experiment shows some weaknesses in automatic speech recognition that must be addressed, as well as the need for better adaptation to the user and the environment. Indeed, users were disturbed by the rigid structure of the grammar and were eager to adapt it to their own preferences. Surprisingly, although no humanoid aspect was introduced in the system, the senior participants were inclined to personify it. Despite these areas for improvement, the system was favourably assessed as diminishing most participants' fears related to the loss of autonomy.
Development of Automatic Speech Recognition Techniques for Elderly Home Support: Applications and Challenges
Vocal commands may have considerable advantages in terms of usability in the AAL domain. However, efficient audio analysis in a smart home environment is a challenging task, largely because of poor speech recognition results for elderly speakers. Dedicated speech corpora were recorded and employed to adapt generic speech recognizers to this population. Evaluation results of a first experiment allowed us to draw conclusions about distress call detection. A second experiment involved participants who played fall scenarios in a realistic smart home; 67% of the distress calls were detected online. These results show the difficulty of the task and serve as a basis for discussing the stakes and challenges of this promising technology for AAL.
A survey on Automatic Speech Recognition systems for Portuguese language and its variations
Communication is an essential part of being human and living in society. There are many languages and variations of them, so one can speak English in one place and still fail to communicate effectively with someone who speaks English with a different accent. Voice/speech data can be important in several application areas, such as health, security, biometric analysis, and education. However, most studies focus on English, Arabic, or Asian languages, neglecting other relevant languages, such as Portuguese, and leaving them largely unexplored. It is therefore crucial to understand the area: where the main focus lies, which techniques are most used for feature extraction and classification, and so on. This paper presents a survey on automatic speech recognition components for the Portuguese language and its variations, as an understudied language. Drawing on a total of 101 papers from 2012 to 2018, we explain the trends in the Portuguese-based automatic speech recognition field and, as our main contribution, present and discuss several possible unexplored methods in a comprehensive way.
Smart speakers and the news in Portuguese: consumption pattern and challenges for content producers
The voice assistants popularized by smartphones are now the driving force behind a device that has been making its way into homes in recent years: smart speakers. Since 2018, these devices have been available in Brazilian Portuguese. They are also a new platform for news distribution and consumption. How does the platform define the content that will be delivered to the user? What challenges do content producers face? How does the user access this news? To answer these questions, we conducted a literature review and an assessment of the market based on business reports, ran an online survey with smart speaker users, and interviewed content producers. The answers show that both algorithms and the business model exert influence. An extra challenge for Portuguese-language content producers is the language itself: voice assistant systems still have difficulty understanding Portuguese words and expressions. This work may help content producers, especially Portuguese-speaking ones, find ways to reach their audience.
Collective efficiency strategies: a policy instrument for the competitiveness of low-density territories
This paper motivates focusing EU cohesion policy at large, and the territorial cooperation tools in particular, on the economic development of territories experiencing impoverishing growth associated with low population density. An innovative policy approach to help solve this problem in many Member States is put forward here, based on the economic concept of "collective efficiency". It should be understood as a proposal to improve EU cohesion policy in the next programming period. As such, the paper suggests concrete ideas to be included in the forthcoming Common Strategic Framework and the Development and Investment Partnership Contracts.
Method for Reading Sensors and Controlling Actuators Using Audio Interfaces of Mobile Devices
This article presents a novel closed-loop control architecture based on the audio channels of several types of computing devices, such as mobile phones and tablet computers, but not restricted to them. The communication relies on an audio interface that exchanges audio tones, allowing sensors to be read and actuators to be controlled. As an application example, the presented technique is used to build a low-cost mobile robot, but the system can also be used in a variety of mechatronics applications and sensor networks, where smartphones are the basic building blocks.
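The abstract does not detail the article's tone-coding scheme. A minimal sketch of the underlying idea, assuming one command per tone frequency and detection with the standard Goertzel algorithm; the sample rate, frequencies, and command names are illustrative assumptions:

```python
import math

RATE = 8000  # samples per second (assumed)

def tone(freq, n=400, rate=RATE):
    """Synthesize n samples of a pure sine tone at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

def goertzel_power(samples, freq, rate=RATE):
    """Energy of `samples` at `freq`, via the Goertzel recurrence."""
    k = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + k * s1 - s2, s1
    return s1 * s1 + s2 * s2 - k * s1 * s2

def decode(samples, freqs):
    """Map a received tone to the candidate frequency with most energy."""
    return max(freqs, key=lambda f: goertzel_power(samples, f))

# Hypothetical command tones: 1 kHz = 'forward', 1.5 kHz = 'stop'
commands = {1000: "forward", 1500: "stop"}
print(commands[decode(tone(1000), list(commands))])  # -> forward
```

In a real deployment the received samples would come from the device's microphone input rather than from `tone`, and the tone set would be chosen to survive the audio path (e.g., DTMF-style frequency pairs).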
Concurrent speech feedback for blind people on touchscreens
Master's thesis, Informatics Engineering, 2023, Universidade de Lisboa, Faculdade de Ciências. Smartphone interactions are demanding. Most smartphones come with few physical buttons, so users cannot rely on touch to guide them. Smartphones come with built-in accessibility mechanisms, for example screen readers, that make interaction accessible for blind users. However, some tasks are still inefficient or cumbersome: when scanning through a document, users are limited by the single sequential audio channel provided by screen readers, or when tasks are interrupted by other actions.
In this work, we explored alternatives to optimize smartphone interaction by blind people by
leveraging simultaneous audio feedback with different configurations, such as different voices and
spatialization. We researched 5 scenarios: Task interruption, where we use concurrent speech to
reproduce a notification without interrupting the current task; Faster information consumption,
where we leverage concurrent speech to announce up to 4 different contents simultaneously; Text
properties, where the textual formatting is announced; The map scenario, where spatialization
provides feedback on how close or distant a user is from a particular location; And smartphone
interactions scenario, where there is a corresponding sound for each gesture, and instead of reading
the screen elements (e.g., button), a corresponding sound is played. We conducted a study with
10 blind participants whose smartphone usage experience ranges from novice to expert. During the study, we asked about participants' perceptions and preferences for each scenario, what could be improved, and in what situations these extra capabilities are valuable to them.
Our results suggest that the extra capabilities we presented are helpful for users, especially if they can be turned on and off according to the user's needs and situation. Moreover, we find that concurrent speech works best when announcing short messages while the user listens to longer content, and less well when lengthy content is announced simultaneously.
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among the other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks. Comment: PhD Thesis, Unitn, 201
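The data-contamination methodology this thesis mentions is, in its simplest form, a convolution of clean speech with a room impulse response followed by additive noise scaled to a target SNR. A minimal sketch under that assumption, with a toy 3-tap impulse response standing in for a measured or image-method-simulated one:

```python
import math, random

def convolve(x, h):
    """Linear convolution: apply impulse response h to clean signal x."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def contaminate(clean, rir, noise, snr_db):
    """Reverberate `clean` with impulse response `rir`, then add
    `noise` scaled so the result has the requested SNR in dB."""
    rev = convolve(clean, rir)
    noise = noise[:len(rev)]
    p_sig = sum(s * s for s in rev) / len(rev)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(rev, noise)]

# Toy example: a sinusoid as "clean speech", Gaussian noise, 10 dB SNR.
random.seed(0)
clean = [math.sin(0.1 * t) for t in range(1000)]
noisy = contaminate(clean, [1.0, 0.5, 0.25],
                    [random.gauss(0, 1) for _ in range(1100)], snr_db=10)
```

Training a DNN acoustic model on such contaminated copies of a clean corpus, across many impulse responses and noise types, is the general recipe; the thesis's specific contamination pipeline is more elaborate than this sketch.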