
    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.

    Towards Natural Human Control and Navigation of Autonomous Wheelchairs

    Approximately 2.2 million people in the United States depend on a wheelchair to assist with their mobility. Oftentimes, the wheelchair user can maneuver using a conventional joystick. However, visual impairment or conditions that restrict hand mobility, such as stroke, arthritis, limb injury, Parkinson’s disease, cerebral palsy, or multiple sclerosis, prevent many patients from using traditional joystick controls. The resulting mobility limitations force these patients to rely on caretakers to perform everyday tasks, minimizing the independence of the wheelchair user. Modern speech recognition systems can be used to enhance user experiences with electronic devices. By expanding the motorized wheelchair control interface to include the detection of user speech commands, independence is given back to the mobility impaired. A speech recognition interface was developed for a smart wheelchair. By integrating navigation commands with a map of the wheelchair’s surroundings, the wheelchair interface is more natural and intuitive to use. Complex speech patterns are interpreted so that users can command the smart wheelchair to navigate to specified locations within the map. Pocketsphinx, a speech toolkit, is used to interpret the vocal commands. A language model and dictionary were generated based on a set of possible commands and locations supplied to the speech recognition interface. The commands fall under the categories of speed, directional, or destination commands. Speed commands modify the relative speed of the wheelchair. Directional commands modify the relative direction of the wheelchair. Destination commands require a known location on the map to navigate to. The completion of the speech input processor and the connection between wheelchair components via the Robot Operating System make map navigation possible.
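    The three command categories described in the abstract (speed, directional, destination) suggest a simple dispatch step after recognition. The following is an illustrative sketch only, not the paper's implementation; all command words, location names, and function names here are hypothetical:

    ```python
    # Hypothetical routing of a recognized utterance into the three command
    # categories the abstract describes: speed, directional, destination.
    SPEED_CMDS = {"faster", "slower", "stop"}
    DIRECTION_CMDS = {"forward", "backward", "left", "right"}
    KNOWN_LOCATIONS = {"kitchen", "bedroom", "office"}  # assumed map waypoints

    def classify_command(utterance: str) -> tuple[str, str]:
        """Return (category, argument) for a recognized utterance."""
        for word in utterance.lower().split():
            if word in SPEED_CMDS:
                return ("speed", word)
            if word in DIRECTION_CMDS:
                return ("direction", word)
            if word in KNOWN_LOCATIONS:
                return ("destination", word)
        return ("unknown", utterance)
    ```

    In a real system the vocabulary would come from the same word list used to build the Pocketsphinx language model and dictionary, so the recognizer can only emit words this router understands.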

    Efficient training algorithms for HMMs using incremental estimation

    Typically, parameter estimation for a hidden Markov model (HMM) is performed using an expectation-maximization (EM) algorithm with the maximum-likelihood (ML) criterion. The EM algorithm is an iterative scheme that is well-defined and numerically stable, but convergence may require a large number of iterations. For speech recognition systems utilizing large amounts of training material, this results in long training times. This paper presents an incremental estimation approach to speed up the training of HMMs without any loss of recognition performance. The algorithm selects a subset of data from the training set, updates the model parameters based on the subset, and then iterates the process until convergence of the parameters. The advantage of this approach is a substantial increase in the number of iterations of the EM algorithm per training token, which leads to faster training. In order to achieve reliable estimation from a small fraction of the complete data set at each iteration, two training criteria are studied: ML and maximum a posteriori (MAP) estimation. Experimental results show that the training of the incremental algorithms is substantially faster than the conventional (batch) method and suffers no loss of recognition performance. Furthermore, the incremental MAP-based training algorithm improves performance over the batch version.
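    The select-subset / E-step / M-step loop generalizes beyond HMMs. As a sketch of the idea (not the paper's algorithm), the same incremental ML scheme is shown here on a one-dimensional two-component Gaussian mixture, which needs far less machinery than HMM re-estimation while preserving the loop structure:

    ```python
    import numpy as np

    # Illustrative only: the paper applies incremental EM to HMMs; a 1-D
    # two-component Gaussian mixture stands in here, since it exhibits the
    # same select-subset -> E-step -> M-step iteration.
    rng = np.random.default_rng(0)

    def incremental_em(data, n_iters=100, batch_size=200):
        mu = np.array([data.min(), data.max()], dtype=float)  # crude init
        sigma = np.array([data.std(), data.std()])
        pi = np.array([0.5, 0.5])
        for _ in range(n_iters):
            # Select a subset of the training set (the "incremental" step).
            batch = rng.choice(data, size=min(batch_size, len(data)), replace=False)
            # E-step on the subset: posterior responsibilities per component.
            d = batch[:, None] - mu[None, :]
            log_p = -0.5 * (d / sigma) ** 2 - np.log(sigma) + np.log(pi)
            r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: ML re-estimation from subset statistics only.
            nk = r.sum(axis=0)
            mu = (r * batch[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((r * (batch[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
            pi = nk / nk.sum()
        return mu, sigma, pi
    ```

    Because each pass touches only `batch_size` samples, the parameters are updated many more times per unit of data seen than in batch EM, which is the source of the speed-up the abstract reports.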

    A weighted MVDR beamformer based on SVM learning for sound source localization

    A weighted minimum variance distortionless response (WMVDR) algorithm for near-field sound localization in a reverberant environment is presented. The steered response power computation of the WMVDR is based on a machine learning component which improves the incoherent frequency fusion of the narrowband power maps. A support vector machine (SVM) classifier is adopted to select the components of the fusion. The skewness measure of the narrowband power map marginal distribution is shown to be an effective feature for the supervised learning of the power map selection. Experiments with both simulated and real data demonstrate the improvement of the WMVDR beamformer localization accuracy with respect to other state-of-the-art techniques. Salvati, Daniele; Drioli, Carlo; Foresti, Gian Luca
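    The skewness feature and the selective fusion can be sketched as follows. This is a minimal illustration under assumed shapes, not the authors' code; a simple callable threshold stands in for the trained SVM classifier:

    ```python
    import numpy as np

    def skewness(power_map):
        """Sample skewness of a narrowband power map's marginal distribution."""
        x = np.asarray(power_map, dtype=float).ravel()
        m, s = x.mean(), x.std()
        return ((x - m) ** 3).mean() / (s ** 3 + 1e-12)

    def fuse_selected(power_maps, selector):
        """Incoherently fuse only the narrowband maps the selector accepts.

        power_maps: (n_freqs, n_points) steered-response power per frequency
                    bin (assumed layout).
        selector:   callable on the skewness value, standing in for the SVM
                    classifier used in the paper.
        """
        keep = [pm for pm in power_maps if selector(skewness(pm))]
        if not keep:
            return np.mean(power_maps, axis=0)  # fall back to plain fusion
        return np.mean(keep, axis=0)
    ```

    The intuition is that a narrowband map dominated by a single sharp source peak has a strongly right-skewed value distribution, while diffuse reverberant energy yields a flatter, low-skewness map that only blurs the fused result.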

    The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

    The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Different from previous challenges, we evaluate systems on 3 diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The goal is for participants to devise a single system that can generalize across different array geometries and use cases with no a priori information. Another departure from earlier CHiME iterations is that participants are allowed to use open-source pre-trained models and datasets. In this paper, we describe the challenge design, motivation, and fundamental research questions in detail. We also present the baseline system, which is fully array-topology agnostic and features multi-channel diarization, channel selection, guided source separation, and a robust ASR model that leverages self-supervised speech representations (SSLR).

    Developmental Robots - A New Paradigm

    It has proved extremely challenging for humans to program a robot to such a sufficient degree that it acts properly in a typical unknown human environment. This is especially true for a humanoid robot, due to the very large number of redundant degrees of freedom and the large number of sensors required for a humanoid to work safely and effectively in the human environment. How can we address this fundamental problem? Motivated by human mental development from infancy to adulthood, we present a theory, an architecture, and some experimental results showing how to enable a robot to develop its mind automatically, through online, real-time interactions with its environment. Humans mentally “raise” the robot through “robot sitting” and “robot schools” instead of task-specific robot programming.

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities at each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources.