Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition
In this paper we address the problem of automatic speech recognition when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: acoustic environment, speech coding and transmission errors. Whilst the first has already received a lot of attention, the last two deserve further investigation in our opinion. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters, or a modification of RASTA-PLP using a sharper low-pass section, perform consistently better than LP-MFCC and RASTA-PLP, respectively.
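The abstract above rests on the idea of filtering each feature coefficient's time trajectory to suppress slow channel drift and fast transmission artefacts. A minimal numpy sketch of trajectory band-pass filtering follows; the moving-average design and window lengths are illustrative assumptions, not the paper's actual filter (the paper operates on LP-MFCC/RASTA-PLP features with specific filter designs).

```python
import numpy as np

def bandpass_trajectory(x, short_win=3, long_win=21):
    # Crude band-pass: a short moving average smooths out fast
    # fluctuations (low-pass), and subtracting a long moving average
    # removes slow drift and the DC offset (high-pass).
    short = np.convolve(x, np.ones(short_win) / short_win, mode="same")
    slow = np.convolve(x, np.ones(long_win) / long_win, mode="same")
    return short - slow

def filter_features(features, short_win=3, long_win=21):
    # features: (num_frames, num_coeffs); filter each coefficient's
    # time trajectory independently, as the abstract describes.
    return np.apply_along_axis(
        bandpass_trajectory, 0, features, short_win, long_win)
```

A constant (pure-DC) trajectory maps to zero away from the frame edges, which is the property that makes such filtering robust to slowly varying channel distortion.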
Optimal set of EEG features for emotional state classification and trajectory visualization in Parkinson's disease
In addition to classic motor signs and symptoms, individuals with Parkinson's disease (PD) are characterized by emotional deficits. Ongoing brain activity can be recorded by electroencephalograph (EEG) to discover the links between emotional states and brain activity. This study utilized machine-learning algorithms to categorize emotional states in PD patients compared with healthy controls (HC) using EEG. Twenty non-demented PD patients and 20 healthy age-, gender-, and education-level-matched controls viewed happiness, sadness, fear, anger, surprise, and disgust emotional stimuli while fourteen-channel EEG was being recorded. Multimodal stimuli (combined audio and visual) were used to evoke the emotions. To classify the EEG-based emotional states and visualize the changes of emotional states over time, this paper compares four kinds of EEG features for emotional state classification and proposes an approach to track the trajectory of emotion changes with manifold learning. From the experimental results using our EEG data set, we found that (a) the bispectrum feature is superior to the other three kinds of features, namely power spectrum, wavelet packet and nonlinear dynamical analysis; (b) higher frequency bands (alpha, beta and gamma) play a more important role in emotion activities than lower frequency bands (delta and theta) in both groups; and (c) the trajectory of emotion changes can be visualized by reducing subject-independent features with manifold learning. This provides a promising way of implementing visualization of a patient's emotional state in real time and leads to a practical system for noninvasive assessment of the emotional impairments associated with neurological disorders.
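The band-wise findings in (b) rely on splitting EEG power into the classic delta/theta/alpha/beta/gamma bands. As a minimal sketch (not the study's pipeline, which compares bispectrum, wavelet-packet and nonlinear features), mean band power per channel can be computed from the FFT; the sampling rate and band edges here are common conventions, assumed for illustration.

```python
import numpy as np

# Classic EEG frequency bands in Hz (conventional edges; assumptions).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(eeg, fs=128.0):
    # eeg: (num_channels, num_samples) array of raw EEG.
    # Returns (num_channels, num_bands) mean spectral power per band.
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=1)) ** 2 / eeg.shape[1]
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=1))
    return np.stack(feats, axis=1)
```

For a 10 Hz oscillation the alpha column dominates, which is the kind of per-band feature a classifier would then consume.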
New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition
This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of the temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front end, we also modify the distortionless constraint of the MVDR spectral estimation method via a revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), adapting it to the newly proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights, and attenuates the noisy and undesired components more effectively. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. When evaluated on the Aurora 2 recognition task, our proposed method outperformed the Mel-frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
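The MVDR (Capon) spectrum that this front end builds on is defined as S(ω) = 1 / (e(ω)ᴴ R⁻¹ e(ω)), where R is the autocorrelation matrix and e(ω) the steering vector. A minimal numpy sketch of the standard estimator follows; model order, FFT size and the diagonal-loading term are illustrative assumptions, and the paper's perceptual warping and SNR-based subband weighting are not shown.

```python
import numpy as np

def mvdr_spectrum(x, order=12, nfft=256):
    # Capon / minimum variance distortionless response spectral estimate.
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Biased autocorrelation estimate up to 'order' lags.
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order + 1),
                                    np.arange(order + 1)))
    R = r[lags]                                   # Toeplitz autocorr matrix
    R_inv = np.linalg.inv(R + 1e-6 * r[0] * np.eye(order + 1))
    omegas = 2 * np.pi * np.arange(nfft // 2 + 1) / nfft
    spec = np.empty(omegas.size)
    for i, w in enumerate(omegas):
        e = np.exp(1j * w * np.arange(order + 1))  # steering vector e(w)
        # Distortionless constraint: unit gain at w, minimum output power.
        spec[i] = 1.0 / np.real(np.conj(e) @ R_inv @ e)
    return spec
```

For a sinusoid at normalized frequency 0.1 the estimate peaks near frequency bin 0.1·nfft, illustrating the sharp, low-variance peaks that motivate MVDR-based features.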
Wavelet-based techniques for speech recognition
In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short-time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for time-frequency transformation of these signals. In addition, it has compactly supported basis functions, thereby reducing the amount of computation compared with the STFT, where an overlapping window is needed.
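A common way to turn a wavelet decomposition into ASR features is to take the log-energy of each subband. The sketch below uses the Haar wavelet because its compactly supported filters are just sums and differences of adjacent samples; the choice of wavelet, frame length and number of levels are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

def haar_dwt(x):
    # Single-level Haar wavelet transform: orthonormal split of the
    # signal into an approximation (low-pass) and detail (high-pass).
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def wavelet_subband_energies(frame, levels=3):
    # Recursively decompose one speech frame and return the log-energy
    # of each detail band plus the final approximation: a compact
    # frame-level feature vector (levels + 1 values).
    feats = []
    a = frame
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats.append(np.log(np.sum(d ** 2) + 1e-10))
    feats.append(np.log(np.sum(a ** 2) + 1e-10))
    return np.array(feats)
```

Because the Haar transform is orthonormal, energy is conserved exactly across each split, so the subband energies form a proper partition of the frame's total energy.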
Convolutional Radio Modulation Recognition Networks
We study the adaptation of convolutional neural networks to the complex temporal radio signal domain. We compare the efficacy of radio modulation classification using naively learned features against expert features that are widely used in the field today, and we show significant performance improvements. We show that blind temporal learning on large and densely encoded time series using deep convolutional neural networks is viable and a strong candidate approach for this task, especially at low signal-to-noise ratios.
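Networks in this domain typically consume raw complex baseband samples packed as a 2×N real array (in-phase and quadrature channels). As a hedged illustration of that input representation only (the modulation, pulse shaping and parameters here are assumptions, not the paper's dataset), the sketch below synthesizes one noisy QPSK burst in CNN-ready form.

```python
import numpy as np

def qpsk_iq_frame(num_symbols=64, sps=2, snr_db=10.0, rng=None):
    # Generate a noisy QPSK baseband burst and pack it as the 2xN
    # real-valued (I/Q) array a convolutional network would consume.
    rng = np.random.default_rng() if rng is None else rng
    symbols = rng.integers(0, 4, num_symbols)          # 2 bits per symbol
    constellation = np.exp(1j * (np.pi / 4 + np.pi / 2 * symbols))
    x = np.repeat(constellation, sps)                  # rectangular pulses
    noise_power = 10 ** (-snr_db / 10)                 # unit signal power
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size))
    y = x + noise
    return np.stack([y.real, y.imag])                  # shape (2, N)
```

Stacking real and imaginary parts as two channels is what lets an ordinary real-valued convolutional network learn directly from the complex time series.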
Distributed Speech Recognition
This article discusses possibilities for integrating speech technology into wireless technology, allowing voice input for wireless devices. The distributed speech recognition (DSR) concept and activities related to its standardization are presented. The first ETSI DSR standard, based on MFCC features, is described. Work on its extension to improve robustness, which resulted in a new standard, is also presented.
Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR
In this paper, we present new dynamic features derived from the modulation spectrum of the cepstral trajectories of the speech signal. Cepstral trajectories are projected onto a basis of sines and cosines, yielding the cepstral modulation frequency response of the speech signal. We show that the different sine and cosine basis vectors select different modulation frequencies, whereas the frequency responses of the delta and double-delta filters are centered only around 15 Hz. Therefore, projecting cepstral trajectories onto the basis of sines and cosines yields a more complementary and discriminative range of features. In this work, the cepstrum reconstructed from the lower cepstral modulation frequency components is used as the static feature. Experiments show that, as well as providing an improvement in clean conditions, these new dynamic features yield a significant increase in speech recognition performance in various noise conditions when compared directly to the standard temporal derivative features and C-JRASTA PLP features.
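Projecting a cepstral trajectory onto sines and cosines is exactly what a real FFT along the time axis computes, and reconstructing the static feature from the low modulation frequencies amounts to zeroing the higher bins before the inverse transform. A minimal numpy sketch, with the number of retained components as an illustrative assumption rather than the paper's setting:

```python
import numpy as np

def cepstral_modulation_spectrum(cep, n_mod=16):
    # cep: (num_frames, num_ceps) cepstral trajectories.
    # Project each trajectory onto sines/cosines (real FFT along time)
    # and keep the first n_mod modulation-frequency components.
    return np.fft.rfft(cep, axis=0)[:n_mod]

def lowpass_static_cepstrum(cep, keep=3):
    # Reconstruct a smoothed "static" cepstrum from the lowest
    # modulation-frequency components only.
    C = np.fft.rfft(cep, axis=0)
    C[keep:] = 0
    return np.fft.irfft(C, n=cep.shape[0], axis=0)
```

A trajectory composed of a slow and a fast oscillation reduces to just its slow component after the low-pass reconstruction, which is the smoothed static feature the abstract describes.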