44,967 research outputs found
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratio (SNR) below 12-dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front ends across the full range of noise
levels
An adaptive tracking observer for failure-detection systems
The design problem of adaptive observers applied to linear, constant and variable parameters, multi-input, multi-output systems, is considered. It is shown that, in order to keep the observer's (or Kalman filter) false-alarm rate (FAR) under a certain specified value, it is necessary to have an acceptable proper matching between the observer (or KF) model and the system parameters. An adaptive observer algorithm is introduced in order to maintain desired system-observer model matching, despite initial mismatching and/or system parameter variations. Only a properly designed adaptive observer is able to detect abrupt changes in the system (actuator, sensor failures, etc.) with adequate reliability and FAR. Conditions for convergence for the adaptive process were obtained, leading to a simple adaptive law (algorithm) with the possibility of an a priori choice of fixed adaptive gains. Simulation results show good tracking performance with small observer output errors and accurate and fast parameter identification, in both deterministic and stochastic cases
Sliding mode adaptive state observation for time-delay uncertain nonlinear systems
In this paper a method to design robust adaptive sliding mode observers (ASMO) for a class of nonlinear time- delay systems with uncertainties, is proposed. The objective is to achieve insensitivity and robustness of the proposed sliding mode observer to matched disturbances. A novel systematic design method is synthesized to solve matching conditions and compute observer stabilizing gains. The Lyapunov-Krasovskii theorem is employed to prove the ultimate stability with arbitrary boundedness radius of the estimation error of the proposed filter. Finally, the ability of ASMO for fault reconstruction is studied
Evaluation of bistable systems versus matched filters in detecting bipolar pulse signals
This paper presents a thorough evaluation of a bistable system versus a
matched filter in detecting bipolar pulse signals. The detectability of the
bistable system can be optimized by adding noise, i.e. the stochastic resonance
(SR) phenomenon. This SR effect is also demonstrated by approximate statistical
detection theory of the bistable system and corresponding numerical
simulations. Furthermore, the performance comparison results between the
bistable system and the matched filter show that (a) the bistable system is
more robust than the matched filter in detecting signals with disturbed pulse
rates, and (b) the bistable system approaches the performance of the matched
filter in detecting unknown arrival times of received signals, with an
especially better computational efficiency. These significant results verify
the potential applicability of the bistable system in signal detection field.Comment: 15 pages, 9 figures, MikTex v2.
A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise
The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943?954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model?s ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds
Joint Uncertainty Decoding with Unscented Transform for Noise Robust Subspace Gaussian Mixture Models
Common noise compensation techniques use vector Taylor series (VTS) to approximate the mismatch function. Recent work shows that the approximation accuracy may be improved by sampling. One such sampling technique is the unscented transform (UT), which draws samples deterministically from clean speech and noise model to derive the noise corrupted speech parameters. This paper applies UT to noise compensation of the subspace Gaussian mixture model (SGMM). Since UT requires relatively smaller number of samples for accurate estimation, it has significantly lower computational cost compared to other random sampling techniques. However, the number of surface Gaussians in an SGMM is typically very large, making the direct application of UT, for compensating individual Gaussian components, computationally impractical. In this paper, we avoid the computational burden by employing UT in the framework of joint uncertainty decoding (JUD), which groups all the Gaussian components into small number of classes, sharing the compensation parameters by class. We evaluate the JUD-UT technique for an SGMM system using the Aurora 4 corpus. Experimental results indicate that UT can lead to increased accuracy compared to VTS approximation if the JUD phase factor is untuned, and to similar accuracy if the phase factor is tuned empirically. 1
Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls from a ten-hour recording
of nocturnal bird migration, recorded by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce two noise adaptation
techniques, respectively integrating short-term (60 milliseconds) and long-term
(30 minutes) context. First, we apply per-channel energy normalization (PCEN)
in the time-frequency domain, which applies short-term automatic gain control
to every subband in the mel-frequency spectrogram. Secondly, we replace the
last dense layer in the network by a context-adaptive neural network (CA-NN)
layer. Combining them yields state-of-the-art results that are unmatched by
artificial data augmentation alone. We release a pre-trained version of our
best performing system under the name of BirdVoxDetect, a ready-to-use detector
of avian flight calls in field recordings.Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019;
revised August 2019; published October 201
A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition
This article provides a unifying Bayesian network view on various approaches
for acoustic model adaptation, missing feature, and uncertainty decoding that
are well-known in the literature of robust automatic speech recognition. The
representatives of these classes can often be deduced from a Bayesian network
that extends the conventional hidden Markov models used in speech recognition.
These extensions, in turn, can in many cases be motivated from an underlying
observation model that relates clean and distorted feature vectors. By
converting the observation models into a Bayesian network representation, we
formulate the corresponding compensation rules leading to a unified view on
known derivations as well as to new formulations for certain approaches. The
generic Bayesian perspective provided in this contribution thus highlights
structural differences and similarities between the analyzed approaches
- …