2,863 research outputs found

Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition

Author: Erdogan Hakan
Hershey John R.
Meng Zhong
Watanabe Shinji
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/11/2017

Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and performing beamforming over them. In this paper, we propose to use a recurrent neural network with long short-term memory (LSTM) architecture to adaptively estimate real-time beamforming filter coefficients to cope with non-stationary environmental noise and the dynamic nature of source and microphone positions, which results in a set of time-varying room impulse responses. The LSTM adaptive beamformer is jointly trained with a deep LSTM acoustic model to predict senone labels. Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients. The proposed system achieves 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real evaluation set. Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv.org e-Print Archive

Crossref

GNSS Signal Authentication via Power and Distortion Monitoring

Author: Evans Brian L.
Gross Jason N.
Humphreys Todd E.
Wesson Kyle D.
Publication date: 01/01/2017

We propose a simple low-cost technique that enables civil Global Positioning System (GPS) receivers and other civil global navigation satellite system (GNSS) receivers to reliably detect carry-off spoofing and jamming. The technique, which we call the Power-Distortion detector, classifies received signals as interference-free, multipath-afflicted, spoofed, or jammed according to observations of received power and correlation function distortion. It does not depend on external hardware or a network connection and can be readily implemented on many receivers via a firmware update. Crucially, the detector can with high probability distinguish low-power spoofing from ordinary multipath. In testing against over 25 high-quality empirical data sets yielding over 900,000 separate detection tests, the detector correctly alarms on all malicious spoofing or jamming attacks while maintaining a <0.5% single-channel false alarm rate. Aerospace Engineering and Engineering Mechanics

Texas ScholarWorks

Features of hearing: applications of machine learning to uncover the building blocks of hearing

Author: Weerts Lotte
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/11/2021

Recent advances in machine learning have instigated a renewed interest in using machine learning approaches to better understand human sensory processing. This line of research is particularly interesting for speech research since speech comprehension is uniquely human, which complicates obtaining detailed neural recordings. In this thesis, I explore how machine learning can be used to uncover new knowledge about the auditory system, with a focus on discovering robust auditory features. The resulting increased understanding of the noise robustness of human hearing may help to better assist those with hearing loss and improve Automatic Speech Recognition (ASR) systems. First, I show how computational neuroscience and machine learning can be combined to generate hypotheses about auditory features. I introduce a neural feature detection model with a modest number of parameters that is compatible with auditory physiology. By testing feature detector variants in a speech classification task, I confirm the importance of both well-studied and lesser-known auditory features. Second, I investigate whether ASR software is a good candidate model of the human auditory system. By comparing several state-of-the-art ASR systems to the results from humans on a range of psychometric experiments, I show that these ASR systems diverge markedly from humans in at least some psychometric tests. This implies that none of these systems act as a strong proxy for human speech recognition, although some may be useful when asking more narrowly defined questions. For neuroscientists, this thesis exemplifies how machine learning can be used to generate new hypotheses about human hearing, while also highlighting the caveats of investigating systems that may work fundamentally differently from the human brain. For machine learning engineers, I point to tangible directions for improving ASR systems. 
To motivate the continued cross-fertilization between these fields, a toolbox that allows researchers to assess new ASR systems has been released. Open Access

Spiral - Imperial College Digital Repository

Strategies for Devising Automatic Signal Recognition Algorithms in a Shared Radio Environment

Author: Wagstaff Adrian John
Publication date: 01/01/2011

In an increasingly congested and complex radio environment, interference is to be expected, which poses problems for Automatic Signal Recognition (ASR) systems. This thesis explores strategies for improving ASR performance in the presence of interference. The thesis breaks the overall research question down into a number of subquestions and explores each of these in turn. A Phase-symmetric Cross Recurrence Plot is developed and used to show how a radio signal can be manipulated to separate information about the modulation from the information being carried. The Logarithmic Cyclic frequency Domain Profile is introduced to illustrate how a logarithmic representation can be used for analysing mixtures of signals with very different cyclic frequencies. After defining a canonical ASR system architecture, the concepts of an Ideal Feature and Interference Selectivity are introduced and applied to typical features used in ASR processing. Finally, it is shown how these algorithmic developments can be combined in a Bayesian chain implementation that can accommodate a wide variety of feature extraction algorithms. It is concluded that future ASR systems will require features that can handle a wide range of signal types with much higher levels of interference selectivity if they are to achieve acceptable performance in shared spectrum bands. Intelligent segmentation is shown to be a requirement for future ASR systems unless features can be developed that have near-ideal performance.

Open Research Online (The Open University)

OpenGrey Repository

Lookup tables to compute high energy cosmic ray induced atmospheric ionization and changes in atmospheric chemistry

Author: Adrian L Melott
B.C. Thomas
B.D. Fields
Brian C Thomas
C.D. Dermer
C.H. Jackman
Dimitra Atri
L.M. Ejzak
M.V. Medvedev
N. Gehrels
T. Pierog
Publication venue: 'IOP Publishing'
Publication date: 03/05/2010

A variety of events such as gamma-ray bursts and supernovae may expose the Earth to an increased flux of high-energy cosmic rays, with potentially important effects on the biosphere. Existing atmospheric chemistry software does not have the capability of incorporating the effects of substantial cosmic ray flux above 10 GeV. An atmospheric code, the NASA-Goddard Space Flight Center two-dimensional (latitude, altitude) time-dependent atmospheric model (NGSFC), is used to study atmospheric chemistry changes. Using CORSIKA, we have created tables that can be used to compute high energy cosmic ray (10 GeV - 1 PeV) induced atmospheric ionization and also, with the use of the NGSFC code, can be used to simulate the resulting atmospheric chemistry changes. We discuss the tables, their uses, weaknesses, and strengths. Comment: In press: Journal of Cosmology and Astroparticle Physics. 6 figures, 3 tables, two associated data files. Major revisions, including results of a greatly expanded computation, clarification and updated references. In the future we will expand the table to at least EeV levels.

arXiv.org e-Print Archive

Crossref