855 research outputs found
Digital Signal Processing Research Program
Contains table of contents for Section 2, an introduction, reports on twenty research projects and a list of publications. Lockheed Sanders, Inc. Contract BZ4962; U.S. Army Research Laboratory Grant QK-8819; U.S. Navy - Office of Naval Research Grant N00014-93-1-0686; National Science Foundation Grant MIP 95-02885; U.S. Navy - Office of Naval Research Grant N00014-95-1-0834; U.S. Navy - Office of Naval Research Grant N00014-96-1-0930; U.S. Navy - Office of Naval Research Grant N00014-95-1-0362; National Defense Science and Engineering Fellowship; U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0072; National Science Foundation Graduate Research Fellowship Grant MIP 95-02885; Lockheed Sanders, Inc. Grant N00014-93-1-0686; National Science Foundation Graduate Fellowship; U.S. Army Research Laboratory/ARL Advanced Sensors Federated Lab Program Contract DAAL01-96-2-000
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks. Comment: PhD Thesis Unitn, 201
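The abstract above does not spell out how the simulated training data is produced. As a rough illustration only, the sketch below uses one common contamination recipe: convolve a clean signal with a room impulse response, then add noise scaled to a target signal-to-noise ratio. The function names, the toy impulse response and the 10 dB target are all assumptions made for this example, not details taken from the thesis.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (pure Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Hypothetical helper: reverberate `clean`, then add `noise` at `snr_db`.

    The SNR is measured between the reverberated speech and the scaled noise.
    """
    reverberated = convolve(clean, impulse_response)
    if len(noise) < len(reverberated):
        raise ValueError("noise must be at least as long as the reverberated signal")
    noise = noise[: len(reverberated)]
    # Choose the gain so that power(reverberated) / power(gain * noise) hits the target.
    gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberated, noise)]


# Toy inputs: a sine wave stands in for speech, a few decaying taps for the room.
random.seed(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
toy_rir = [1.0, 0.0, 0.5, 0.0, 0.25]
noise = [random.gauss(0.0, 1.0) for _ in range(300)]
noisy = contaminate(clean, toy_rir, noise, snr_db=10.0)
```

In practice the impulse response would be measured or simulated for the target room and microphone position, and the noise would be recorded in a matching environment; the toy values here only exercise the arithmetic.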
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
Speech enhancement and speech separation are two related tasks, whose purpose
is to extract either one or more target speech signals, respectively, from a
mixture of sounds generated by several sources. Traditionally, these tasks have
been tackled using signal processing and machine learning techniques applied to
the available acoustic signals. Since the visual aspect of speech is
essentially unaffected by the acoustic environment, visual information from the
target speakers, such as lip movements and facial expressions, has also been
used for speech enhancement and speech separation systems. In order to
efficiently fuse acoustic and visual information, researchers have exploited
the flexibility of data-driven approaches, specifically deep learning,
achieving strong performance. The ceaseless proposal of a large number of
techniques to extract features and fuse multimodal information has highlighted
the need for an overview that comprehensively describes and discusses
audio-visual speech enhancement and separation based on deep learning. In this
paper, we provide a systematic survey of this research topic, focusing on the
main elements that characterise the systems in the literature: acoustic
features; visual features; deep learning methods; fusion techniques; training
targets and objective functions. In addition, we review deep-learning-based
methods for speech reconstruction from silent videos and audio-visual sound
source separation for non-speech signals, since these methods can be more or
less directly applied to audio-visual speech enhancement and separation.
Finally, we survey commonly employed audio-visual speech datasets, given their
central role in the development of data-driven approaches, and evaluation
methods, because they are generally used to compare different systems and
determine their performance.
Modelling Of Sound Attenuation By Periodic, Rectangular Structures
The problem of noise reduction in outdoor environments is the subject of significant research effort because of its wide-reaching impact on the population, especially in urban areas. Existing outdoor and laboratory data have already established that periodic structures embedded in the ground may be tuned to significantly reduce transported noise at nuisance frequencies. Structures with a rectangular cross-section are of particular interest because they are relatively easy to implement. While significant data is available, modelling and simulating such structures has traditionally been time-consuming because it requires complex numerical methods. Finding simplified modelling methods is the aim of this research.
Modelling of these structures will be considered in detail, culminating in the presentation of a novel analytic model with application in practical acoustic engineering and environmental planning. Existing general methods are explored before moving on to simpler techniques that become available under the simplifying assumption that all cavities within the periodic structure are rectangular. By considering the structure as an effective impedance, a novel analytic model is presented to conclude the thesis.
Monitoring system for long-distance pipelines subject to destructive attack
In an era of terrorism, it is important to protect critical pipeline infrastructure, especially in countries where life is strongly dependent on water and the economy on oil and gas. Structural health monitoring (SHM) using acoustic waves is one of the common solutions. However, considerable prior work has shown that pipes are cylindrical acoustic waveguides that support many dispersive, lossy modes; only the torsional T(0,1) mode has zero dispersion. Although suitable transducers have been developed, these typically excite several modes, and even if they do not, bends and supports induce mode conversion. Moreover, the high-power transducers that could in principle be used to overcome noise and attenuation in long-distance pipes present an obvious safety hazard with volatile products. These factors make it difficult to distinguish signals and extract pipeline status information. The problem worsens as the pipe diameter increases or as the frequency rises (due to the increasing number of modes), if the pipe is buried (due to rising attenuation), or if the pipe carries a flowing product (because of additional acoustic noise). Any system is therefore likely to be short-range.
This research proposes the use of a distributed active sensor network to monitor long-range pipelines by verifying continuity and sensing small disturbances. A 4-element cuboid Electromagnetic Acoustic Transducer (EMAT) is used to excite the longitudinal L(0,1) mode. Although the EMAT also excites other, slower modes, long-distance propagation allows their effects to be separated. Correlation detection is exploited to enhance the signal-to-noise ratio (SNR), and code division multiple access (CDMA) is used to distinguish between nodes in a multi-node system. An extensive numerical search for multiphase quasi-orthogonal codes for different user numbers is conducted. The results suggest that side lobes degrade performance even with the highest possible discrimination factor. Golay complementary pairs (which can eliminate the side lobes completely, albeit at the price of a considerable reduction in speed) are therefore investigated as an alternative.
Pipeline systems are first reviewed. Acoustic wave propagation is described using standard theory and a freeware modeling package. EMAT modeling is carried out by numerical calculation of electromagnetic fields. Signal propagation is investigated theoretically using a full system simulator that allows frequency-domain description of transducers, dispersion, multi-mode propagation, mode conversion and multiple reflections. Known codes for multiplexing are constructed using standard algorithms, and novel codes are discovered by an efficient directed search. Propagation of these codes in a dispersive system is simulated. Experiments are carried out using small, unburied, air-filled copper pipes in a frequency range where the number of modes is small and the attenuation and noise are low. Excellent agreement is obtained between theory and experiment. The propagation of pulses and multiplexed codes over distances up to 200 m is successfully demonstrated, and status changes introduced by removable reflectors are detected. Open Access
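The side-lobe cancellation attributed to Golay complementary pairs in the abstract above can be checked numerically. The sketch below is only an illustration: it builds a binary pair with the standard concatenation recursion (the thesis itself searches multiphase codes, and its construction is not given here), and the names golay_pair and autocorr are hypothetical. The two aperiodic autocorrelations are computed separately and then summed.

```python
def golay_pair(n_doublings):
    """Binary Golay complementary pair of length 2**n_doublings.

    Uses the standard recursion (a, b) -> (a + b, a + (-b)),
    where + is concatenation and -b negates every element of b.
    """
    a, b = [1], [1]
    for _ in range(n_doublings):
        a, b = a + b, a + [-x for x in b]
    return a, b


def autocorr(seq):
    """Aperiodic autocorrelation of seq for lags 0 .. len(seq) - 1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]


a, b = golay_pair(3)  # two codes of length 8
ra, rb = autocorr(a), autocorr(b)
summed = [x + y for x, y in zip(ra, rb)]

print(ra)      # each code alone has non-zero side lobes
print(rb)
print(summed)  # [16, 0, 0, 0, 0, 0, 0, 0]
```

Each code on its own has side lobes, but they are equal and opposite at every non-zero lag, so the sum is a single peak of height 2N. Because both codes must be sent and correlated separately before summing, a measurement takes roughly twice as long, which is consistent with the speed penalty the abstract mentions.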
An Online Solution for Localisation, Tracking and Separation of Moving Speech Sources
The problem of separating a time-varying number of speech sources in a room is difficult to solve. The challenge lies in estimating the number and the location of these speech sources. Furthermore, the tracked speech sources need to be separated. This thesis proposes a solution which utilises the Random Finite Set approach to estimate the number and location of these speech sources and subsequently separate the speech source mixture via time-frequency masking.
Implementation and evaluation of a low complexity microphone array for speaker recognition
Includes bibliographical references (leaves 83-86). This thesis discusses the application of a microphone array employing a noise-canceling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field.