
    Digital Signal Processing Research Program

    Contains a table of contents for Section 2, an introduction, reports on twenty research projects, and a list of publications.
    Supported by:
    Lockheed Sanders, Inc. Contract BZ4962
    U.S. Army Research Laboratory Grant QK-8819
    U.S. Navy - Office of Naval Research Grant N00014-93-1-0686
    National Science Foundation Grant MIP 95-02885
    U.S. Navy - Office of Naval Research Grant N00014-95-1-0834
    U.S. Navy - Office of Naval Research Grant N00014-96-1-0930
    U.S. Navy - Office of Naval Research Grant N00014-95-1-0362
    National Defense Science and Engineering Fellowship
    U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0072
    National Science Foundation Graduate Research Fellowship Grant MIP 95-02885
    Lockheed Sanders, Inc. Grant N00014-93-1-0686
    National Science Foundation Graduate Fellowship
    U.S. Army Research Laboratory/ARL Advanced Sensors Federated Lab Program Contract DAAL01-96-2-000

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among its achievements, building computers that understand speech represents a crucial step towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of the speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses that scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of these concepts is based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks.
    Comment: PhD Thesis Unitn, 201
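The data-contamination step this thesis emphasises can be illustrated with a minimal sketch (the function name, the power-based SNR scaling, and the inputs are assumptions for illustration, not the thesis's exact recipe): close-talk speech is convolved with a room impulse response and noise is added at a target SNR.

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate distant-talking speech: reverberate close-talk audio
    with a room impulse response (RIR), then add noise at a target SNR."""
    rev = np.convolve(clean, rir)[: len(clean)]   # reverberant speech
    noise = noise[: len(rev)]
    p_sig = np.mean(rev ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(p_sig / p_noise_scaled) == snr_db
    gain = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return rev + gain * noise
```

Pairs of (contaminated input, clean target) produced this way would then feed acoustic-model training.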

    An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

    Speech enhancement and speech separation are two related tasks, whose purpose is to extract one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been used in speech enhancement and speech separation systems. In order to fuse acoustic and visual information efficiently, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving strong performance. The steady stream of newly proposed techniques for extracting features and fusing multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning. In this paper, we provide a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets and objective functions. In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these methods can be more or less directly applied to audio-visual speech enhancement and separation. Finally, we survey commonly employed audio-visual speech datasets, given their central role in the development of data-driven approaches, and evaluation methods, because they are generally used to compare different systems and determine their performance.
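As one concrete example of the training targets surveyed above, the ideal ratio mask is a widely used target in mask-based enhancement; the exact formulation varies across papers, so the compression exponent and epsilon below are illustrative assumptions.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Per time-frequency bin: the fraction of energy attributable
    to the target speech, compressed by exponent beta."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta

# At inference, the predicted mask multiplies the mixture magnitude
# spectrogram: enhanced_mag = mask * mixture_mag
```

A DNN is trained to predict this mask from mixture features; bins dominated by speech get values near 1, noise-dominated bins near 0.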

    Monitoring system for long-distance pipelines subject to destructive attack

    In an era of terrorism, it is important to protect critical pipeline infrastructure, especially in countries where life depends strongly on water and the economy on oil and gas. Structural health monitoring (SHM) using acoustic waves is one common solution. However, considerable prior work has shown that pipes are cylindrical acoustic waveguides that support many dispersive, lossy modes; only the torsional T(0,1) mode has zero dispersion. Although suitable transducers have been developed, these typically excite several modes, and even when they do not, bends and supports induce mode conversion, making it difficult to distinguish signals and extract pipeline status information. Moreover, the high-power transducers that could in principle be used to overcome noise and attenuation in long-distance pipes present an obvious safety hazard with volatile products. The problem worsens as the pipe diameter increases or as the frequency rises (due to the increasing number of modes), if the pipe is buried (due to rising attenuation), or if the pipe carries a flowing product (because of additional acoustic noise). Any such system is therefore likely to be short-range. This research proposes the use of a distributed active sensor network to monitor long-range pipelines by verifying continuity and sensing small disturbances. A 4-element cuboid Electromagnetic Acoustic Transducer (EMAT) is used to excite the longitudinal L(0,1) mode. Although the EMAT also excites other, slower modes, long-distance propagation allows their effects to be separated. Correlation detection is exploited to enhance the signal-to-noise ratio (SNR), and code division multiple access (CDMA) is used to distinguish between nodes in a multi-node system. An extensive numerical search for multiphase quasi-orthogonal codes for different user numbers is conducted. The results suggest that side lobes degrade performance even with the highest possible discrimination factor. Golay complementary pairs (which can eliminate the side lobes completely, albeit at the price of a considerable reduction in speed) are therefore investigated as an alternative. Pipeline systems are first reviewed. Acoustic wave propagation is described using standard theory and a freeware modelling package. EMAT modelling is carried out by numerical calculation of electromagnetic fields. Signal propagation is investigated theoretically using a full system simulator that allows frequency-domain description of transducers, dispersion, multi-mode propagation, mode conversion and multiple reflections. Known codes for multiplexing are constructed using standard algorithms, and novel codes are discovered by an efficient directed search. Propagation of these codes in a dispersive system is simulated. Experiments are carried out using small, unburied, air-filled copper pipes in a frequency range where the number of modes is small and the attenuation and noise are low. Excellent agreement is obtained between theory and experiment. The propagation of pulses and multiplexed codes over distances up to 200 m is successfully demonstrated, and status changes introduced by removable reflectors are detected.
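The sidelobe-cancelling property of Golay complementary pairs mentioned above can be verified in a few lines (the recursive append/negate construction is the standard one; names are illustrative): the aperiodic autocorrelations of the two sequences sum to a delta, so a correlation detector using both sees no side lobes at all.

```python
import numpy as np

def golay_pair(n):
    """Recursively build a Golay complementary pair of length 2**n:
    (a, b) -> (a|b, a|-b), where | denotes concatenation."""
    a, b = np.array([1.0]), np.array([1.0])
    for _ in range(n):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

a, b = golay_pair(6)                       # length-64 pair
s = np.correlate(a, a, "full") + np.correlate(b, b, "full")
# s is 2N at zero lag and exactly zero at every other lag.
```

In the pipeline system, the two codes would be transmitted on separate excitations and their matched-filter outputs summed, at the cost of halving the effective update rate.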

    An Online Solution for Localisation, Tracking and Separation of Moving Speech Sources

    The problem of separating a time-varying number of speech sources in a room is difficult to solve. The challenge lies in estimating the number and locations of these speech sources; the tracked sources must then be separated. This thesis proposes a solution that utilises the Random Finite Set approach to estimate the number and locations of the speech sources and subsequently separates the speech mixture via time-frequency masking.
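The time-frequency masking step can be illustrated with ideal binary masks, which assign each time-frequency bin to the locally dominant source (a minimal sketch under the usual sparseness assumption; the thesis's actual masks are driven by the tracked source locations, not by oracle magnitudes as here).

```python
import numpy as np

def binary_masks(source_mags):
    """Given magnitude spectrograms of the individual sources,
    return one binary mask per source that selects the bins where
    that source dominates."""
    stack = np.stack(source_mags)          # (n_sources, frames, bins)
    dominant = np.argmax(stack, axis=0)    # index of loudest source per bin
    return [(dominant == k).astype(float) for k in range(len(source_mags))]
```

Multiplying each mask with the mixture spectrogram and inverting the transform yields the separated signals.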

    Implementation and evaluation of a low complexity microphone array for speaker recognition

    Includes bibliographical references (leaves 83-86).
    This thesis discusses the application of a microphone array employing a noise-canceling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field.
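The abstract does not give the specific noise-canceling design, but the low-complexity baseline against which such beamformers are usually measured, delay-and-sum, can be sketched as follows (integer sample delays and the function name are assumptions): aligned speech adds coherently while diffuse noise averages down.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer: advance each microphone channel by
    its sample delay toward the look direction, then average."""
    n = min(len(c) - d for c, d in zip(channels, delays))
    aligned = [c[d:d + n] for c, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

For M microphones in a diffuse field, this improves SNR by roughly 10*log10(M) dB, which is why even small, cheap arrays help downstream speaker recognition.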