
    Digital Signal Processing Research Program

    Contains a table of contents for Section 2, an introduction, reports on twenty research projects, and a list of publications.
    Supported by:
    Lockheed Sanders, Inc. Contract BZ4962
    U.S. Army Research Laboratory Grant QK-8819
    U.S. Navy - Office of Naval Research Grant N00014-93-1-0686
    National Science Foundation Grant MIP 95-02885
    U.S. Navy - Office of Naval Research Grant N00014-95-1-0834
    U.S. Navy - Office of Naval Research Grant N00014-96-1-0930
    U.S. Navy - Office of Naval Research Grant N00014-95-1-0362
    National Defense Science and Engineering Fellowship
    U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0072
    National Science Foundation Graduate Research Fellowship Grant MIP 95-02885
    Lockheed Sanders, Inc. Grant N00014-93-1-0686
    National Science Foundation Graduate Fellowship
    U.S. Army Research Laboratory/ARL Advanced Sensors Federated Lab Program Contract DAAL01-96-2-000

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among its achievements, building computers that understand speech represents a crucial step towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of the speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses that scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of these concepts is based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks.
    Comment: PhD Thesis Unitn, 201
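The data-contamination step this thesis emphasises can be illustrated with a minimal sketch (the function name, the power-based SNR scaling, and the inputs are assumptions for illustration, not the thesis's exact recipe): close-talk speech is convolved with a room impulse response and noise is added at a target SNR.

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Simulate distant-talking speech: reverberate close-talk audio
    with a room impulse response (RIR), then add noise at a target SNR."""
    rev = np.convolve(clean, rir)[: len(clean)]   # reverberant speech
    noise = noise[: len(rev)]
    p_sig = np.mean(rev ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(p_sig / p_noise_scaled) == snr_db
    gain = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return rev + gain * noise
```

Pairs of (contaminated input, clean target) produced this way would then feed acoustic-model training.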

    An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

    Speech enhancement and speech separation are two related tasks, whose purpose is to extract one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been used in speech enhancement and speech separation systems. In order to fuse acoustic and visual information efficiently, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving strong performance. The steady stream of newly proposed techniques for extracting features and fusing multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning. In this paper, we provide a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets and objective functions. In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these methods can be more or less directly applied to audio-visual speech enhancement and separation. Finally, we survey commonly employed audio-visual speech datasets, given their central role in the development of data-driven approaches, and evaluation methods, because they are generally used to compare different systems and determine their performance.
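As one concrete example of the training targets surveyed above, the ideal ratio mask is a widely used target in mask-based enhancement; the exact formulation varies across papers, so the compression exponent and epsilon below are illustrative assumptions.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Per time-frequency bin: the fraction of energy attributable
    to the target speech, compressed by exponent beta."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta

# At inference, the predicted mask multiplies the mixture magnitude
# spectrogram: enhanced_mag = mask * mixture_mag
```

A DNN is trained to predict this mask from mixture features; bins dominated by speech get values near 1, noise-dominated bins near 0.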

    Monitoring system for long-distance pipelines subject to destructive attack

    In an era of terrorism, it is important to protect critical pipeline infrastructure, especially in countries where life depends strongly on water and the economy on oil and gas. Structural health monitoring (SHM) using acoustic waves is one common solution. However, considerable prior work has shown that pipes are cylindrical acoustic waveguides that support many dispersive, lossy modes; only the torsional T(0,1) mode has zero dispersion. Although suitable transducers have been developed, these typically excite several modes, and even when they do not, bends and supports induce mode conversion, making it difficult to distinguish signals and extract pipeline status information. Moreover, the high-power transducers that could in principle be used to overcome noise and attenuation in long-distance pipes present an obvious safety hazard with volatile products. The problem worsens as the pipe diameter increases or as the frequency rises (due to the increasing number of modes), if the pipe is buried (due to rising attenuation), or if the pipe carries a flowing product (because of additional acoustic noise). Any such system is therefore likely to be short-range. This research proposes the use of a distributed active sensor network to monitor long-range pipelines by verifying continuity and sensing small disturbances. A 4-element cuboid Electromagnetic Acoustic Transducer (EMAT) is used to excite the longitudinal L(0,1) mode. Although the EMAT also excites other, slower modes, long-distance propagation allows their effects to be separated. Correlation detection is exploited to enhance the signal-to-noise ratio (SNR), and code division multiple access (CDMA) is used to distinguish between nodes in a multi-node system. An extensive numerical search for multiphase quasi-orthogonal codes for different user numbers is conducted. The results suggest that side lobes degrade performance even with the highest possible discrimination factor. Golay complementary pairs (which can eliminate the side lobes completely, albeit at the price of a considerable reduction in speed) are therefore investigated as an alternative. Pipeline systems are first reviewed. Acoustic wave propagation is described using standard theory and a freeware modelling package. EMAT modelling is carried out by numerical calculation of electromagnetic fields. Signal propagation is investigated theoretically using a full system simulator that allows frequency-domain description of transducers, dispersion, multi-mode propagation, mode conversion and multiple reflections. Known codes for multiplexing are constructed using standard algorithms, and novel codes are discovered by an efficient directed search. Propagation of these codes in a dispersive system is simulated. Experiments are carried out using small, unburied, air-filled copper pipes in a frequency range where the number of modes is small and the attenuation and noise are low. Excellent agreement is obtained between theory and experiment. The propagation of pulses and multiplexed codes over distances up to 200 m is successfully demonstrated, and status changes introduced by removable reflectors are detected.
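The sidelobe-cancelling property of Golay complementary pairs mentioned above can be verified in a few lines (the recursive append/negate construction is the standard one; names are illustrative): the aperiodic autocorrelations of the two sequences sum to a delta, so a correlation detector using both sees no side lobes at all.

```python
import numpy as np

def golay_pair(n):
    """Recursively build a Golay complementary pair of length 2**n:
    (a, b) -> (a|b, a|-b), where | denotes concatenation."""
    a, b = np.array([1.0]), np.array([1.0])
    for _ in range(n):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

a, b = golay_pair(6)                       # length-64 pair
s = np.correlate(a, a, "full") + np.correlate(b, b, "full")
# s is 2N at zero lag and exactly zero at every other lag.
```

In the pipeline system, the two codes would be transmitted on separate excitations and their matched-filter outputs summed, at the cost of halving the effective update rate.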

    An Online Solution for Localisation, Tracking and Separation of Moving Speech Sources

    The problem of separating a time-varying number of speech sources in a room is difficult to solve. The challenge lies in estimating the number and locations of these speech sources; the tracked sources must then be separated. This thesis proposes a solution that utilises the Random Finite Set approach to estimate the number and locations of the speech sources and subsequently separates the speech mixture via time-frequency masking.
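The time-frequency masking step can be illustrated with ideal binary masks, which assign each time-frequency bin to the locally dominant source (a minimal sketch under the usual sparseness assumption; the thesis's actual masks are driven by the tracked source locations, not by oracle magnitudes as here).

```python
import numpy as np

def binary_masks(source_mags):
    """Given magnitude spectrograms of the individual sources,
    return one binary mask per source that selects the bins where
    that source dominates."""
    stack = np.stack(source_mags)          # (n_sources, frames, bins)
    dominant = np.argmax(stack, axis=0)    # index of loudest source per bin
    return [(dominant == k).astype(float) for k in range(len(source_mags))]
```

Multiplying each mask with the mixture spectrogram and inverting the transform yields the separated signals.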

    Implementation and evaluation of a low complexity microphone array for speaker recognition

    Includes bibliographical references (leaves 83-86).
    This thesis discusses the application of a microphone array employing a noise-canceling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field.
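The abstract does not give the specific noise-canceling design, but the low-complexity baseline against which such beamformers are usually measured, delay-and-sum, can be sketched as follows (integer sample delays and the function name are assumptions): aligned speech adds coherently while diffuse noise averages down.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer: advance each microphone channel by
    its sample delay toward the look direction, then average."""
    n = min(len(c) - d for c, d in zip(channels, delays))
    aligned = [c[d:d + n] for c, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

For M microphones in a diffuse field, this improves SNR by roughly 10*log10(M) dB, which is why even small, cheap arrays help downstream speaker recognition.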