
    DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

    Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, provided that an accurate estimate of the correlation matrices and especially the speech interframe correlation (IFC) vector is available. Typical estimation procedures for the correlation matrices and the speech IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate a speech mask and a noise mask for each time-frequency bin, from which two different SPP estimates are derived. Aiming at achieving a robust performance, the DNN is trained for various noise types and signal-to-noise ratios. Experimental results show that the multi-frame MVDR in combination with the proposed data-driven SPP estimator yields an increased speech quality compared to a state-of-the-art model-based estimator.

    Model-Based Voice Activity Detection in Wireless Acoustic Sensor Networks

    NASA Tech Briefs Index, 1977, volume 2, numbers 1-4

    Announcements of new technology derived from the research and development activities of NASA are presented. Abstracts and indexes (by subject, personal author, originating center, and Tech Brief number) are presented for 1977.

    Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

    This paper deals with speech enhancement in dual-microphone smartphones using beamforming along with postfiltering techniques. The performance of these algorithms relies on a good estimation of the acoustic channel and speech and noise statistics. In this work we present a speech enhancement system that combines the estimation of the relative transfer function (RTF) between microphones using an extended Kalman filter framework with a novel speech presence probability estimator intended to track the noise statistics’ variability. The available dual-channel information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction is further improved by means of postfiltering techniques that take advantage of the speech presence estimation. Our proposal is evaluated in different reverberant and noisy environments when the smartphone is used in both close-talk and far-talk positions. The experimental results show that our system achieves improvements in terms of noise reduction, low speech distortion and better speech intelligibility compared to other state-of-the-art approaches. Spanish MINECO/FEDER Project TEC2016-80141-P. Spanish Ministry of Education through the National Program FPU under Grant FPU15/0416

    Multimodal person recognition for human-vehicle interaction

    Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, as borne out in two case studies.

    The CAST experiment at CERN

    Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering

    This paper addresses the problem of multichannel online dereverberation. The proposed method is carried out in the short-time Fourier transform (STFT) domain, and for each frequency band independently. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF). The multichannel CTFs are adaptively identified based on the cross-relation method, and using the recursive least square criterion. Instead of the complex-valued CTF convolution model, we use a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude, which is just a coarse approximation of the former model, but is shown to be more robust against the CTF perturbations. Based on this nonnegative model, we propose an online STFT magnitude inverse filtering method. The inverse filters of the CTF magnitude are formulated based on the multiple-input/output inverse theorem (MINT), and adaptively estimated based on the gradient descent criterion. Finally, the inverse filtering is applied to the STFT magnitude of the microphone signals, obtaining an estimate of the STFT magnitude of the source signal. Experiments regarding both speech enhancement and automatic speech recognition are conducted, which demonstrate that the proposed method can effectively suppress reverberation, even for the difficult case of a moving speaker. Comment: Paper submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. IEEE Signal Processing Letters, 201

    RV Sonne Cruise 200, 11 Jan-11 Mar 2009. Jakarta - Jakarta

    All plate boundaries are divided into segments - pieces of fault that are distinct from one another, either separated by gaps or with different orientations. The maximum size of an earthquake on a fault system is controlled by the degree to which the propagating rupture can cross the boundaries between such segments. A large earthquake may rupture a whole segment of plate boundary, but a great earthquake usually ruptures more than one segment at once. The December 26th 2004 MW 9.3 earthquake and the March 28th 2005 MW 8.7 earthquake ruptured, respectively, 1200–1300 km and 300–400 km of the subduction boundary between the Indian-Australian plate and the Burman and Sumatra blocks. Rupture in the 2004 event started at the southern end of the fault segment, and propagated northwards. The observation that the slip did not propagate significantly southwards in December 2004, even though the magnitude of slip was high at the southern end of the rupture strongly suggests a barrier at that place. Maximum slip in the March 2005 earthquake occurred within ~100 km of the barrier between the 2004 and 2005 ruptures, confirming both the physical importance of the barrier, and the loading of the March 2005 rupture zone by the December 2004 earthquake. The Sumatran Segmentation Project, funded by the Natural Environment Research Council (NERC), aims to characterise the boundaries between these great earthquakes (in terms of both subduction zone structure at scales of 10^1-10^4 m and rock physical properties), record seismic activity, improve and link earthquake slip distribution to the structure of the subduction zone and to determine the sedimentological record of great earthquakes (both recent and historic) along this part of the margin. The Project is focussed on the areas around two earthquake segment boundaries: Segment Boundary 1 (SB1) between the 2004 and 2005 ruptures at Simeulue Island, and SB2 between the 2005 and smaller 1935 ruptures between Nias and the Batu Islands. Cruise SO200 is the third of three cruises which will provide a combined geophysical and geological dataset in the source regions of the 2004 and 2005 subduction zone earthquakes. SO200 was divided into two Legs. Leg 1 (SO200-1), Jakarta to Jakarta between January 22nd and February 22nd, was composed of three main operations: longterm deployment OBS retrieval, TOBI sidescan sonar survey and coring. Leg 2 (SO200-2), Jakarta to Jakarta between February 23rd and March 11th, was composed of two main operations: Multichannel seismic reflection (MCS) profiles and heatflow probe transects.