Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, the word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
to short lengths of the processed block. Significant improvements
in terms of the criteria and WER are observed even for a block length of 250
ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion.
Dynamic Experiment Design Regularization Approach to Adaptive Imaging with Array Radar/SAR Sensor Systems
We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in that solution space. Next, the “model-free” variational analysis (VA)-based image enhancement approach and the “model-based” descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of DYED-related algorithms is constructed, and their effectiveness is finally illustrated via numerical simulations.
Application of sound source separation methods to advanced spatial audio systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats,
such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to
the sparsity of the sources under some signal transformation.
This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast, unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the
features considered by each of them are related to different localization cues that enable the separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
Polarimetric Synthetic Aperture Radar
This open access book focuses on the practical application of electromagnetic polarimetry principles in Earth remote sensing with an educational purpose. In the last decade, the operation of fully polarimetric synthetic aperture radars such as the Japanese ALOS/PalSAR, the Canadian Radarsat-2 and the German TerraSAR-X, together with easy data access for scientific use, has further advanced research and data applications at L, C and X band. As a consequence, the wider distribution of polarimetric data sets across the remote sensing community has boosted activity and development in polarimetric SAR applications, also in view of future missions. Numerous experiments with real data from spaceborne platforms are shown, with the aim of giving an up-to-date and complete treatment of the unique benefits of fully polarimetric synthetic aperture radar data in five different domains: forest, agriculture, cryosphere, urban and oceans.
Binaural Source Separation with Convolutional Neural Networks
This work is a study on source separation techniques for binaural music mixtures. The chosen framework uses a Convolutional Neural Network (CNN) to estimate time-frequency soft masks. These masks are used to extract the different sources from the original two-channel mixture signal. Its baseline single-channel architecture achieved state-of-the-art results on monaural music mixtures under low-latency conditions. It has been extended to perform separation in two-channel signals, being the first two-channel CNN joint estimation architecture. This means that filters are learned for each source by taking into account the information from both channels. Furthermore, a specific binaural condition is included during the training stage. It uses Interaural Level Difference (ILD) information to improve the spatial images of the extracted sources. Concurrently, we present a novel tool to create binaural scenes for testing purposes. Multiple binaural scenes are rendered from a music dataset of four instruments (voice, drums, bass and others). The CNN framework has been tested on these binaural scenes and compared with monaural and stereo results. The system showed a great amount of adaptability and good separation results in all the scenarios. These results are used to evaluate the impact of spatial information on separation performance.
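The soft-masking step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ratio-mask formula, the function names (`soft_masks`, `apply_masks`, `ild_db`) and the random arrays standing in for CNN output are all assumptions made for illustration, and the abstract does not specify how the ILD information enters training.

```python
import numpy as np

def soft_masks(source_mags, eps=1e-12):
    """Ratio-style soft masks (an assumed form, not taken from the abstract).

    source_mags: (n_sources, n_freq, n_frames) non-negative magnitude
    estimates, here a stand-in for what a network would predict.
    """
    total = source_mags.sum(axis=0, keepdims=True) + eps
    return source_mags / total

def apply_masks(masks, mixture_stft):
    """Apply each source mask to both channels of a two-channel mixture.

    masks: (n_sources, n_freq, n_frames)
    mixture_stft: (2, n_freq, n_frames), complex
    returns: (n_sources, 2, n_freq, n_frames), complex
    """
    return masks[:, None, :, :] * mixture_stft[None, :, :, :]

def ild_db(stereo_stft, eps=1e-12):
    """Interaural level difference per time-frequency bin, in dB (left over right)."""
    left = np.abs(stereo_stft[0])
    right = np.abs(stereo_stft[1])
    return 20.0 * np.log10((left + eps) / (right + eps))

# Hypothetical data: 4 sources, 5 frequency bins, 6 frames.
rng = np.random.default_rng(0)
mags = rng.random((4, 5, 6)) + 0.1
mixture = rng.standard_normal((2, 5, 6)) + 1j * rng.standard_normal((2, 5, 6))

masks = soft_masks(mags)
sources = apply_masks(masks, mixture)
```

Because the masks sum to one in every time-frequency bin, the separated two-channel sources add back up to the mixture, and each source keeps the level difference between channels that the mixture has in that bin.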