
    Cepstral analysis based on the Glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise

    In this paper we introduce a new cepstral coefficient extraction method based on an intelligibility measure for speech in noise, the Glimpse Proportion measure. The method aims to increase the intelligibility of speech in noise by modifying the clean speech, and has applications in scenarios such as public announcement and car navigation systems. We first explain how the Glimpse Proportion measure operates and then show how we approximated it so that it could be integrated into an existing spectral envelope parameter extraction method commonly used in the HMM-based speech synthesis framework. We then demonstrate how the new method changes the modelled spectrum according to the characteristics of the noise and report results of a listening test with vocoded and HMM-based synthetic speech. The test indicates that the proposed method can significantly improve the intelligibility of synthetic speech in speech-shaped noise. Index Terms: cepstral coefficient extraction, objective measure for speech intelligibility, Lombard speech, HMM-based speech synthesis

    Multi-Cell Random Beamforming: Achievable Rate and Degrees of Freedom Region

    Random beamforming (RBF) is a practically favourable transmission scheme for multiuser multi-antenna downlink systems since it requires only partial channel state information (CSI) at the transmitter. Under the conventional single-cell setup, RBF is known to achieve the optimal sum-capacity scaling law as the number of users goes to infinity, thanks to multiuser-diversity-enabled transmission scheduling that virtually eliminates the intra-cell interference. In this paper, we extend the study of RBF to a more practical multi-cell downlink system with single-antenna receivers subject to additional inter-cell interference (ICI). First, we consider the case of finite signal-to-noise ratio (SNR) at each receiver. We derive a closed-form expression for the achievable sum-rate with multi-cell RBF, based upon which we show an asymptotic sum-rate scaling law as the number of users goes to infinity. Next, we consider the high-SNR regime and, for tractable analysis, assume that the number of users in each cell scales in a certain order with the per-cell SNR. Under this setup, we characterize the achievable degrees of freedom (DoF) for the single-cell case with RBF. We then extend the analysis to the multi-cell RBF case by characterizing the DoF region. It is shown that the DoF region characterization provides useful guidelines on how to design a cooperative multi-cell RBF system to achieve optimal throughput tradeoffs among different cells. Furthermore, our results reveal that the multi-cell RBF scheme achieves the "interference-free DoF" region upper bound for the multi-cell system, provided that the per-cell number of users has a sufficiently large scaling order with the SNR. Our result thus confirms the optimality of multi-cell RBF in this regime even without complete CSI at the transmitter, as compared to transmission schemes that require full CSI, such as interference alignment.
    Comment: 28 pages, 6 figures, to appear in IEEE Transactions on Signal Processing. This work was presented in part at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, March 25-30, 2012. The authors are with the Department of Electrical and Computer Engineering, National University of Singapore (emails: {hieudn, elezhang, elehht}@nus.edu.sg).
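The partial-CSI scheduling idea behind RBF can be illustrated with a minimal single-cell sketch. It is not the paper's multi-cell analysis: inter-cell interference is left out, the beams are passed in as any orthonormal set, and the names `beam_sinrs`, `schedule`, `sum_rate` and the per-beam SNR parameter `snr` are hypothetical. Each user reports only its per-beam SINR, and the transmitter assigns each beam to the user reporting the largest value.

```python
import math

def beam_sinrs(h, beams, snr):
    """Per-beam SINR seen by one single-antenna user with channel vector h.

    The signal comes from one beam; the other beams act as intra-cell
    interference. `snr` is the (assumed) per-beam transmit SNR.
    """
    gains = [abs(sum(hc.conjugate() * b for hc, b in zip(h, beam))) ** 2
             for beam in beams]
    sinrs = []
    for m, g in enumerate(gains):
        interference = sum(gi for i, gi in enumerate(gains) if i != m)
        sinrs.append(snr * g / (1.0 + snr * interference))
    return sinrs

def schedule(channels, beams, snr):
    """For each beam, pick the user that reports the highest SINR."""
    table = [beam_sinrs(h, beams, snr) for h in channels]
    picks = []
    for m in range(len(beams)):
        best_user = max(range(len(channels)), key=lambda k: table[k][m])
        picks.append((best_user, table[best_user][m]))
    return picks

def sum_rate(picks):
    """Achievable sum-rate in bits/s/Hz for the scheduled users."""
    return sum(math.log2(1.0 + s) for _, s in picks)
```

With many users, each beam is likely to find a user whose channel is nearly aligned with it, which is the multiuser diversity effect the abstract refers to. This sketch does not prevent one user from being picked for several beams.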

    Sparse Distributed Learning Based on Diffusion Adaptation

    This article proposes diffusion LMS strategies for distributed estimation over adaptive networks that are able to exploit sparsity in the underlying system model. The approach relies on convex regularization, common in compressive sensing, to enhance the detection of sparsity via a diffusive process over the network. The resulting algorithms endow networks with learning abilities and allow them to learn the sparse structure from the incoming data in real time, and also to track variations in the sparsity of the model. We provide convergence and mean-square performance analysis of the proposed method and show under what conditions it outperforms the unregularized diffusion version. We also show how to adaptively select the regularization parameter. Simulation results illustrate the advantage of the proposed filters for sparse data recovery.
    Comment: to appear in IEEE Trans. on Signal Processing, 201
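A regularized diffusion LMS update of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's algorithm: it uses an adapt-then-combine form, a fully connected network with uniform combination weights, an $\ell_1$ penalty applied through its subgradient (a zero-attracting term), and synthetic Gaussian data. The function name and all parameter values are hypothetical.

```python
import random

def sign(x):
    """Subgradient of |x|, taking 0 at x == 0."""
    return (x > 0) - (x < 0)

def atc_sparse_diffusion_lms(w_true, n_nodes=3, mu=0.05, rho=0.001,
                             n_iter=2000, noise_std=0.01, seed=0):
    """Adapt-then-combine diffusion LMS with an l1 zero-attracting term.

    Every node observes d = u . w_true + noise with its own regressor u,
    takes a regularized LMS step, then averages the intermediate
    estimates of all nodes (assumed fully connected, uniform weights).
    """
    rng = random.Random(seed)
    m = len(w_true)
    w = [[0.0] * m for _ in range(n_nodes)]
    for _ in range(n_iter):
        psi = []
        for k in range(n_nodes):
            # Adapt: local LMS step plus sparsity-promoting correction.
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]
            d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([wi + mu * err * ui - mu * rho * sign(wi)
                        for wi, ui in zip(w[k], u)])
        # Combine: average the intermediate estimates across the network.
        combined = [sum(psi[k][j] for k in range(n_nodes)) / n_nodes
                    for j in range(m)]
        w = [list(combined) for _ in range(n_nodes)]
    return w
```

Setting `rho=0` recovers the unregularized diffusion update in this sketch, which gives a simple baseline for comparing the two on a sparse `w_true`.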

    Speech Synthesis Based on Hidden Markov Models


    Proximal Multitask Learning over Networks with Sparsity-inducing Coregularization

    In this work, we consider multitask learning problems where clusters of nodes are interested in estimating their own parameter vector. Cooperation among clusters is beneficial when the optimal models of adjacent clusters have a good number of similar entries. We propose a fully distributed algorithm for solving this problem. The approach relies on minimizing a global mean-square error criterion regularized by non-differentiable terms to promote cooperation among neighboring clusters. A general diffusion forward-backward splitting strategy is introduced. Then, it is specialized to the case of sparsity-promoting regularizers. A closed-form expression for the proximal operator of a weighted sum of $\ell_1$-norms is derived to achieve higher efficiency. We also provide conditions on the step-sizes that ensure convergence of the algorithm in the mean and mean-square error sense. Simulations are conducted to illustrate the effectiveness of the strategy.
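For orientation, the forward-backward splitting pattern mentioned above alternates a gradient step on the smooth cost with a proximal step on the non-differentiable regularizer. The sketch below uses the standard proximal operator of a single scaled $\ell_1$-norm (entrywise soft thresholding), not the closed-form operator for a weighted sum of $\ell_1$-norms that the paper derives. The function names are hypothetical.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1, applied entrywise.

    Each entry is shrunk toward zero by lam and set to zero if its
    magnitude does not exceed lam.
    """
    out = []
    for x in v:
        mag = abs(x) - lam
        if mag <= 0.0:
            out.append(0.0)
        else:
            out.append(mag if x > 0 else -mag)
    return out

def forward_backward_step(w, grad, mu, lam):
    """One forward-backward iteration for cost f(w) + lam * ||w||_1.

    Forward: gradient step on the smooth part f with step-size mu.
    Backward: proximal step on the l1 term, scaled by the same mu.
    """
    forward = [wi - mu * gi for wi, gi in zip(w, grad)]
    return soft_threshold(forward, mu * lam)
```

For example, `soft_threshold([3.0, -0.5, -2.0], 1.0)` returns `[2.0, 0.0, -1.0]`: large entries shrink by the threshold and small ones are zeroed, which is how the proximal step promotes sparsity.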

    Noise cancellation over spatial regions using adaptive wave domain processing

    This paper proposes wave-domain adaptive processing for noise cancellation within a large spatial region. We use fundamental solutions of the Helmholtz wave equation as basis functions to express the noise field over a spatial region, and apply wave-domain processing directly to the decomposition coefficients to control the entire region. A feedback control system is implemented, in which only a single microphone array is placed at the boundary of the control region to measure the residual signals, and a loudspeaker array is used to generate the anti-noise signals. We develop the adaptive wave-domain filtered-x least mean square algorithm. Simulation results show that, using the proposed method, the noise over the entire control region can be significantly reduced with fast convergence in both free-field and reverberant environments.
    Thanks to the Australian Research Council's Discovery Projects funding scheme (project no. DP140103412). The work of J. Zhang was sponsored by the China Scholarship Council with the Australian National University.