74 research outputs found

    Post-nonlinear speech mixture identification using single-source temporal zones & curve clustering

    In this paper, we propose a method for estimating the nonlinearities that arise in post-nonlinear source separation. In contrast to state-of-the-art methods, our approach relies on a weak joint-sparsity assumption on the sources: we look for tiny temporal zones where only one source is active. This method is well suited to non-stationary signals such as speech. The main novelty of our work is the use of nonlinear single-source confidence measures and curve clustering. Such an approach may be seen as an extension of linear instantaneous sparse component analysis to post-nonlinear mixtures. The performance of the approach is illustrated with tests showing that the nonlinear functions are estimated accurately, with mean square errors around 4e-5 when the sources are "strongly" mixed.
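To illustrate the idea of a single-source temporal zone, here is a minimal sketch for the simpler linear instantaneous case that the abstract names as the starting point. It is not the paper's nonlinear confidence measure. It assumes two mixtures and uses the absolute correlation coefficient over short windows as a hypothetical confidence score: when only one source is active, the two observations are proportional and the score approaches 1. The function name, window length, and mixing coefficients are illustrative choices.

```python
import numpy as np

def single_source_confidence(x1, x2, win):
    """Absolute correlation coefficient of two mixtures over consecutive windows.

    Values close to 1 suggest that a single source is active in the window
    (linear instantaneous mixing assumed; illustrative measure only).
    """
    n_win = min(len(x1), len(x2)) // win
    conf = np.zeros(n_win)
    for k in range(n_win):
        a = x1[k * win:(k + 1) * win]
        b = x2[k * win:(k + 1) * win]
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
        conf[k] = abs(np.sum(a * b)) / denom if denom > 0 else 0.0
    return conf

# Synthetic example: two sources with partly disjoint activity.
rng = np.random.default_rng(0)
win = 200
s1 = rng.standard_normal(3 * win)
s2 = rng.standard_normal(3 * win)
s2[:win] = 0.0           # window 0: only source 1 is active
s1[win:2 * win] = 0.0    # window 1: only source 2 is active
                         # window 2: both sources are active

# Hypothetical linear instantaneous mixing.
x1 = 1.0 * s1 + 0.6 * s2
x2 = 0.5 * s1 + 1.0 * s2

conf = single_source_confidence(x1, x2, win)
single_source_windows = np.flatnonzero(conf > 0.99)
```

In the post-nonlinear setting described in the abstract, the observations in a single-source zone lie on a curve rather than a straight line, which is why a linear correlation score would no longer suffice and a nonlinear measure is needed.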

    Nonlinear blind mixture identification using local source sparsity and functional data clustering

    In this paper we propose several methods, sharing the same structure but using different criteria, for estimating the nonlinearities in nonlinear source separation. In contrast to state-of-the-art methods, our approach relies on a weak joint-sparsity assumption on the sources: we look for tiny temporal zones where only one source is active. This method is well suited to non-stationary signals such as speech. We extend our previous work to a more general class of nonlinear mixtures, proposing several nonlinear single-source confidence measures and several functional clustering techniques. Such approaches may be seen as extensions of linear instantaneous sparse component analysis to nonlinear mixtures. Experiments demonstrate the effectiveness and relevance of this approach.

    A Spectral Conversion Approach to the Iterative Wiener Filter for Speech Enhancement

    The Iterative Wiener Filter (IWF) for speech enhancement in additive noise is an effective algorithm that is simple to implement. One of its main disadvantages is the lack of proper convergence criteria, which has been shown to introduce severe degradation in the estimated clean signal. Here, an improvement of the IWF algorithm is proposed for the case where additional information is available about the signal to be enhanced. If a small amount of clean speech data is available, spectral conversion techniques can be applied to estimate the clean short-term spectral envelope of the speech signal from the noisy signal, with significant noise reduction. Our results show an average improvement over the original IWF that can reach 2 dB in segmental output Signal-to-Noise Ratio (SNR) at low input SNRs, which is perceptually significant.
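The abstract reports its gain in segmental SNR. As a rough sketch of how such a figure is commonly computed, the function below averages per-frame SNR values in dB. The frame length and the clamping limits of -10 dB and 35 dB are assumptions taken from common practice; the abstract does not state which settings the authors used.

```python
import numpy as np

def segmental_snr(clean, estimate, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB between a clean signal and its estimate.

    frame_len, lo and hi are illustrative defaults, not values from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    n_frames = min(len(clean), len(estimate)) // frame_len
    frame_snrs = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        e = s - estimate[k * frame_len:(k + 1) * frame_len]
        signal_energy = np.sum(s ** 2)
        error_energy = np.sum(e ** 2)
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        snr_db = 10.0 * np.log10(signal_energy / max(error_energy, 1e-12))
        frame_snrs.append(min(max(snr_db, lo), hi))  # clamp outlier frames
    return float(np.mean(frame_snrs)) if frame_snrs else float("nan")

# Toy check: an estimate whose error is 10% of the clean amplitude.
t = np.arange(4096)
clean = np.sin(2 * np.pi * 0.01 * t)
estimate = 1.1 * clean
seg_snr = segmental_snr(clean, estimate)
```

Clamping keeps a few very quiet or very clean frames from dominating the average, which is the usual reason segmental SNR is preferred over a single global SNR for speech.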

    Non-Parallel Training for Voice Conversion by Maximum Likelihood Constrained Adaptation

    The objective of voice conversion methods is to modify the speech characteristics of a particular speaker so that the speech sounds as if it were spoken by a different target speaker. Current voice conversion algorithms derive a conversion function by estimating its parameters from a corpus that contains the same utterances spoken by both speakers. Such a corpus, usually referred to as a parallel corpus, has the disadvantage that it is often difficult or even impossible to collect. Here, we propose a voice conversion method that does not require a parallel corpus for training, i.e., the utterances spoken by the two speakers need not be the same. The method employs speaker adaptation techniques to adapt the conversion parameters derived from a different pair of speakers to a particular pair of source and target speakers. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30% in many cases, with performance comparable to the ideal case in which a parallel corpus is available.

    Real-time multiple sound source localization using a circular microphone array based on single-source confidence measures

    We propose a novel real-time adaptive localization approach for multiple sources using a circular array, in order to suppress the localization ambiguities encountered with linear arrays, and assuming a weak sound source sparsity derived from blind source separation methods. Our proposed method performs very well both in simulations and in real conditions at 50% real-time.

    Nonparallel Training for Voice Conversion Based on a Parameter Adaptation Approach

    The objective of voice conversion algorithms is to modify the speech of a particular source speaker so that it sounds as if spoken by a different target speaker. Current conversion algorithms employ a training procedure during which the same utterances spoken by both the source and target speakers are needed to derive the desired conversion parameters. Such a (parallel) corpus is often difficult or impossible to collect. Here, we propose an algorithm that relaxes this constraint, i.e., the training corpus does not necessarily contain the same utterances from both speakers. The proposed algorithm is based on speaker adaptation techniques, adapting the conversion parameters derived for a particular pair of speakers to a different pair, for which only a nonparallel corpus is available. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30%. A speaker identification measure is also employed that more insightfully portrays the importance of adaptation, while listening tests confirm the success of our method. Both the objective and subjective tests demonstrate that the proposed algorithm achieves results comparable to the ideal case in which a parallel corpus is available.