
    Single channel speech separation with a frame-based pitch range estimation method in modulation frequency

    Computational Auditory Scene Analysis (CASA) has attracted considerable interest for segregating speech from monaural mixtures. In this paper, we propose a new method for single-channel speech separation with frame-based pitch range estimation in the modulation frequency domain. The pitch range is estimated in each frame of the modulation spectrum of speech by analyzing onsets and offsets. In the proposed method, the target speaker is separated from the interfering speaker by filtering the mixture signal with a mask extracted from the modulation spectrogram of the mixture. Systematic evaluation shows an acceptable level of separation compared with classic methods.
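    The idea of masking in the modulation frequency domain can be illustrated under simplifying assumptions. This minimal sketch is not the paper's mask-extraction procedure. It assumes that the subband amplitude envelopes of one analysis frame are already available and that a pitch range has already been estimated. It then applies a binary mask that keeps only the modulation frequencies inside that range. The function name `modulation_mask_filter` and its parameters are hypothetical.

```python
import numpy as np


def modulation_mask_filter(envelopes, fs_env, f_lo, f_hi):
    """Keep only modulation frequencies in [f_lo, f_hi] Hz for each subband envelope.

    Hypothetical helper, not the paper's method.
    envelopes: real array of shape (n_subbands, n_samples) for one analysis frame
    fs_env: sampling rate of the envelopes in Hz
    f_lo, f_hi: assumed pitch range in Hz, e.g. from a pitch-range estimator
    """
    n = envelopes.shape[-1]
    # Modulation spectrum: FFT of each subband envelope along time
    spectrum = np.fft.rfft(envelopes, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_env)
    # Binary mask over the modulation-frequency bins inside the pitch range
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    # Apply the mask and return to the time domain
    return np.fft.irfft(spectrum * mask, n=n, axis=-1)


fs = 8000
t = np.arange(fs) / fs
# Toy envelope: a 120 Hz "target" modulation plus a 220 Hz "interferer"
env = np.sin(2 * np.pi * 120 * t) + np.sin(2 * np.pi * 220 * t)
kept = modulation_mask_filter(env[None, :], fs, 100.0, 150.0)
```

    Both toy tones fall exactly on FFT bins, so `kept` reproduces only the 120 Hz component. Real envelopes would also need windowing and frame-by-frame processing.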

    Parallel and Limited Data Voice Conversion Using Stochastic Variational Deep Kernel Learning

    Typically, voice conversion is regarded as an engineering problem with limited training data. The reliance on massive amounts of data hinders the practical applicability of deep learning approaches, which have been extensively researched in recent years. On the other hand, statistical methods are effective with limited data but have difficulty modelling complex mapping functions. This paper proposes a voice conversion method that works with limited data and is based on stochastic variational deep kernel learning (SVDKL). SVDKL combines the expressive capability of deep neural networks with the high flexibility of the Gaussian process as a Bayesian, non-parametric method. When the conventional kernel is combined with a deep neural network, it becomes possible to estimate non-smooth and more complex functions. Furthermore, the model's sparse variational Gaussian process addresses the scalability problem and, unlike the exact Gaussian process, allows a global mapping function to be learned for the entire acoustic space. One of the most important aspects of the proposed scheme is that the model parameters are trained by marginal likelihood optimization, which considers both data fitting and model complexity. Accounting for model complexity increases resistance to overfitting and thereby reduces the amount of training data required. To evaluate the proposed scheme, we examined the model's performance with approximately 80 seconds of training data. The results indicated that our method obtained a higher mean opinion score, smaller spectral distortion, and better preference test results than the compared methods.
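    Marginal likelihood trades data fitting against model complexity. The exact Gaussian process log marginal likelihood shows this directly: log p(y) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log 2π. The sketch below evaluates this quantity for a plain RBF kernel on 1-D toy inputs. It is not SVDKL, which passes deep-network features into the kernel and optimizes a sparse variational bound. All names and hyperparameter values here are hypothetical.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale, variance):
    """Squared-exponential (RBF) kernel on 1-D inputs (illustrative choice)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def log_marginal_likelihood(x, y, lengthscale, variance, noise):
    """Exact GP log marginal likelihood, returned with its data-fit and complexity terms."""
    n = len(x)
    K = rbf_kernel(x, x, lengthscale, variance) + noise * np.eye(n)
    L = np.linalg.cholesky(K)                            # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    data_fit = -0.5 * y @ alpha                          # rewards fitting y
    complexity = -np.sum(np.log(np.diag(L)))             # = -0.5 * log det K
    total = data_fit + complexity - 0.5 * n * np.log(2 * np.pi)
    return total, data_fit, complexity


rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(20)
total, fit, penalty = log_marginal_likelihood(x, y, lengthscale=1.0, variance=1.0, noise=0.01)
```

    Maximizing `total` over the kernel hyperparameters balances `fit` against `penalty`. This balance is what the abstract credits for the improved resistance to overfitting.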

    Determination of Pitch Range Based on Onset and Offset Analysis in Modulation Frequency Domain

    An auditory scene in a natural environment contains multiple sources. Auditory scene analysis (ASA) is the process by which the auditory system segregates a scene into streams corresponding to different sources. Determining the range of the pitch frequency is necessary for segmentation. We propose a system that determines the range of the pitch frequency by analyzing onsets and offsets in the modulation frequency domain. In the proposed system, the modulation spectrum of speech is first calculated, and onsets and offsets are then detected in each subband. Segments are then generated by matching corresponding onset and offset fronts. Finally, the range of the pitch frequency is determined by choosing the desired segments. Systematic evaluation shows that the range of the pitch frequency is estimated with good accuracy.
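    Onset/offset detection and matching can be illustrated on a single envelope. This sketch is a simplified stand-in and does not reproduce the paper's procedure. It assumes a precomputed 1-D intensity trace, such as one modulation-frequency channel over time. It uses a fixed slope threshold, which is a hypothetical tuning parameter, and pairs each onset with the next strong offset. The paper's matching of onset and offset fronts across subbands and its segment selection are not shown. The function name is hypothetical.

```python
import numpy as np


def detect_segments(env, threshold):
    """Pair onsets with offsets in a 1-D envelope; returns inclusive (start, end) sample pairs.

    An onset starts a run of slopes above +threshold; the matching offset is the
    first later slope below -threshold. `threshold` is a hypothetical tuning value.
    """
    slope = np.diff(env)                  # slope[i] = env[i + 1] - env[i]
    rising = slope > threshold
    falling = slope < -threshold
    # Onset: first index of each run of strongly rising slope
    prev_rising = np.concatenate(([False], rising[:-1]))
    onsets = np.flatnonzero(rising & ~prev_rising)
    segments = []
    for on in onsets:
        later = np.flatnonzero(falling[on:])
        if later.size == 0:
            continue                      # no matching offset: drop this onset
        start, end = int(on) + 1, int(on + later[0])
        if not segments or start > segments[-1][1]:
            segments.append((start, end))  # skip onsets inside an earlier segment
    return segments


env = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0], dtype=float)
print(detect_segments(env, threshold=0.5))  # [(2, 4), (8, 9)]
```

    A practical system would smooth the envelope before differentiating it and would use more than one threshold scale. This sketch keeps only the core onset-to-offset pairing.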

    A Subband Beamformer On An Ultra Low-Power Miniature DSP Platform

    This paper presents the design and implementation of a subband cardioid beamformer on an ultra low-power miniature DSP platform, using a 2-microphone endfire array. The subband beamformer extends the classical time-domain, narrow-band algorithm to a frequency-domain, broadband implementation, making it suitable for general speech and audio applications. An oversampled, weighted overlap-add filterbank is used to allow wide gain and phase adjustments while meeting low-power, low-group-delay requirements. A subband IIR filter is proposed to overcome the non-zero bandwidth of the frequency bands and to introduce a nearly linear phase adjustment across the bands. The subband implementation also makes it possible to integrate the beamformer with additional algorithms in different frequency ranges. The beamformer has been implemented in real time on Dspfactory's Toccata platform, which was specifically designed for ultra low-power, miniature, head-mounted audio devices. At 1.25 V with a 5 MIPS DSP core, the Toccata consumes only about 800 microwatts, excluding microphones and receivers.
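    The subband beamformer extends the classical delay-and-subtract cardioid, which can be evaluated independently in each frequency band. This sketch makes several assumptions: ideal matched microphones, a speed of sound of 343 m/s, and complex subband coefficients (for example from an STFT) for the front and back microphones. It does not model the oversampled weighted overlap-add filterbank, the proposed subband IIR filter, or the Toccata implementation. The function names and the 12 mm spacing are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def cardioid_subband(front, back, freqs, spacing):
    """Delay-and-subtract cardioid applied bin by bin (textbook form, not the paper's).

    front, back: complex subband coefficients, shape (n_bins, n_frames)
    freqs: centre frequency of each bin in Hz, shape (n_bins,)
    spacing: microphone spacing in metres
    """
    tau = spacing / SPEED_OF_SOUND
    # Delay the back microphone by tau in every band, then subtract
    phase = np.exp(-2j * np.pi * freqs * tau)[:, None]
    return front - back * phase


def array_response(freqs, theta, spacing):
    """Cardioid output for a unit plane wave from angle theta (0 = front endfire)."""
    tau = spacing / SPEED_OF_SOUND
    front = np.ones((len(freqs), 1), dtype=complex)
    # Back-mic delay relative to the front mic: +tau from the front, -tau from the rear
    back = np.exp(-2j * np.pi * freqs * tau * np.cos(theta))[:, None]
    return cardioid_subband(front, back, freqs, spacing)[:, 0]


freqs = np.linspace(100.0, 8000.0, 50)
spacing = 0.012  # metres, hypothetical spacing for a miniature device
rear = array_response(freqs, np.pi, spacing)
front = array_response(freqs, 0.0, spacing)
```

    The rear null holds in every band. The front response, 2|sin(2πfτ)|, rises with frequency, so a practical broadband implementation has to equalize this high-pass tilt.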

    Complexity Reduction And Regularization Of A Fast Affine Projection Algorithm For Oversampled Subband Adaptive Filters

    The Affine Projection Algorithm (APA) has been shown to improve the performance of Over-Sampled Subband Adaptive Filters (OS-SAFs) compared with classical Normalized Least Mean Square (NLMS) algorithms. Because of the complexity of APA, however, only low-order APAs are practical for real-time implementation. In this paper, we therefore propose a reduced-complexity version of the Gauss-Seidel Fast APA (GSFAPA) for adapting the subband filters in OS-SAF systems. We modify the GSFAPA with a complexity reduction method based on partial filter update and with a low-cost method for combined regularization and step size control. We show the advantage of the new algorithm, termed Low-Cost Gauss-Seidel Fast Affine Projection, over the APA in a subband echo canceller application.
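    For reference, a textbook affine projection update fits in a few lines. This sketch is deliberately simpler than the paper's algorithm. It is the plain APA (not the Gauss-Seidel fast variant) on a real-valued fullband signal, with a fixed step size `mu` and regularization `delta`. These values are illustrative and replace the proposed partial-update scheme and combined regularization/step-size control. In an OS-SAF system, one such filter would typically run in each subband on complex-valued signals. Setting `order=1` reduces the update to NLMS. The function name is hypothetical.

```python
import numpy as np


def apa_identify(x, d, num_taps, order, mu=0.5, delta=1e-4):
    """Plain affine projection adaptive filter (real-valued, fullband sketch).

    x, d: input and desired signals of equal length
    num_taps: filter length L; order: projection order P (P = 1 gives NLMS)
    delta: regularization added to the P x P matrix before solving
    """
    w = np.zeros(num_taps)
    for n in range(num_taps + order - 1, len(x)):
        # Columns are the P most recent input vectors u(n), u(n-1), ..., u(n-P+1)
        U = np.column_stack(
            [x[n - k - num_taps + 1 : n - k + 1][::-1] for k in range(order)]
        )
        dv = d[n - order + 1 : n + 1][::-1]  # desired samples d(n), ..., d(n-P+1)
        e = dv - U.T @ w                     # a priori errors for all P constraints
        w = w + mu * U @ np.linalg.solve(U.T @ U + delta * np.eye(order), e)
    return w


rng = np.random.default_rng(1)
x = rng.standard_normal(3000)
h = rng.standard_normal(8)           # unknown echo path (toy)
d = np.convolve(x, h)[: len(x)]      # noiseless desired signal
w = apa_identify(x, d, num_taps=8, order=4)
```

    Each update solves a P x P linear system, so the per-sample cost grows with the projection order. Fast variants such as GSFAPA, and the paper's reduced-complexity version of it, aim to cut this cost.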